When Excel isn’t enough:
Charting your data
Why does one need a software program to generate charts, when spread sheet software itself can produce a variety of chart formats?
Indeed, this is a question that quality professionals explore frequently, summoning appropriate rationale to convince purchasing agents or supervisors that the expenditure is justified by the results.
One can, of course, develop charts from spreadsheet programs. One can also, as we all know from experience, use whatever is handy to pound a nail into a wall to hang a picture. (You mean you’ve never grabbed your stapler, or a paperweight, to facilitate a simple task such as this?)
Just as in the case of the stapler-as-hammer, using a spreadsheet to create charts that can provide information with customized charts, graphs, and diagrams, clearly works, but the question is—how well? Any master carpenter will emphasize the importance of using the right tool for a job, and the charting job is no exception.
Many customers have Excel data in a form similar to the following set, submitted as a query by a customer through the PQ Systems Knowledge Base.
Discharge Date |
LOS |
1/1/2003 |
3 |
1/2/2003 |
2 |
1/3/2003 |
4 |
1/4/2003 |
2 |
1/5/2003 |
1 |
etc. |
|
Imagine that you have data like this covering an entire year or even more. You might be interested to know how the average length of stay is changing throughout the year. There are several ways to go about this using CHARTrunner.
One approach is to define a control chart which treats the LOS column as either a measurement (individuals chart) or a count (c-chart). The problem with this approach is that you may have several thousand rows of data. Each row will create a data point on the chart. Although you will be able to see the average LOS, the chart could be quite congested and it may not show you how LOS is changing over time.
Another approach is to decide on some time interval for grouping the data. For example, with this data you might look at the average LOS by week or by month. To do this you could use the Advanced row selection feature. This is found on the data definition tab of the chart definition form. This allows you to do grouping based on the DischargeDate column and select the grouping units - such as week, month, etc. This is a powerful feature but it does have limitations. For example, if you do advanced row grouping by month, and one of your months contains more than 100 entries the chart will not display. This is due to a subgroup size maximum of 100 for advanced row grouping.
Perhaps the best way to solve this problem is to use a custom SQL query and allow the query itself to do some aggregating of the data. This will be easier if you add a new column to your Excel sheet so that it looks like this:
DischargeDate |
LOS |
YearMonth |
1/1/2003 |
3 |
2003 01 |
1/2/2003 |
2 |
2003 01 |
1/3/2003 |
4 |
2003 01 |
1/4/2003 |
2 |
2003 01 |
1/5/2003 |
1 |
2003 01 |
The YearMonth column can contain an Excel formula based on the DischargeDate column. Here is the formula used in the example:
=YEAR(A2) &" " & TEXT(MONTH(A2),"00")
Next, we will create a new CHARTrunner chart that uses this Excel sheet as the data source. On the data definition tab, select Custom query as the source for data records - then click on the Edit query button and type in the following custom SQL query:
SELECT Min(YearMonth) as MinYrMonth, format(avg(los),"0.00") as AvgLos FROM [Sheet1$] GROUP BY YearMonth ORDER BY YearMonth
Two things about this query will help us: 1) we are using the GROUP BY clause to aggregate the data by month and 2) we are using the avg() function to calculate an average length of stay for each month. When this query is run the result data will look like this:
MinYrMonth |
AvgLos |
2003 01 |
2.89 |
2003 02 |
3.07 |
2003 03 |
3.18 |
2003 04 |
3.11 |
2003 05 |
3.12 |
2003 06 |
2.94 |
2003 07 |
3.35 |
2003 08 |
3.18 |
2003 09 |
3.19 |
2003 10 |
3.30 |
2003 11 |
3.09 |
2003 12 |
3.09 |
In CHARTrunner you would treat the MinYrMonth as an identifier and you would treat the AvgLos as a measurement. The resulting chart will contain 12 data points; one for each month found in the data.
Now that you know how the average LOS is changing from month to month - you may also want to know how many data points were used to compute each monthly average. A slight change to the query will give us this information as well. Here is the SQL statement:
SELECT Min(YearMonth) as MinYrMonth, format(avg(los),"0.00") as AvgLos , Count(YearMonth) as NumCases FROM [Sheet1$] GROUP BY YearMonth ORDER BY YearMonth
When this query is run the result data will look like this:
MinYrMonth |
AvgLos |
NumCases |
2003 01 |
2.89 |
37 |
2003 02 |
3.07 |
30 |
2003 03 |
3.18 |
38 |
2003 04 |
3.11 |
35 |
2003 05 |
3.12 |
33 |
2003 06 |
2.94 |
33 |
2003 07 |
3.35 |
34 |
2003 08 |
3.18 |
33 |
2003 09 |
3.19 |
31 |
2003 10 |
3.30 |
33 |
2003 11 |
3.09 |
32 |
2003 12 |
3.09 |
33 |
If you treat the NumCases column as an identifier this information may be displayed above the chart in the identifier grid. This way you will know both the average LOS and the number of cases considered.
Copyright
2008 PQ Systems.
Please direct questions or problems regarding this web site to the Webmaster. |