# How can Cpk be good with data outside the specifications?

A customer who recently called our application support line could not understand why his Cpk, calculated by SQCpack, was above 1.0. His data was not centered between the specifications, and some of it fell outside the specifications. How can you have a good Cpk when some data is outside the specifications, or the data is not centered on the target (nominal) value?

To calculate Cpk, you need only three pieces of information: the process average, the variation in the process, and the specification(s). First, determine whether the mean (average) is closer to the upper or to the lower specification. If the process is perfectly centered, the mean is equally close to both, so either Zupper or Zlower can be used, as you will see below. If you have only one specification, use that one, because there is no other limit to compare against.
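The rule above, using whichever specification the mean is closer to, amounts to taking the smaller of the two Z values. Below is a minimal Python sketch. It assumes the mean and the estimated sigma are already known (the next paragraph describes how sigma is estimated). The function name `cpk` and its keyword arguments are hypothetical and illustrative only. They are not SQCpack's API.

```python
from typing import Optional


def cpk(mean: float, sigma: float,
        lsl: Optional[float] = None, usl: Optional[float] = None) -> float:
    """Return Cpk from the process mean, estimated sigma and spec limit(s).

    Hypothetical helper: Zupper and Zlower are the distances from the mean
    to each specification, in units of sigma.
    """
    z_values = []
    if usl is not None:
        z_values.append((usl - mean) / sigma)  # Zupper
    if lsl is not None:
        z_values.append((mean - lsl) / sigma)  # Zlower
    if not z_values:
        raise ValueError("at least one specification limit is required")
    # The smaller Z belongs to the specification the mean is closer to;
    # with a single specification it is simply that specification's Z.
    return min(z_values) / 3


# Centered process: Zupper equals Zlower, so either one gives the same Cpk.
print(cpk(mean=10.0, sigma=1.0, lsl=7.0, usl=13.0))  # 1.0
# One-sided specification: only Zupper exists.
print(cpk(mean=10.0, sigma=0.5, usl=13.0))  # 2.0
```

The function takes `min()` over whichever Z values exist. This lets one code path handle two-sided specifications, one-sided specifications, and the centered case.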

To measure the variation in the process, use the estimated sigma (an estimate of the process standard deviation). If you would rather use the standard deviation calculated from the individual data, use the Ppk calculation instead, since Ppk is based on that sigma. To calculate the estimated sigma, divide the average range, R-bar, by d2. The d2 value depends on the subgroup size and comes from the table of constants below. If your subgroup size is one, divide the average moving range, MR-bar, by d2 instead.
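For contrast, here is a sketch of the Ppk side of that distinction, where sigma is the ordinary standard deviation of all the individual readings. It assumes the sample (n - 1) standard deviation from Python's `statistics.stdev`. The article does not specify which form of the standard deviation is used. The helper name `ppk` and the three readings in the usage line are hypothetical.

```python
import statistics
from typing import Optional, Sequence


def ppk(values: Sequence[float],
        lsl: Optional[float] = None, usl: Optional[float] = None) -> float:
    """Ppk: same form as Cpk, but sigma is the overall standard deviation
    of the individual readings instead of R-bar / d2 (hypothetical helper)."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # assumed: sample standard deviation (n - 1)
    z_values = []
    if usl is not None:
        z_values.append((usl - mean) / s)
    if lsl is not None:
        z_values.append((mean - lsl) / s)
    if not z_values:
        raise ValueError("at least one specification limit is required")
    return min(z_values) / 3


print(ppk([9.0, 10.0, 11.0], lsl=7.0, usl=13.0))  # mean 10, s = 1, so 1.0
```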

**d2 values**

| Subgroup size | d2 |
|---|---|
| 1 | 1.128 |
| 2 | 1.128 |
| 3 | 1.693 |
| 4 | 2.059 |
| 5 | 2.326 |
| 6 | 2.534 |
| 7 | 2.704 |
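The estimated-sigma calculation described above can be sketched in Python as follows. The `D2` dictionary copies the table above. The function names are hypothetical, and the sketch assumes equal-size subgroups.

```python
# d2 constants from the table above (subgroup sizes 1 through 7).
D2 = {1: 1.128, 2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326, 6: 2.534, 7: 2.704}


def sigma_from_subgroups(subgroups):
    """Estimated sigma = R-bar / d2 for equal-size subgroups of 2 to 7 readings."""
    n = len(subgroups[0])
    if any(len(g) != n for g in subgroups) or n not in range(2, 8):
        raise ValueError("need equal-size subgroups of 2 to 7 readings")
    r_bar = sum(max(g) - min(g) for g in subgroups) / len(subgroups)
    return r_bar / D2[n]


def sigma_from_individuals(readings):
    """Subgroup size 1: estimated sigma = MR-bar / d2.

    Each moving range spans two consecutive readings, which is why the
    table lists the same d2 (1.128) for subgroup sizes 1 and 2.
    """
    moving_ranges = [abs(b - a) for a, b in zip(readings, readings[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return mr_bar / D2[1]
```

For example, `sigma_from_individuals([0.0, 1.128, 0.0])` has two moving ranges of 1.128, so MR-bar / d2 comes out to 1.0.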

You, of course, provide the specifications. With these three pieces of information, Cpk is easy to calculate. For example, suppose your process average is closer to the upper specification. Then Cpk is calculated as follows:

Cpk = (USL - Mean) / (3 * Est. sigma)

As you can see, the raw data is not used *directly* in the Cpk calculation. It is used only indirectly, to determine the mean and the average range. An example may help clarify this. Suppose you have the following 14 subgroups with a subgroup size of 2:

| Sample No. | Reading 1 | Reading 2 | Average | Range |
|---|---|---|---|---|
| 1 | 0.03 | 0.06 | 0.045 | 0.030 |
| 2 | 0.10 | 0.20 | 0.150 | 0.100 |
| 3 | 0.05 | 0.10 | 0.075 | 0.050 |
| 4 | 1.00 | 0.00 | 0.500 | 1.000 |
| 5 | 1.50 | 1.50 | 1.500 | 0.000 |
| 6 | 1.10 | 1.50 | 1.300 | 0.400 |
| 7 | 1.10 | 1.00 | 1.050 | 0.100 |
| 8 | 1.10 | 1.01 | 1.055 | 0.090 |
| 9 | 1.25 | 1.20 | 1.225 | 0.050 |
| 10 | 1.00 | 0.30 | 0.650 | 0.700 |
| 11 | 0.75 | 0.76 | 0.755 | 0.010 |
| 12 | 0.75 | 0.50 | 0.625 | 0.250 |
| 13 | 1.00 | 1.10 | 1.050 | 0.100 |
| 14 | 1.20 | 1.40 | 1.300 | 0.200 |
| Average | | | 0.8057 | 0.220 |
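As a cross-check, here is a short Python sketch that rebuilds the Average and Range columns and the summary row from the raw readings. The variable names are illustrative.

```python
# The 14 subgroups of two readings from the table above.
subgroups = [
    (0.03, 0.06), (0.10, 0.20), (0.05, 0.10), (1.00, 0.00), (1.50, 1.50),
    (1.10, 1.50), (1.10, 1.00), (1.10, 1.01), (1.25, 1.20), (1.00, 0.30),
    (0.75, 0.76), (0.75, 0.50), (1.00, 1.10), (1.20, 1.40),
]

averages = [sum(g) / len(g) for g in subgroups]  # "Average" column
ranges = [max(g) - min(g) for g in subgroups]  # "Range" column

x_bar = sum(averages) / len(averages)  # grand average, X-bar
r_bar = sum(ranges) / len(ranges)  # average range, R-bar

print(f"X-bar = {x_bar:.4f}, R-bar = {r_bar:.3f}")  # X-bar = 0.8057, R-bar = 0.220
```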

The mean, X-bar, is 0.8057 and the average range, R-bar, is 0.220. For this example, the upper specification is 2.12, the target value is 1.12, and the lower specification is 0.12. Six of the 28 readings (more than 21% of the data) fall below the lower specification, so you would expect Cpk to be low, right? As it turns out, Cpk is a relatively healthy 1.17. (Yes, for this example, we have ignored the first cardinal rule: before you look at Cpk, the process must be in control.)
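The out-of-specification share quoted above can be checked with a few lines of Python. `LSL` and `USL` are simply the example's specification limits.

```python
LSL, USL = 0.12, 2.12

# All 28 individual readings from the table (two per subgroup).
readings = [
    0.03, 0.06, 0.10, 0.20, 0.05, 0.10, 1.00, 0.00, 1.50, 1.50,
    1.10, 1.50, 1.10, 1.00, 1.10, 1.01, 1.25, 1.20, 1.00, 0.30,
    0.75, 0.76, 0.75, 0.50, 1.00, 1.10, 1.20, 1.40,
]

outside = [x for x in readings if x < LSL or x > USL]
share = len(outside) / len(readings)
print(f"{len(outside)} of {len(readings)} readings out of spec ({share:.1%})")
# 6 of 28 readings out of spec (21.4%)
```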

Before we go on, let’s check the math.

Mean = 0.8057

Average range = 0.2200

Est. sigma = R-bar / d2 = 0.2200 / 1.128 = 0.1950

Cpk = smallest of (Zupper, Zlower) / 3

Zlower = (Mean - LSL) / Est. sigma

= (.8057 - 0.12) / .1950

= .6857 / .1950

Zlower = 3.516

Zupper = (USL - Mean) / Est. sigma

= (2.12 - .8057) / .1950

= 1.3143 / .1950

Zupper = 6.740

Zupper is larger, so in this example,

Cpk = Zlower / 3

= 3.516 / 3

Cpk = 1.172
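The same arithmetic in Python reproduces the hand calculation. The sketch uses the rounded summary values quoted in the article, so the last digit of Zupper can differ slightly from a version that rounds sigma to 0.1950 first.

```python
# Summary statistics from the example (values as quoted in the article).
mean, r_bar = 0.8057, 0.2200
lsl, usl = 0.12, 2.12
d2 = 1.128  # subgroup size 2

sigma = r_bar / d2  # estimated sigma, about 0.1950
z_lower = (mean - lsl) / sigma  # about 3.516
z_upper = (usl - mean) / sigma  # about 6.74
cpk = min(z_lower, z_upper) / 3  # about 1.172

print(f"sigma={sigma:.4f} Zlower={z_lower:.3f} Zupper={z_upper:.3f} Cpk={cpk:.3f}")
```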

So what gives? Here is an example where Cpk is good, yet the process is not centered and data falls outside at least one of the specifications. Cpk looks good because the average range is understated. Since the estimated sigma is R-bar / d2, an understated R-bar yields too small a sigma, and dividing by that small sigma inflates Cpk. Why the average range is understated will be discussed in a future article. One last note: if you look at this data on a control chart, you will quickly see that it is not in control. Therefore, the Cpk statistic should be ignored when the process is not in control.
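To see the out-of-control signals without drawing the chart, here is a Python sketch that computes X-bar and R chart limits for this data. It assumes the standard Shewhart constants for subgroups of size 2 (A2 = 1.880, D3 = 0, D4 = 3.267), which the article itself does not list. It applies only the basic "point outside the control limits" test. The final line compares R-bar / d2 with the overall standard deviation of all 28 readings, to show how much smaller the range-based estimate is.

```python
import statistics

# Readings from the 14 subgroups in the example (subgroup size 2).
subgroups = [
    (0.03, 0.06), (0.10, 0.20), (0.05, 0.10), (1.00, 0.00), (1.50, 1.50),
    (1.10, 1.50), (1.10, 1.00), (1.10, 1.01), (1.25, 1.20), (1.00, 0.30),
    (0.75, 0.76), (0.75, 0.50), (1.00, 1.10), (1.20, 1.40),
]

# Assumed Shewhart X-bar and R chart constants for subgroups of size 2.
A2, D3, D4 = 1.880, 0.0, 3.267
D2 = 1.128

averages = [sum(g) / len(g) for g in subgroups]
ranges = [max(g) - min(g) for g in subgroups]
x_bar = sum(averages) / len(averages)
r_bar = sum(ranges) / len(ranges)

ucl_x, lcl_x = x_bar + A2 * r_bar, x_bar - A2 * r_bar
ucl_r, lcl_r = D4 * r_bar, D3 * r_bar

# Subgroup numbers (1-based) that plot outside the control limits.
out_x = [i for i, a in enumerate(averages, start=1) if not lcl_x <= a <= ucl_x]
out_r = [i for i, r in enumerate(ranges, start=1) if not lcl_r <= r <= ucl_r]
print("X-bar chart, subgroups outside limits:", out_x)
print("R chart, subgroups outside limits:", out_r)

# Within-subgroup estimate (R-bar / d2) versus the overall standard
# deviation of all individual readings.
readings = [x for g in subgroups for x in g]
print(f"R-bar/d2 = {r_bar / D2:.3f}, overall s = {statistics.stdev(readings):.3f}")
```

Half of the subgroup averages fall outside the X-bar limits, which is consistent with the article's remark that the chart quickly shows the process is not in control.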