0031//Are three samples good enough? Computing the sampling margin of error
Recently, I was involved in a few projects that required hypothesis testing to check whether a recent process change had any impact. However, one of my data sets had only three samples. This prompted a question: are three samples good enough? I definitely don't think so. Then what sample size is good enough for me to measure the "state of condition"?
The first thing that came to mind was the (statistical) confidence level. But how do I translate it into an optimum sample size? After some revision, I found a way to solve this puzzle. Instead of finding the optimum number, I use my current samples to compute their error. This error helps us understand whether our sample size is good enough.
Sampling size
Suppose I want to know p, the yield of pins located inside the tolerance zone (positional control). We sample and check N pins from our inventory, because resource constraints prevent us from measuring every pin in the inventory.
From the first sample, I find that the pin yield is p_1. In a second trial, I sample another N pins and find that the yield is p_2. After repeating the trial M times, my pin yields are p_1, p_2, p_3, ..., p_M-1, p_M.
All these yields p_i are different, and we never know exactly what the yield q of the entire population is. The difference between p_i and q is known as the sampling error. The sampling error approaches 0 as N approaches N_ep, the size of the entire population.
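The repeated trials can be sketched with a short simulation. This is only an illustration under assumptions that are mine, not from a real inventory: a hypothetical stock of 1000 pins with a true yield q = 0.9, M = 5 trials per sample size, and pins drawn without replacement. The function name `simulate_yields` is made up for this example.

```python
import random


def simulate_yields(population, n, m, seed=0):
    """Draw m samples of n pins (without replacement) and return each sample's yield."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    yields = []
    for _ in range(m):
        sample = rng.sample(population, n)
        yields.append(sum(sample) / n)  # True counts as 1 (pin inside tolerance)
    return yields


# Hypothetical inventory: 900 pins inside tolerance, 100 outside, so q = 0.9
population = [True] * 900 + [False] * 100
q = sum(population) / len(population)

for n in (3, 30, 300, 1000):
    yields = simulate_yields(population, n, m=5)
    worst = max(abs(p - q) for p in yields)  # largest sampling error over the trials
    print(f"N={n:4d}  yields={[round(p, 3) for p in yields]}  worst error={worst:.3f}")
```

When N equals the whole inventory (N = 1000 here), every trial measures every pin, so each p_i equals q and the sampling error is exactly 0.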
Hypothesis, Type I and Type II Errors
When we want to know the state of conditions, we usually state our hypothesis first. In this case, the hypothesis is defined as below.
However, because of the error between p and q, there is always a chance that we falsely reject or falsely accept the hypothesis. The possible outcomes of hypothesis testing are shown below.
A Type I error is a false positive (aka error of the first kind). In this situation, the hypothesis test rejects a true H_0. A Type II error is a false negative (aka error of the second kind). In this situation, the hypothesis test accepts a false H_0.
The probabilities of Type I and Type II errors are the significance level and the false negative rate, respectively. Their complements are the confidence level and the power (or sensitivity) of the test (informal definition).
To minimize these errors, we tend to minimize both the significance level and the false negative rate. However, if we push the false negative rate too low and the sample size approaches the entire population size, the power of the test becomes so high that it is conservative about accepting H_0, and an H_0 that is true for practical purposes may be rejected.
Standard Error of the Proportion
To avoid over-rejecting and over-accepting, an optimal sample size is required. The error of the sampling yield p can help us gauge how large a sample is needed to describe the entire population.
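As a minimal sketch, the standard error of a sample proportion can be computed with the textbook formula SE = sqrt(p(1 - p) / N). This assumes a simple random sample from a population much larger than N (no finite population correction). The function name `standard_error` and the yield of 0.9 are illustrative choices of mine.

```python
import math


def standard_error(p, n):
    """Standard error of a sample proportion p measured on n items."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return math.sqrt(p * (1.0 - p) / n)


# Assumed sample yield of 0.9 at a few sample sizes
for n in (3, 30, 300):
    print(f"N={n:3d}  SE={standard_error(0.9, n):.4f}")
```

The standard error shrinks with the square root of N, so cutting it in half takes four times as many samples.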
Margin of Error, radius of the confidence interval
The probability density distributions (not to scale) show how the sample size N (not to scale with the example above) affects the spread of the distribution of p. This spread can be quantified by the margin of error. The margin of error is a function of the significance level and the degrees of freedom (df = N - 1). The larger the sample size N, the smaller the margin of error.
The lines shown in the second part of the diagram represent the length of the confidence interval. The diagram sets the confidence level to 95% (a significance level of 0.05). From this illustration, you can see that the margin of error is half the length of the confidence interval.
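For a three-sample data set, the margin of error of the mean can be sketched as t x s / sqrt(N), where t is the two-sided Student's t critical value for df = N - 1 and s is the sample standard deviation. The measurements below are made up for illustration. The small lookup table only covers the 95% confidence level for a few degrees of freedom, and the function name `margin_of_error_95` is my own.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
# Standard table values; only a few entries are included for this sketch.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 9: 2.262, 29: 2.045}


def margin_of_error_95(samples):
    """Margin of error of the sample mean at the 95% confidence level."""
    n = len(samples)
    df = n - 1  # degrees of freedom
    if df not in T_CRIT_95:
        raise ValueError(f"no tabulated t value for df={df}")
    s = statistics.stdev(samples)  # sample standard deviation (n - 1 divisor)
    return T_CRIT_95[df] * s / math.sqrt(n)


measurements = [10.1, 9.9, 10.3]  # hypothetical readings, three samples only
mean = statistics.mean(measurements)
moe = margin_of_error_95(measurements)
print(f"mean = {mean:.3f}, margin of error = {moe:.3f}")
print(f"95% confidence interval = [{mean - moe:.3f}, {mean + moe:.3f}], length = {2 * moe:.3f}")
```

With only three samples, df = 2 and the t critical value is 4.303, more than twice the 1.96 that applies to very large samples. That alone makes the confidence interval wide, before the small N in the denominator is even considered.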
Conclusion
Given the sample size, the significance level (the complement of the confidence level), and the variance or sample proportion, you can compute the length of the confidence interval (2 x margin of error). With this technique, you can gauge the quality and reliability of your sampled data. However, this technique tends to maximize the sample size N in order to minimize the margin of error. We will discuss the optimum sample size, N_optimum (short of N = N_ep), in a future post.
For the detailed confidence interval calculation, please refer to the link below:
http://gbwonggbwong.wix.com/dimensionalmetrology#!0030Statistical-Confidence-Interval-Mean-and-Variance-of-Inspection/c2211/56d8325c0cf249e9dfd09835
return 0;