We have a question…a question that has an answer (a number). Unfortunately, we can't measure all of the individuals in the population in order to answer the question…so we take a sample (which we hope is representative of the population). From that sample, we can get a point estimate of our answer—a single number which may or may not actually be the answer.
Alas, point estimates are usually wrong. It would be better to give an interval, so that we can say "I'm pretty sure the answer lies within this interval."
To do this, we have to use the sampling distribution (from § 8.2) and attack the interval backwards…
Let's consider the case of means. In other words, the answer to our big question could be answered by finding the mean of a big data set. We can get a point estimate of this big mean by taking a sample and finding $\bar{x}$.
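Now, from our knowledge of the distribution of $\bar{x}$ (§ 8.2), we can calculate how often certain values of $\bar{x}$ occur. Specifically, from our knowledge of the Empirical Rule (§ 7.1), we can say things like: 95% of $\bar{x}$ values lie within the interval $\mu \pm 2\sigma_{\bar{x}}$, where $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ is the standard deviation of the sampling distribution of $\bar{x}$.
Now comes the tricky part. If $\bar{x}$ lies within two $\sigma_{\bar{x}}$ of $\mu$, then $\mu$ lies within two $\sigma_{\bar{x}}$ of $\bar{x}$, right? You can measure the distance either way: from the center to $\bar{x}$, or from $\bar{x}$ to the center.
95% of $\bar{x}$ values lie within the interval $\mu \pm 2\frac{\sigma}{\sqrt{n}}$. So 95% of $\bar{x}$ values will have $\mu$ in the interval $\bar{x} \pm 2\frac{\sigma}{\sqrt{n}}$.
So, if I go out and take a sample, there's a pretty good chance (about 95%) that the $\bar{x}$ I get will be within two $\sigma_{\bar{x}}$ of $\mu$, so there's a good chance (about 95%) that $\mu$ will be contained in the interval $\bar{x} \pm 2\frac{\sigma}{\sqrt{n}}$.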
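The "about 95%" claim can be checked by simulation. The sketch below is only an illustration: the population values (mean 50, standard deviation 10), the sample size, the number of trials, and the function name `coverage_rate` are all made-up assumptions, not part of these notes. It draws many samples from a normal population and counts how often the interval (sample mean ± 2σ/√n) contains the population mean.

```python
import random
from statistics import mean

def coverage_rate(mu=50.0, sigma=10.0, n=25, trials=2000, seed=1):
    """Fraction of samples whose interval (xbar +/- 2*sigma/sqrt(n)) contains mu.

    mu, sigma, n, trials, and seed are arbitrary illustration values.
    """
    rng = random.Random(seed)
    half_width = 2 * sigma / n ** 0.5  # two standard deviations of xbar
    hits = 0
    for _ in range(trials):
        xbar = mean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials

rate = coverage_rate()
print(rate)  # should land close to 0.95
```

The exact fraction changes with the seed, but it should stay near 0.95 as long as the population is normal and σ is known.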
The 2 in those calculations comes from the Empirical Rule, and produces 95% confidence (95% of the intervals we produce ought to surround, capture or otherwise contain the population mean μ). What if we want more confidence (or even less)? That value is the key, and it is called the Critical Value.
A Level C Critical Value is a value of z so that the area between -z and z is C (remember those problems from § 7.3?). The book denotes this critical value as $z^*$.
Examples:
[1.] Find the 90% critical value.
In other words, find a value of z so that the area between -z and z is 0.9. A picture, perhaps?
That means that the un-shaded area is 0.1; because of symmetry, the left un-shaded tail has area 0.05. So the area to the left of -z is 0.05. Remember how to find the value of z if you have the left-hand area?
InvNorm! That gives $z \approx -1.645$. So $z^* \approx 1.645$ (note that $z^*$ is always positive).
[2.] Find the 99% critical value $z^*$.
That will make the left un-shaded area 0.005 (draw the picture if you need to). Run invNorm and find that $z^* \approx 2.576$.
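If you don't have a calculator handy, the same lookup can be sketched in Python. This is a minimal sketch, assuming Python 3.8 or later (for `statistics.NormalDist`); the function name `z_critical` is made up for illustration. It mirrors the steps above: find the left-tail area, run the inverse normal, and flip the sign.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return the positive z with area `confidence` between -z and z."""
    left_tail = (1 - confidence) / 2             # e.g. 0.05 for 90% confidence
    return -NormalDist().inv_cdf(left_tail)      # inverse normal gives -z; flip the sign

for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 3))
```

For 90%, 95% and 99% confidence this gives roughly 1.645, 1.960 and 2.576.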
Notice, in the work (far) above, that the critical value is multiplied by the standard deviation of the sampling distribution. This is called the Margin of Error. If you've ever heard a newscaster refer to the margin of error in a poll…this is what they're talking about!
Well, put it all together! A Level C Confidence Interval for the Population Mean is
$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$
where $z^*$ is the Level C Critical Value.
This really requires that you know σ (not a large sample). Knowing σ is, in reality, a pretty unrealistic thing. However, when the sample size is fairly large (say, 30 or more), $s \approx \sigma$, and you can replace σ with s without any problem. This is the reason why the title of this section includes the phrase "with large samples."
For most problems, you are going to have to calculate $\bar{x}$ and s on your own…we did that back in Chapter 3; you haven't forgotten it, have you?
Examples:
[3.] Construct a 95% confidence interval for a sample of size 50 with mean 104.2 and standard deviation 20.
The only part of the formula that isn't given in this statement is $z^*$…it happens to be approximately 1.96. So the interval is $104.2 \pm 1.96 \cdot \frac{20}{\sqrt{50}} \approx 104.2 \pm 5.544$. I am 95% confident that the true mean μ is between 98.656 and 109.744.
[4.] An 1846 study measured the chest sizes (girth; in inches) of 5738 Scottish Militiamen. The mean measurement was 39.83 inches, with standard deviation 0.25 inches. Construct a 99% confidence interval for the true mean girth of a mid-19th century Scottish Militiaman.
Well, we've already determined that $z^* \approx 2.576$, so all we need to do is plug in! $39.83 \pm 2.576 \cdot \frac{0.25}{\sqrt{5738}} \approx 39.83 \pm 0.0085$. I am 99% confident that the true mean chest girth is between 39.82 inches and 39.84 inches.
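The plug-in step for a large-sample interval can be sketched in a few lines of Python. This is only a sketch, assuming Python 3.8 or later; the function name `z_interval` is made up for illustration, and it uses the unrounded critical value, so the last digit may differ slightly from hand calculations with a rounded $z^*$.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, s, n, confidence):
    """Large-sample interval: xbar +/- z* * s / sqrt(n)."""
    z_star = -NormalDist().inv_cdf((1 - confidence) / 2)
    margin = z_star * s / sqrt(n)    # the margin of error
    return xbar - margin, xbar + margin

# Example 3: n = 50, mean 104.2, standard deviation 20, 95% confidence
low3, high3 = z_interval(104.2, 20, 50, 0.95)
print(round(low3, 3), round(high3, 3))

# Example 4: n = 5738, mean 39.83, standard deviation 0.25, 99% confidence
low4, high4 = z_interval(39.83, 0.25, 5738, 0.99)
print(round(low4, 2), round(high4, 2))
```

Both results should agree with the intervals worked out by hand in Examples 3 and 4.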
[5.] A researcher is trying to determine the mean size of a family in his state. He selects a random sample of 40 families, and finds the following family sizes:
2 | 6 | 2 | 2 | 1 | 7 | 3 | 1 | 1 | 1
5 | 5 | 1 | 2 | 3 | 1 | 4 | 4 | 4 | 2
2 | 2 | 3 | 3 | 2 | 2 | 2 | 3 | 4 | 3
1 | 5 | 3 | 3 | 5 | 2 | 2 | 3 | 2 | 4
Estimate the true mean size of a family in this state with 94% confidence.
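This time, we must begin by finding $\bar{x}$ and s: $\bar{x} = 2.825$ and $s \approx 1.483$. Next, find $z^*$: 1.88. So, the formula: $2.825 \pm 1.88 \cdot \frac{1.483}{\sqrt{40}} \approx 2.825 \pm 0.441$. I am 94% confident that the true mean size of a family in this state is between 2.384 people and 3.266 people.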
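When the problem hands you raw data, the sample mean and standard deviation have to come first. Here is a minimal Python sketch of that workflow for the family-size data (assuming Python 3.8 or later; the variable names are made up for illustration). Note that `statistics.stdev` is the sample standard deviation (the one that divides by n − 1).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

family_sizes = [
    2, 6, 2, 2, 1, 7, 3, 1, 1, 1,
    5, 5, 1, 2, 3, 1, 4, 4, 4, 2,
    2, 2, 3, 3, 2, 2, 2, 3, 4, 3,
    1, 5, 3, 3, 5, 2, 2, 3, 2, 4,
]

n = len(family_sizes)
xbar = mean(family_sizes)          # point estimate of the mean
s = stdev(family_sizes)            # sample standard deviation
z_star = -NormalDist().inv_cdf((1 - 0.94) / 2)   # 94% critical value
margin = z_star * s / sqrt(n)

low, high = xbar - margin, xbar + margin
print(n, xbar, round(s, 3), round(z_star, 2))
print(round(low, 3), round(high, 3))
```

The endpoints should match the hand calculation to three decimal places.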
It was mentioned that using s instead of σ is an okay thing, as long as the sample is relatively large…what if the sample isn't large? This very problem perturbed a beer brewer back around 1900, and he tackled the problem head-on. The result was a new distribution, based on the normal distribution, but where σ need not be known.
It is called the t distribution. Like the standard normal, it is centered at zero, and it is symmetric. However, since s is typically larger than σ, the spread of the t distribution is larger than that of the standard normal…and to make things even more interesting, since s gets closer to σ as the sample size increases, the shape of t depends on the sample size!
In the image above, the blue line is the standard normal curve. The red line is a t curve with a sample size of 2. The green curve is a t curve with a sample size of 11 (notice how close it is to the normal, even for this small sample size).
Unfortunately (for some technical reasons), the type of t curve is determined by something called degrees of freedom. For us, degrees of freedom will always be one less than the sample size: $df = n - 1$.
This requires a table (unless you've got a new TI 84). The table you need is on page A10 of your text (the pages with the green stripe down the side). First, find the degrees of freedom along the left edge. Next, find the confidence level (the very top row in your text; the very bottom row on the table I will give you). At the intersection of the two is the critical value of t, which the book denotes $t^*$.
Examples:
[6.] Find the 95% critical value for 10 degrees of freedom.
Look it up! Go down the side to find 10, and go across the top (or bottom, depending on which chart you're looking at) to find 95%. The answer is $t^* = 2.228$.
[7.] Find if .
First, note that ! Now, look it up: .
This is very much like the last interval…just replace the things that are new! A Level C Confidence Interval for the Population Mean is
$\bar{x} \pm t^* \frac{s}{\sqrt{n}}$
where $t^*$ is the Level C Critical Value from a t distribution with $df = n - 1$.
Examples:
[8.] Construct a 95% confidence interval for a sample of size 10 with mean 67.9 and standard deviation 5.48.
Note that $df = 9$; $t^* = 2.262$.
The interval is $67.9 \pm 2.262 \cdot \frac{5.48}{\sqrt{10}} \approx 67.9 \pm 3.92$. I am 95% confident that the true mean is between 63.98 and 71.82.
[9.] A study of the strength of males and females had the following results: the mean strength of 13 males was 2127 Newtons, with standard deviation 512.9875. Construct a 99% confidence interval for the true mean strength of males.
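Note that $df = 12$. $t^* \approx 3.0545$.
The interval is $2127 \pm 3.0545 \cdot \frac{512.9875}{\sqrt{13}} \approx 2127 \pm 434.59$. I am 99% confident that the true mean strength of males is between 1692.409 Newtons and 2561.591 Newtons.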
[10.] A sample of 13 batches of Portland cement were measured for the heat emitted (calories per gram). Here are the data:
78.5 | 74.3 | 104.3 | 87.6 | 95.9 | 109.2 | 102.7
72.5 | 93.1 | 115.9 | 83.8 | 113.3 | 109.4
Construct a 90% confidence interval for the true mean amount of heat emitted.
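Note that $df = 12$. Use your calculator to find $\bar{x} \approx 95.4231$ and $s \approx 15.0437$. $t^* \approx 1.7823$. The interval is $95.4231 \pm 1.7823 \cdot \frac{15.0437}{\sqrt{13}} \approx 95.4231 \pm 7.4364$. I am 90% confident that the true mean heat emitted is between 87.9867 calories per gram and 102.8595 calories per gram.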
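The small-sample recipe can be sketched in Python as well. Python's standard library has no t distribution, so this sketch takes the critical value as an input that you look up yourself; the function name `t_interval` and the parameter `t_star` are made up for illustration. The value 1.782 used below is the three-decimal table entry for 90% confidence with 12 degrees of freedom, so the endpoints differ from the calculator answer in the third decimal place.

```python
from math import sqrt
from statistics import mean, stdev

def t_interval(data, t_star):
    """Small-sample interval: xbar +/- t* * s / sqrt(n).

    t_star must be looked up for df = len(data) - 1 and the chosen confidence level.
    """
    n = len(data)
    xbar = mean(data)
    margin = t_star * stdev(data) / sqrt(n)   # stdev divides by n - 1
    return xbar - margin, xbar + margin

heat = [78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7,
        72.5, 93.1, 115.9, 83.8, 113.3, 109.4]

low, high = t_interval(heat, t_star=1.782)    # 90% confidence, df = 12 (table value)
print(round(low, 2), round(high, 2))
```

Passing in the critical value keeps the sketch dependency-free; a statistics package with an inverse t function could replace the table lookup.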
We have a question…a question that has an answer. If the variable that you measure is quantitative, then the answer must be a number. If the variable that you measure is qualitative, then you have to make the answer a number—by finding the proportion of individuals that have one of the qualities that you are measuring.
Counting the number of individuals that have a quality is really a binomial problem…but the answer to our question is the proportion of individuals that have that quality. Thus, we must change numbers (counts) to proportions. In general, we do this by dividing by the number of things (the sample size, n). For the work we are about to do, we must change both the mean and standard deviation of the binomial into proportions.
The mean of a binomial variable is $np$. Dividing this by the sample size leaves p. This is the parameter that answers our question. But if we know p, why are we doing all of this?
The answer is that we won't know p (anymore). So, we must estimate it from the sample. The number of successes in the sample (for a binomial) is usually labeled x. To make this into a proportion, divide by the sample size, n. This is the sample proportion: $\hat{p} = \frac{x}{n}$. This is a point estimate of p.
The standard deviation of a binomial is $\sqrt{np(1-p)}$. Dividing by the sample size gives us $\sqrt{\frac{p(1-p)}{n}}$. However, this requires that we know p, and that's just not going to happen. So, we use the next best thing (just like we did in the earlier sections): a point estimate. This gives the standard error: $\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$.
In the previous section, replacing σ with s caused us to shift to a new distribution, because s is typically larger than σ. However, that doesn't happen here: $\hat{p}$ is not typically larger (or smaller) than p…thus, we can just use a normal distribution.
A Level C Confidence Interval for the Population Proportion is
$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
where $z^*$ is the Level C Critical Value.
Examples:
[11.] Construct a 99% confidence interval if the sample proportion is 0.2 in a sample of size 75.
$z^* \approx 2.576$. The interval is $0.2 \pm 2.576 \sqrt{\frac{(0.2)(0.8)}{75}} \approx 0.2 \pm 0.119$.
I am 99% confident that the true proportion is between 8.1% and 31.9%.
[12.] A Harris Poll from June 2000 reported that 79% of U.S. citizens (based on a random sample of 2000 people) thought that elected officials should be subjected to random drug tests. Construct a 90% confidence interval for the true proportion of citizens that agree with this idea.
90% confidence gives $z^* \approx 1.645$. The interval is $0.79 \pm 1.645 \sqrt{\frac{(0.79)(0.21)}{2000}} \approx 0.79 \pm 0.015$.
I am 90% confident that the true proportion of U.S. citizens that agree with this statement is between 77.5% and 80.5%.
[13.] A casino operator is checking to see if the Roulette wheel is working correctly. A sample of 200 spins reveals that 93 of them resulted in red. Construct a 95% confidence interval for the true proportion of spins that will land on red.
95% confidence makes $z^* \approx 1.96$. The interval is $0.465 \pm 1.96 \sqrt{\frac{(0.465)(0.535)}{200}} \approx 0.465 \pm 0.0691$.
I am 95% confident that the true proportion of spins that will land on red is between 39.59% and 53.41%.
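The proportion interval follows the same pattern as the earlier ones, with the standard error of the sample proportion in place of the standard deviation of the sample mean. A minimal Python sketch (assuming Python 3.8 or later; the function name `proportion_interval` is made up for illustration), applied to the roulette data:

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, confidence):
    """Interval for a proportion: p_hat +/- z* * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n                              # sample proportion
    z_star = -NormalDist().inv_cdf((1 - confidence) / 2)
    std_error = sqrt(p_hat * (1 - p_hat) / n)          # standard error of p_hat
    margin = z_star * std_error
    return p_hat - margin, p_hat + margin

# Example 13: 93 reds in 200 spins, 95% confidence
low, high = proportion_interval(93, 200, 0.95)
print(round(low, 4), round(high, 4))
```

When a problem gives the sample proportion instead of a count (as in Examples 11 and 12), multiply it by n first, or adapt the function to accept the proportion directly.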
Page last validated 2010-08-15