Inference for Means, Part 1

Construct an Estimate

The Idea

We now know something about the distribution of \(\bar{x}\); what remains is to see how we use it to get at the (supposedly unknown) parameter \(\mu_X\).

From our knowledge of the distribution of \(\bar{x}\), we can calculate how often certain values of \(\bar{x}\) occur. From our knowledge of the Empirical Rule, we can even say things like "95% of \(\bar{x}\) values lie within the interval \((\mu_{\bar{x}} - 2\sigma_{\bar{x}},\ \mu_{\bar{x}} + 2\sigma_{\bar{x}})\)."
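
A quick numerical check of that "95%" figure, as a minimal sketch using Python's standard-library `statistics.NormalDist`. Standardizing turns the statement about \(\bar{x}\) into a statement about \(Z\), and the exact normal probability of landing within two standard deviations of the center is slightly more than 95%.

```python
from statistics import NormalDist

# Standardized x-bar: Z = (x-bar - mu_xbar) / sigma_xbar follows N(0, 1).
std_normal = NormalDist(mu=0, sigma=1)

# P(mu_xbar - 2*sigma_xbar < x-bar < mu_xbar + 2*sigma_xbar) = P(-2 < Z < 2)
within_two = std_normal.cdf(2) - std_normal.cdf(-2)
print(f"{within_two:.4f}")  # prints 0.9545
```

So the Empirical Rule's 95% is a rounded figure; the "2" is itself an approximation.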

Now comes the tricky part. If \(\bar{x}\) lies within two \(\sigma_{\bar{x}}\) of \(\mu_{\bar{x}}\), then \(\mu_{\bar{x}}\) lies within two \(\sigma_{\bar{x}}\) of \(\bar{x}\), right? You can measure the distance either way: from the center to \(\bar{x}\), or from \(\bar{x}\) to the center.

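95% of \(\bar{x}\) values lie within the interval \((\mu_{\bar{x}} - 2\sigma_{\bar{x}},\ \mu_{\bar{x}} + 2\sigma_{\bar{x}})\). So 95% of \(\bar{x}\) values will have \(\mu_{\bar{x}}\) in the interval \((\bar{x} - 2\sigma_{\bar{x}},\ \bar{x} + 2\sigma_{\bar{x}})\). Notice that I cannot say that 95% of \(\mu_{\bar{x}}\) values are in the interval; there is only one value for \(\mu_{\bar{x}}\). The 95% refers to \(\bar{x}\): for 95% of the values of \(\bar{x}\), the interval \((\bar{x} - 2\sigma_{\bar{x}},\ \bar{x} + 2\sigma_{\bar{x}})\) around \(\bar{x}\) will contain \(\mu_{\bar{x}}\).

So, if I go out and take a sample, there's a pretty good chance (about 95%) that the \(\bar{x}\) I get will be within two \(\sigma_{\bar{x}}\) of \(\mu_{\bar{x}}\), so there's a good chance (about 95%) that \(\mu_{\bar{x}}\) will be contained in the interval \((\bar{x} - 2\sigma_{\bar{x}},\ \bar{x} + 2\sigma_{\bar{x}})\).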

Be careful, though: you don't want to suggest that 95% of \(\bar{x}\) values are in this interval, or (even worse) that 95% of \(\mu_{\bar{x}}\) values are in the interval. Also, you don't want to suggest that there's a 95% chance that \(\mu_{\bar{x}}\) is located at some place. \(\mu_{\bar{x}}\) is fixed; it doesn't move, and cannot have any sort of probability attached to it. The 95% refers to \(\bar{x}\) (or, by extension, the interval).
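
The claim that the 95% belongs to \(\bar{x}\) (and its interval), not to \(\mu_{\bar{x}}\), can be checked by simulation. This is a hedged illustration rather than part of the original procedure: it assumes a normal population with made-up values \(\mu_X = 100\), \(\sigma_X = 15\), and \(n = 25\), draws many samples, builds \(\bar{x} \pm 2\sigma_{\bar{x}}\) for each one, and counts how often the fixed mean is captured.

```python
import random
from math import sqrt
from statistics import fmean

def coverage_rate(mu, sigma, n, reps=10_000, seed=1):
    """Fraction of simulated samples whose interval x-bar +/- 2*sigma_xbar contains mu.

    Assumes a normal population with known sigma; mu, sigma, n are illustrative.
    """
    rng = random.Random(seed)        # seeded so the run is repeatable
    sigma_xbar = sigma / sqrt(n)     # standard deviation of x-bar
    hits = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        # mu never moves; only the interval built around each new x-bar changes
        if xbar - 2 * sigma_xbar <= mu <= xbar + 2 * sigma_xbar:
            hits += 1
    return hits / reps

rate = coverage_rate(mu=100, sigma=15, n=25)
print(rate)  # close to 0.95 (theoretically about 0.954 for a multiplier of 2)
```

Every interval either contains 100 or it doesn't; the "about 95%" describes the long-run fraction of intervals that do.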

So what, you say? Who cares about finding out where \(\mu_{\bar{x}}\) is located? Well, remember that \(\bar{x}\) is an unbiased estimator of \(\mu_X\), which means \(\mu_{\bar{x}} = \mu_X\). So an interval for \(\mu_{\bar{x}}\) is the same as an interval for \(\mu_X\).

A Confidence Interval for the Population Mean

The Formula

To generalize, we replace the 2 with what it really is: the value of \(Z\) that has a right-tail area of \(\frac{1-C}{2}\), where \(C\) is the confidence level (written as a proportion, such as 0.95).

A level \(C\) confidence interval for \(\mu_X\) is \(\bar{x} \pm z^* \frac{\sigma_X}{\sqrt{n}}\), where \(z^*\) is the upper \(\frac{1-C}{2}\) critical value from \(N(0,1)\).
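
The critical value \(z^*\) comes from an inverse normal calculation. A minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `z_star` is ours, chosen for illustration.

```python
from statistics import NormalDist

def z_star(conf):
    """Upper (1 - C)/2 critical value of N(0, 1) for confidence level C = conf."""
    right_tail = (1 - conf) / 2
    # inv_cdf expects a left-tail area, so ask for 1 - right_tail
    return NormalDist().inv_cdf(1 - right_tail)

for c in (0.90, 0.95, 0.99):
    print(f"C = {c:.2f}: z* = {z_star(c):.4f}")
# C = 0.90: z* = 1.6449
# C = 0.95: z* = 1.9600
# C = 0.99: z* = 2.5758
```

For \(C = 0.95\), \(z^*\) is about 1.96, which is the Empirical Rule's "2" made precise.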

The Requirements

In order for this to work, we need a random sample, and the distribution of \(\bar{x}\) must be approximately normal (with a known \(\sigma_X\)).

An Example

[1.] In 1997, the price of history books had a standard deviation of $7.61. A random sample of 40 history books has a mean of $46.93. Let's construct a 99% confidence interval for the true mean price of a history book (in 1997).

First of all, we'd better check the requirements: we need a random sample, and the distribution of \(\bar{x}\) must be approximately normal. We're told that the sample was random. Since the sample size is large (40), the CLT tells us that the distribution of \(\bar{x}\) will be approximately normal. So, we are justified in using these procedures to construct an interval.

\(\bar{x} = 46.93\), \(\sigma_X = 7.61\), and \(n = 40\). \(C = 0.99\) means that \(z^*\) will be the \(Z\) value with area \(\frac{1-0.99}{2} = 0.005\) to the right. A quick check of the normal table (or a few keystrokes) finds that \(z^* = 2.5758\).

So the interval is \(\bar{x} \pm z^* \frac{\sigma_X}{\sqrt{n}} = 46.93 \pm 2.5758 \cdot \frac{7.61}{\sqrt{40}} = 46.93 \pm 3.0994 = (43.8306,\ 50.0294)\).

I am 99% confident that the true mean price of a history book (in 1997) is between $43.83 and $50.03.
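
The same arithmetic as a short Python sketch. The function name `z_interval` is hypothetical, and like the text it assumes \(\sigma_X\) is known.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf):
    """Level-C z confidence interval for a population mean with known sigma."""
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # upper (1 - C)/2 critical value
    margin = z_star * sigma / sqrt(n)                   # z* times sigma / sqrt(n)
    return xbar - margin, xbar + margin

low, high = z_interval(xbar=46.93, sigma=7.61, n=40, conf=0.99)
print(f"({low:.4f}, {high:.4f})")  # prints (43.8306, 50.0294)
```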


Test a Hypothesis

The Idea

If you're not trying to estimate \(\mu_X\), then the other type of inference you can conduct is to test a guess about the value of \(\mu_X\). This begins with a guess (the hypothesis), which leads to a probability calculation (how unusual are my results, if my hypothesis is correct?), and finally, a conclusion.


OK, so it actually begins with a pair of hypotheses. The first hypothesis, the Null Hypothesis, is your guess about the value of \(\mu_X\). This hypothesis should always reflect the idea that nothing has happened; nothing has changed; there is no difference; etc. A null hypothesis might look like this: \(H_0: \mu_X = 100\).

In addition, you must state an Alternative Hypothesis. This typically states the kind of evidence that you are looking for: evidence that the real mean is higher than, lower than, or simply different from the value in the null hypothesis. An alternative hypothesis might look like this: \(H_a: \mu_X > 100\) (if you're looking for evidence that the population mean is greater than 100).

Note that some people use \(H_1\) for the alternative. It's the same thing.

In addition to the statement of the value, you should provide a verbal description of the hypotheses. See the example later for this.

Type of Test

Your choice of test depends primarily (but not exclusively) on the parameter that appears in the hypotheses. For now, there is only one: the Z-Test for Means. More on this later.


Most tests depend on knowing some things about the sampling distribution of the statistic. For the Z-Test for Means, the things you need are:

[1] A random sample;

[2] A normal distribution in the population (or at least in the distribution of \(\bar{x}\)).

Be sure to state what the requirements are, and then check them. See the example later for more on this.


At this point in the test, you haven’t collected any data (or in AP Stats, haven’t looked at any). Before you go and measure a sample, you have to decide how much evidence it’s going to take to make you believe that your null hypothesis is wrong. Will the slightest deviation from your assumption make you believe that the assumption is wrong? Will it take a mountain of evidence to convince you that the assumption is wrong? The amount of evidence that you think you need is called the Level of Significance. It is expressed as a percentage (or proportion, or probability). The lower the level of significance (denoted α), the more evidence it’s going to take to convince you that the null hypothesis is wrong.

You don’t yet have any idea of how to set a value for α; don’t worry. For now, just set it at 0.05 (0.20 is quite high; 0.01 is very low). More on this later.


At this point, you would go out and collect the data. Of course, we’ll just be taking the data that are given to us. The next step is to calculate how unusual your sample results are.

In particular, you now have a value of \(\bar{x}\), and it's probably not equal to your assumed value of \(\mu_X\). We expect this; variation is inevitable. But how unusual is this value of \(\bar{x}\)? How often should you get a value of \(\bar{x}\) that far (or farther) from the assumed value of \(\mu_X\)?

We learned how to calculate probabilities for \(\bar{x}\) in a previous chapter, and that's exactly what we're going to do now. The result of this is called the p-value (or attained significance level).

Greater Than / Less Than

If the alternate hypothesis uses < or >, then that symbol indicates the direction in which the area should be found.

For example, if your hypotheses are \(H_0: \mu_X = 100\) vs. \(H_a: \mu_X > 100\), and you get \(\bar{x} = 101.5\), then the probability you must calculate is \(P(\bar{x} > 101.5)\).

Note that the actual value of \(\bar{x}\) doesn't determine the direction: if \(\bar{x} = 99.2\), you'd still find the area to the right, \(P(\bar{x} > 99.2)\).

Not Equal To

If the alternate uses ≠, then you have to do something different. In particular, you want to find the area in both directions: the probability of getting an \(\bar{x}\) value this far (or farther) from \(\mu_X\) in either direction (higher or lower). Because of symmetry, you can simply find the area of one tail (away from the center) and double that value.

For example, if your hypotheses are \(H_0: \mu_X = 100\) vs. \(H_a: \mu_X \neq 100\), and your value of \(\bar{x}\) is 102.8, then the probability that you must calculate is \(P(\bar{x} < 97.2) + P(\bar{x} > 102.8) = 2P(\bar{x} > 102.8)\).

If your value of \(\bar{x}\) is 90.9, then you will calculate \(P(\bar{x} < 90.9) + P(\bar{x} > 109.1) = 2P(\bar{x} < 90.9)\).
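
The three cases (>, <, ≠) can be collected into one function. This is a hedged sketch: the name `z_test_p_value` and the string codes for the alternative are our own choices, and since the examples above give no \(\sigma_X\) or \(n\), the usage below assumes hypothetical values \(\sigma_X = 15\) and \(n = 36\).

```python
from math import sqrt
from statistics import NormalDist

def z_test_p_value(xbar, mu0, sigma, n, alternative):
    """p-value of a Z-Test for Means; alternative is '>', '<', or '!='."""
    z = (xbar - mu0) / (sigma / sqrt(n))   # standardize x-bar assuming H0 is true
    std = NormalDist()
    if alternative == ">":
        return 1 - std.cdf(z)              # area to the right
    if alternative == "<":
        return std.cdf(z)                  # area to the left
    if alternative == "!=":
        return 2 * (1 - std.cdf(abs(z)))   # one tail away from the center, doubled
    raise ValueError("alternative must be '>', '<', or '!='")

# Hypothetical sigma = 15 and n = 36, with x-bar values from the examples above
p_right = z_test_p_value(101.5, 100, 15, 36, ">")
p_low_xbar = z_test_p_value(99.2, 100, 15, 36, ">")   # still the right-tail area
p_two = z_test_p_value(102.8, 100, 15, 36, "!=")
p_two_mirror = z_test_p_value(97.2, 100, 15, 36, "!=")
```

Because 97.2 and 102.8 are the same distance from 100, `p_two` and `p_two_mirror` agree, which is the symmetry argument in code. Note also that `p_low_xbar` is above 0.5: an \(\bar{x}\) on the "wrong" side of \(\mu_X\) gives no evidence for a > alternative.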

The Rejection Region Method

Back in the days before calculators, tests were conducted with rejection regions: values of the test statistic (in this case, \(Z\)) that would lead to rejection of the null hypothesis. This method is beginning to fall by the wayside, but you may hear it mentioned somewhere (perhaps you may even read it in your textbook!).

To find a rejection region, work backwards from the level of significance (\(\alpha\)) to a value of the test statistic (call it \(z^*\)). Values of the test statistic that are farther than \(z^*\) from 0 (the standardized version of the hypothesized value of \(\mu_X\)), that is, values of \(z\) such that \(z^*\) lies between 0 and \(z\), will lead to rejection of the null hypothesis. We will be doing a lot of this in the final section of this chapter.
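
The rejection region method can be sketched the same way: turn \(\alpha\) into \(z^*\), then check whether \(z\) falls beyond it. A minimal sketch; the function name `in_rejection_region` is hypothetical.

```python
from statistics import NormalDist

def in_rejection_region(z, alpha, alternative):
    """True if the test statistic z lies in the level-alpha rejection region."""
    std = NormalDist()
    if alternative == ">":
        return z > std.inv_cdf(1 - alpha)            # z* has right-tail area alpha
    if alternative == "<":
        return z < std.inv_cdf(alpha)                # z* has left-tail area alpha
    if alternative == "!=":
        return abs(z) > std.inv_cdf(1 - alpha / 2)   # alpha split between two tails
    raise ValueError("alternative must be '>', '<', or '!='")

print(in_rejection_region(1.70, 0.05, ">"))   # True  (z* is about 1.645)
print(in_rejection_region(1.70, 0.05, "!="))  # False (z* is about 1.960)
```

For the same data, this reaches the same decision as comparing the p-value to \(\alpha\); it just makes the comparison on the \(z\) scale instead of the probability scale.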


Now that you have a measure of how unusual your sample was, it’s time to decide whether or not your assumption seems reasonable—whether or not you’ve got enough evidence to reject the null hypothesis (also recall that low p-values mean you’ve got more evidence).

So, compare the p-value to the significance level. If the p-value is lower, then you have enough evidence (maybe more than you need) to conclude that the assumed value of \(\mu_X\) is incorrect. If the p-value is higher, then you don't have enough evidence to conclude that the assumed value of \(\mu_X\) is incorrect.

If you have enough evidence (if you reject the null hypothesis), then you should state that the data provide evidence for the alternate hypothesis. If you don't have enough evidence (if you fail to reject the null hypothesis), then you should state that there is not enough evidence for the alternate hypothesis; this does not show that the null hypothesis is correct. In either case, you need to say this in the context of the problem.

For example, if you are testing whether kids in your area are smarter than average (higher IQ), your hypotheses are \(H_0: \mu_X = 100\) (kids in this area are average) vs. \(H_a: \mu_X > 100\) (kids in this area are smarter than average). Suppose you choose \(\alpha = 0.05\), and the p-value is 0.01. Since 0.01 < 0.05, this is enough evidence to reject the null hypothesis; there is evidence that kids in your area are smarter than average.

Ultimately, what we are doing here is determining if our sample results are likely to have arisen by chance alone. If something is unlikely to have arisen by chance alone, then it is significant.

A Test of Significance for the Population Mean

[1] Hypotheses. \(H_0: \mu_X = \mu_0\). The "sub zero" indicates that \(\mu_0\) isn't a variable; it's a specific number that hasn't been filled in yet (there is a difference!). \(H_a: \mu_X \;?\; \mu_0\), where ? can be <, >, or ≠.

[2] Identify the Test (for now, Z-Test for Means).

[3] Requirements (State and Check): Random Sample; Normal Population (or at least a normal distribution of \(\bar{x}\)).

[4] Choose/State Level of Significance

[5] Mechanics (show your calculations of the p-value).

[6] Conclusion.

An Example

[2.] European officials have set a limit on the amount of cadmium (a toxic heavy metal) that is permitted in mushrooms sold in the region; in particular, mushrooms with cadmium concentrations greater than 0.5 ppm (parts per million) are not allowed. A mushroom farmer is wondering if he can sell his crop. He tests a random sample of mushrooms and measures the amount of cadmium; the mean of his 12 observations is 0.5258 ppm. Assuming that cadmium levels vary normally with standard deviation 0.37 ppm, does the farmer have evidence (at the 5% level) that his mushrooms are safe to eat?

Let \(X\) represent the cadmium concentration in a single mushroom, and \(\bar{x}\) represent the mean concentration in 12 mushrooms.

\(H_0: \mu_X = 0.5\) (the mushrooms are safe); \(H_a: \mu_X > 0.5\) (the mushrooms are unsafe).

This calls for a Z-Test for Means.

This test requires a random sample and a normal distribution in the population. We are told that the sample is random, and we are told that the population has a normal distribution. We may continue.

We are told to test this at the 5% level.

The p-value is \(P(\bar{x} > 0.5258)\). \(X\) has a normal distribution, so \(\bar{x}\) does, too. Thus, we may standardize: \(z = \frac{0.5258 - 0.5}{0.37/\sqrt{12}} \approx 0.2416\). So \(P(\bar{x} > 0.5258) \approx P(Z > 0.2416) \approx 0.4046\).

If the mean cadmium concentration is 0.5 ppm, then we can expect to get a sample mean concentration of 0.5258 ppm or higher in 40.46% of samples. This happens often enough to attribute to chance at the 5% level (0.4046 > 0.05); it is not significant, and we fail to reject the null hypothesis.

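We do not have evidence that the mean cadmium concentration of these mushrooms exceeds the limit. That is not the same as evidence that they are safe: failing to reject the null hypothesis does not show that it is true, so the farmer does not have evidence that his mushrooms are safe to eat.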
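
The mechanics of this example as a Python sketch using only the standard library (the numbers come from the problem; the variable names are ours):

```python
from math import sqrt
from statistics import NormalDist

xbar, mu0, sigma, n, alpha = 0.5258, 0.5, 0.37, 12, 0.05

z = (xbar - mu0) / (sigma / sqrt(n))  # standardize assuming H0: mu = 0.5
p_value = 1 - NormalDist().cdf(z)     # Ha uses >, so take the right-tail area
reject_h0 = p_value < alpha           # compare the p-value to the significance level

print(f"z = {z:.4f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
# z = 0.2416, p-value = 0.4046, reject H0: False
```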


Errors in Tests of Significance

When you conduct a test of significance, you’re making an assumption, and then deciding whether or not that assumption appears to be true. Alas, we never know if our assumption is true or not—thus, we might be making an error when we reject/fail to reject the null hypothesis.

Type I Errors

A Type I Error occurs when we reject a true null hypothesis; in other words, we reject the null hypothesis, but the hypothesis is really (unbeknownst to us) true.

The probability of a Type I Error is easy to calculate. If you've decided that your level of significance is 5%, then you will reject the null hypothesis if the p-value of the sample mean is less than 5%. If the null hypothesis is true, how often should you get sample means with p-values less than 5%? Why, in 5% of samples, of course! So the probability of a Type I Error is the same as the level of significance, \(\alpha\).
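
The "5% of samples" claim can be checked by simulation. A hedged sketch, assuming made-up values (\(\mu_0 = 100\), \(\sigma_X = 15\), \(n = 10\)) and an alternative that uses >: every sample is drawn from a population where \(H_0\) is true, and the code counts how often the test rejects anyway.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def type_i_error_rate(mu0, sigma, n, alpha, reps=20_000, seed=2):
    """Simulated rejection rate of a right-tailed Z-Test for Means when H0 is true."""
    rng = random.Random(seed)
    std = NormalDist()
    sigma_xbar = sigma / sqrt(n)
    rejections = 0
    for _ in range(reps):
        # Every sample comes from the H0 population, so each rejection is a Type I Error
        xbar = fmean(rng.gauss(mu0, sigma) for _ in range(n))
        p_value = 1 - std.cdf((xbar - mu0) / sigma_xbar)
        if p_value < alpha:
            rejections += 1
    return rejections / reps

rate = type_i_error_rate(mu0=100, sigma=15, n=10, alpha=0.05)
print(rate)  # close to alpha = 0.05
```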

Type II Errors

A Type II Error occurs when we fail to reject a false null hypothesis; in other words, we fail to reject the null hypothesis when the value of the parameter (\(\mu_X\)) is really some other value (not the one we assumed).

The probability of a Type II Error is more difficult to calculate, because it depends on the actual value of \(\mu_X\), which could be anything. Thus, we must calculate the probability of a Type II Error against a specific alternative, \(\mu_a\). To do this, we must first convert our level of significance into a cutoff value of \(\bar{x}\). Then, assuming \(\mu_X = \mu_a\), we find the area on the side of that cutoff that leads to failing to reject (which side depends on which \(\bar{x}\) values would lead to rejection). The symbol for the probability of a Type II Error is \(\beta\).

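For example, let's assume that our alternate hypothesis uses >.

Step 1: Convert \(\alpha\) to a cutoff value of \(\bar{x}\): the value that has a right-tail area of \(\alpha\) when \(\mu_X = \mu_0\).

Step 2: Assuming that the parameter actually has value \(\mu_a\), find the probability of failing to reject, which is the area to the left of that cutoff.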
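
The two steps can be written as a small function. A minimal sketch for the > case only; the name `beta_right_tailed` is hypothetical, and it assumes a known \(\sigma_X\) and a normal distribution for \(\bar{x}\), as the Z-Test does.

```python
from math import sqrt
from statistics import NormalDist

def beta_right_tailed(mu0, mu_a, sigma, n, alpha):
    """P(Type II Error) for H0: mu = mu0 vs. Ha: mu > mu0, against mu = mu_a."""
    sigma_xbar = sigma / sqrt(n)
    # Step 1: the x-bar cutoff with right-tail area alpha when H0 is true
    cutoff = NormalDist(mu0, sigma_xbar).inv_cdf(1 - alpha)
    # Step 2: if mu is really mu_a, we fail to reject whenever x-bar < cutoff
    return NormalDist(mu_a, sigma_xbar).cdf(cutoff)
```

As a sanity check, setting \(\mu_a = \mu_0\) gives \(\beta = 1 - \alpha\), and \(\beta\) shrinks as \(\mu_a\) moves farther above \(\mu_0\).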

The Power of a Test

The Power of a test is the probability that you reject a false null hypothesis; in other words, you reject the null hypothesis when the null is false (the population mean is actually some other value). Since this is calculated when the null hypothesis is false, you need an alternate value (\(\mu_a\)) to measure against. In fact, Power is the complement of \(\beta\): Power \(= 1 - \beta\).


The manufacturer of a new car claims that it gets 26 miles per gallon (mpg). A consumer group is skeptical of this claim and plans to test \(H_0: \mu_X = 26\) vs. \(H_a: \mu_X < 26\) at the 5% level of significance on a sample of 30 cars. Assume that the standard deviation of this kind of measurement is 1.4 mpg.

[3.] What is the probability of a Type I Error?

The probability of a Type I Error is the same as the level of significance—0.05.


[4.] What is the probability of a Type II Error vs. \(\mu_a = 25.8\)?

This will be a bit more complicated. First, I need to convert \(\alpha\) into a value of \(\bar{x}\). Since the alternate hypothesis uses <, I'm going to reject the null hypothesis only if \(\bar{x}\) is too low: below (to the left of) 26. I need to find the value of \(\bar{x}\) that has a left-hand area (away from 26) of 0.05.

Of course, we did that before. Do an inverse normal calculation (with mean 26 and standard deviation \(1.4/\sqrt{30}\)) to find that this value of \(\bar{x}\) is 25.5796.

Now we need to find out how often we'll fail to reject (how often \(\bar{x}\) will be greater than 25.5796) if \(\mu_X = \mu_a = 25.8\).

So, standardize 25.5796: \(z = \frac{25.5796 - 25.8}{1.4/\sqrt{30}} \approx -0.8624\). \(P(\bar{x} > 25.5796) = P(Z > -0.8624) \approx 0.8058\). There is an 80.58% chance of a Type II Error.


[5.] What is the power of the test vs. \(\mu_a = 25.8\)?

The power of the test vs. \(\mu_a = 25.8\) is \(1 - \beta = 1 - 0.8058 = 0.1942\).
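
Examples [4.] and [5.] as a Python sketch, using the problem's numbers (the variable names are ours). Because this alternate hypothesis uses <, the cutoff comes from the left tail and \(\beta\) is a right-tail area.

```python
from math import sqrt
from statistics import NormalDist

mu0, mu_a, sigma, n, alpha = 26, 25.8, 1.4, 30, 0.05
sigma_xbar = sigma / sqrt(n)

# Step 1: reject H0 when x-bar falls below the cutoff with left-tail area alpha
cutoff = NormalDist(mu0, sigma_xbar).inv_cdf(alpha)

# Step 2: if mu is really 25.8, we fail to reject whenever x-bar > cutoff
beta = 1 - NormalDist(mu_a, sigma_xbar).cdf(cutoff)
power = 1 - beta

print(f"cutoff = {cutoff:.4f}, beta = {beta:.4f}, power = {power:.4f}")
# cutoff = 25.5796, beta = 0.8058, power = 0.1942
```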

Page last updated 2012-04-01