# Inference for Means, Part 1

## Construct an Estimate

### The Idea

We now know something about the distribution of $\stackrel{_}{x}$—what remains is to see how we use it to get at the (supposedly unknown) parameter ${\mu }_{X}$.

From our knowledge of the distribution of $\stackrel{_}{x}$, we can calculate how often certain values of $\stackrel{_}{x}$ occur. From our knowledge of the Empirical Rule, we can even say things like 95% of $\stackrel{_}{x}$ values lie within the interval $\left[{\mu }_{\stackrel{_}{x}}-2{\sigma }_{\stackrel{_}{x}},{\mu }_{\stackrel{_}{x}}+2{\sigma }_{\stackrel{_}{x}}\right]$.

Now comes the tricky part. If $\stackrel{_}{x}$ lies within two ${\sigma }_{\stackrel{_}{x}}$ of ${\mu }_{\stackrel{_}{x}}$, then ${\mu }_{\stackrel{_}{x}}$ lies within two ${\sigma }_{\stackrel{_}{x}}$ of $\stackrel{_}{x}$, right? You can measure the distance either way—from the center to $\stackrel{_}{x}$, or from $\stackrel{_}{x}$ to the center.

95% of $\stackrel{_}{x}$ values lie within the interval $\left[{\mu }_{\stackrel{_}{x}}-2{\sigma }_{\stackrel{_}{x}},{\mu }_{\stackrel{_}{x}}+2{\sigma }_{\stackrel{_}{x}}\right]$. So 95% of $\stackrel{_}{x}$ values will have ${\mu }_{\stackrel{_}{x}}$ in the interval $\left[\stackrel{_}{x}-2{\sigma }_{\stackrel{_}{x}},\stackrel{_}{x}+2{\sigma }_{\stackrel{_}{x}}\right]$. Notice that I cannot say that 95% of ${\mu }_{\stackrel{_}{x}}$ values are in the interval, because there is only one value for ${\mu }_{\stackrel{_}{x}}$. The 95% refers to $\stackrel{_}{x}$: for 95% of the values of $\stackrel{_}{x}$, the interval $\left[\stackrel{_}{x}-2{\sigma }_{\stackrel{_}{x}},\stackrel{_}{x}+2{\sigma }_{\stackrel{_}{x}}\right]$ around $\stackrel{_}{x}$ will contain ${\mu }_{\stackrel{_}{x}}$.

So, if I go out and take a sample, there’s a pretty good chance (about 95%) that the $\stackrel{_}{x}$ I get will be within two ${\sigma }_{\stackrel{_}{x}}$ of ${\mu }_{\stackrel{_}{x}}$, so there’s a good chance (about 95%) that ${\mu }_{\stackrel{_}{x}}$ will be contained in the interval $\left[\stackrel{_}{x}-2{\sigma }_{\stackrel{_}{x}},\stackrel{_}{x}+2{\sigma }_{\stackrel{_}{x}}\right]$.

Be careful, though—you don’t want to suggest that 95% of $\stackrel{_}{x}$ values are in this interval, or (even worse) that 95% of ${\mu }_{\stackrel{_}{x}}$ values are in the interval. Also, you don’t want to suggest that there’s a 95% chance that ${\mu }_{\stackrel{_}{x}}$ is located at some place. ${\mu }_{\stackrel{_}{x}}$ is fixed; it doesn’t move, and cannot have any sort of probability attached to it. The 95% refers to $\stackrel{_}{x}$ (or, by extension, the interval).
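A quick simulation can make the "95% refers to the interval" idea concrete. The sketch below is only an illustration: the population values (mean 50, standard deviation 10), the sample size of 25, and the function name `coverage_rate` are all assumptions of mine, not part of the notes. It draws many samples from a normal population, builds the interval $\stackrel{_}{x}±2{\sigma }_{\stackrel{_}{x}}$ around each sample mean, and counts how often the fixed population mean lands inside.

```python
import random
from math import sqrt


def coverage_rate(mu=50.0, sigma=10.0, n=25, trials=10_000, seed=1):
    """Fraction of samples whose interval x_bar +/- 2*sigma_xbar contains mu.

    mu, sigma, n, trials and seed are illustrative values, not from the notes.
    """
    rng = random.Random(seed)
    sigma_xbar = sigma / sqrt(n)  # standard deviation of the sample mean
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        # mu never moves; the interval is what changes from sample to sample
        if x_bar - 2 * sigma_xbar <= mu <= x_bar + 2 * sigma_xbar:
            hits += 1
    return hits / trials


rate = coverage_rate()
print(f"Share of intervals that captured mu: {rate:.3f}")
```

The printed share should come out close to 0.95. Each individual interval either contains the mean or it doesn't; the 95% describes the long-run behavior of the procedure.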

So what, you say? Who cares about finding out where ${\mu }_{\stackrel{_}{x}}$ is located? Well, remember that if $\stackrel{_}{x}$ is unbiased, then ${\mu }_{\stackrel{_}{x}}={\mu }_{X}$. So an interval for ${\mu }_{\stackrel{_}{x}}$ is the same as an interval for ${\mu }_{X}$.

### A Confidence Interval for the Population Mean

#### The Formula

To generalize, we replace the 2 with what it really is: the value of Z that has a right tail area of $\frac{1-C}{2}$, where C is the confidence level written as a proportion (for example, 0.95 for 95% confidence).

A level C confidence interval for ${\mu }_{X}$ is $\stackrel{_}{x}\pm {z}^{*}\frac{{\sigma }_{X}}{\sqrt{n}}$, where z* is the upper $\frac{1-C}{2}$ critical value from N(0,1).
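If you would rather not read z* off a table, it can be computed. Here is a minimal sketch using Python's standard library; the function name `z_star` is my own label. It finds the standard normal value with area $\frac{1-C}{2}$ to its right by asking for the value with area $1-\frac{1-C}{2}$ to its left.

```python
from statistics import NormalDist


def z_star(confidence):
    """Upper (1 - C)/2 critical value of N(0, 1); C is given as a proportion."""
    tail = (1 - confidence) / 2  # area in each tail
    return NormalDist().inv_cdf(1 - tail)


for c in (0.90, 0.95, 0.99):
    print(f"C = {c:.2f}  ->  z* = {z_star(c):.4f}")
```

For C = 0.95 this gives a value very close to 1.96, which is where the "2" in the Empirical Rule version comes from.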

#### The Requirements

In order for this to work, we need a random sample, and the distribution of $\stackrel{_}{x}$ must be approximately normal (with a known ${\sigma }_{X}$ ).

### An Example

[1.] In 1997, the price of history books had a standard deviation of \$7.61. A random sample of 40 history books has a mean of \$46.93. Let’s construct a 99% confidence interval for the true mean price of a history book (in 1997).

First of all, we’d better check the requirements: we need a random sample, and the distribution of $\stackrel{_}{x}$ must be approximately normal. We’re told that the sample was random. Since the sample size is large (40), the CLT tells us that the distribution of $\stackrel{_}{x}$ will be approximately normal. So, we are justified in using these procedures to construct an interval.

$\stackrel{_}{x}$ = 46.93, ${\sigma }_{X}$ = 7.61, and n = 40. C = 0.99 means that z* will be the Z value with area $\frac{1-0.99}{2}$ to the right. A quick check of the normal table (or a few keystrokes) finds the value of z* is 2.5758.

So the interval is $\stackrel{_}{x}\pm {z}^{*}\frac{{\sigma }_{X}}{\sqrt{n}}$ = $46.93\pm 2.5758\frac{7.61}{\sqrt{40}}$ = 46.93 ± 3.0993 = (43.8306, 50.0294).

I am 99% confident that the true mean price of a history book (in 1997) is between \$43.83 and \$50.03.
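The arithmetic in this example can be checked with a few lines of code. This is a sketch rather than an official procedure; `z_interval` and its parameter names are labels I made up. It assumes the requirements (random sample, approximately normal $\stackrel{_}{x}$, known ${\sigma }_{X}$) have already been checked by hand.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar, sigma, n, confidence):
    """Confidence interval for a mean when the population sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value z*
    margin = z * sigma / sqrt(n)                        # margin of error
    return x_bar - margin, x_bar + margin


# Values from the history book example
low, high = z_interval(x_bar=46.93, sigma=7.61, n=40, confidence=0.99)
print(f"99% interval: ({low:.2f}, {high:.2f})")
```

The endpoints should agree with the interval computed above, up to rounding.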

## Test a Hypothesis

### The Idea

If you’re not trying to estimate ${\mu }_{X}$, then the other type of inference you can conduct is to test a guess about the value of ${\mu }_{X}$. This begins with a guess (the hypothesis), which leads to a probability calculation (how unusual are my results, if my hypothesis is correct?), and finally, a conclusion.

### Hypotheses

OK, so it actually begins with a pair of hypotheses. The first hypothesis, the Null Hypothesis, is your guess about the value of ${\mu }_{X}$. This hypothesis should always reflect the idea that nothing has happened; nothing has changed; there is no difference; etc. A null hypothesis might look like this: H0: ${\mu }_{X}$ = 100.

In addition, you must state an Alternative Hypothesis. This typically states the kind of evidence that you are looking for: evidence that the real mean is higher, or lower (or simply different) than the value from the null hypothesis. An alternative hypothesis might look like this: Ha: ${\mu }_{X}$ > 100 (if you’re looking for evidence that the population mean is greater than 100).

Note that some people use H1 for the alternative. It’s the same thing.

In addition to the statement of the value, you should provide a verbal description of the hypotheses. See the example later for this.

### Type of Test

Your choice of test depends primarily (but not exclusively) on the parameter that appears in the hypotheses. For now, there is only one—the Z test for Means. More on this later.

### Requirements

Most tests depend on knowing some things about the sampling distribution of the statistic. For the Z Test for Means, the things you need are:

[1] A random sample;

[2] A normal distribution in the population (or at least in the distribution of $\stackrel{_}{x}$ ).

Be sure to state what the requirements are, and then check them. See the example later for more on this.

### Significance

At this point in the test, you haven’t collected any data (or in AP Stats, haven’t looked at any). Before you go and measure a sample, you have to decide how much evidence it’s going to take to make you believe that your null hypothesis is wrong. Will the slightest deviation from your assumption make you believe that the assumption is wrong? Will it take a mountain of evidence to convince you that the assumption is wrong? The amount of evidence that you think you need is called the Level of Significance. It is expressed as a percentage (or proportion, or probability). The lower the level of significance (denoted α), the more evidence it’s going to take to convince you that the null hypothesis is wrong.

You don’t yet have any idea of how to set a value for α; don’t worry. For now, just set it at 0.05 (0.20 is quite high; 0.01 is very low). More on this later.

### Calculations

At this point, you would go out and collect the data. Of course, we’ll just be taking the data that are given to us. The next step is to calculate how unusual your sample results are.

In particular, you now have a value of $\stackrel{_}{x}$, and it’s probably not equal to your assumed value of ${\mu }_{X}$. We expect this; variation is inevitable. But how unusual is this value of $\stackrel{_}{x}$ ? How often should you get a value of $\stackrel{_}{x}$ that far (or farther) from the assumed value of ${\mu }_{X}$ ?

We learned how to calculate probabilities for $\stackrel{_}{x}$ in a previous chapter—and that’s exactly what we’re going to do now. The result of this is called the p-value (or attained significance).

#### Greater Than / Less Than

If the alternate hypothesis uses < or >, then that symbol indicates the direction in which the area should be found.

For example, if your hypotheses are H0: ${\mu }_{X}$ = 100 vs. Ha: ${\mu }_{X}$ > 100, and you get a value of $\stackrel{_}{x}$ of 101.5, then the probability you must calculate is P($\stackrel{_}{x}$ > 101.5).

Note that the actual value of $\stackrel{_}{x}$ doesn’t determine the direction—if $\stackrel{_}{x}$ = 99.2, you’d still find area to the right: P($\stackrel{_}{x}$ > 99.2).

#### Not Equal To

If the alternate uses ≠, then you have to do something different. In particular, you want to find the area in both directions—you want to find the probability of getting an $\stackrel{_}{x}$ value this far (or farther) from ${\mu }_{X}$ in either direction (higher or lower). Because of symmetry, you can simply find the area of one tail (away from the center), and double that value.

For example, if your hypotheses are H0: ${\mu }_{X}$ = 100 vs. Ha: ${\mu }_{X}$ ≠ 100, and your value of $\stackrel{_}{x}$ is 102.8, then the probability that you must calculate is P($\stackrel{_}{x}$ < 97.2) + P($\stackrel{_}{x}$ > 102.8) = 2P($\stackrel{_}{x}$ > 102.8).

If your value of $\stackrel{_}{x}$ is 90.9, then you will calculate P($\stackrel{_}{x}$ < 90.9) + P($\stackrel{_}{x}$ > 109.1) = 2P($\stackrel{_}{x}$ < 90.9).
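The three cases (>, <, ≠) can be collected into one small function. Treat this as a sketch: the function name `z_test_p_value`, the labels `"greater"`, `"less"` and `"two-sided"`, and the values σ = 15 and n = 36 in the usage lines are assumptions of mine. The examples above give only the hypothesized mean and $\stackrel{_}{x}$, so a standard deviation and sample size have to be made up to get an actual number.

```python
from math import sqrt
from statistics import NormalDist


def z_test_p_value(x_bar, mu_0, sigma, n, alternative):
    """p-value for a Z test for means; the alternative sets the direction."""
    z = (x_bar - mu_0) / (sigma / sqrt(n))  # standardize the sample mean
    std_normal = NormalDist()
    if alternative == "greater":    # Ha: mu > mu_0, area to the right
        return 1 - std_normal.cdf(z)
    if alternative == "less":       # Ha: mu < mu_0, area to the left
        return std_normal.cdf(z)
    if alternative == "two-sided":  # Ha: mu != mu_0, double one tail
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


# sigma=15 and n=36 are hypothetical; the text does not specify them
p_right = z_test_p_value(99.2, 100, sigma=15, n=36, alternative="greater")
p_two = z_test_p_value(102.8, 100, sigma=15, n=36, alternative="two-sided")
print(f"Ha: mu > 100,  x_bar = 99.2  -> p = {p_right:.4f}")
print(f"Ha: mu != 100, x_bar = 102.8 -> p = {p_two:.4f}")
```

Notice that with a > alternative and $\stackrel{_}{x}$ = 99.2, the area to the right is more than half of the curve, so the p-value is larger than 0.5.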

#### The Rejection Region Method

Back in the days before calculators, tests were conducted with rejection regions: values of the test statistic (in this case, Z) that would lead to rejection of the null hypothesis. This is beginning to fall by the wayside, but you may hear it mentioned somewhere (perhaps you may even read it in your textbook!).

To find a rejection region, work backwards from the level of significance (α) to a critical value of the test statistic (call it z*). Values of the test statistic that are more extreme than z* in the direction of the alternative hypothesis (i.e., values of z where z* is between 0 and z) will lead to rejection of the null hypothesis. On the original scale, these are the values of $\stackrel{_}{x}$ that are farther from the hypothesized value of ${\mu }_{X}$ than the $\stackrel{_}{x}$ that corresponds to z*. We will be doing a lot of this in the final section of this chapter.
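Working backwards from α to a critical value is an inverse normal calculation. The following sketch shows one way to do it; the function name `critical_z` and the alternative labels are my own, and it assumes the test statistic is a standard normal Z.

```python
from statistics import NormalDist


def critical_z(alpha, alternative):
    """Boundary of the rejection region on the z scale."""
    std_normal = NormalDist()
    if alternative == "greater":    # reject H0 when z > critical value
        return std_normal.inv_cdf(1 - alpha)
    if alternative == "less":       # reject H0 when z < critical value
        return std_normal.inv_cdf(alpha)
    if alternative == "two-sided":  # reject H0 when |z| > critical value
        return std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


for alt in ("greater", "less", "two-sided"):
    print(f"alpha = 0.05, {alt}: critical z = {critical_z(0.05, alt):.4f}")
```

For a two-sided test, α is split evenly between the two tails, which is why the critical value is farther from 0 than in the one-sided cases.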

### Conclusion

Now that you have a measure of how unusual your sample was, it’s time to decide whether or not your assumption seems reasonable—whether or not you’ve got enough evidence to reject the null hypothesis (also recall that low p-values mean you’ve got more evidence).

So, compare the p-value to the significance level. If the p-value is lower, then you’ve enough evidence (maybe more than you need) to conclude that the assumed value of ${\mu }_{X}$ is incorrect. If the p-value is higher, then you don’t have enough evidence to conclude that the assumed value of ${\mu }_{X}$ is incorrect.

If you have enough evidence (if you reject the null hypothesis), then you should state that there is evidence that the alternate hypothesis is correct. If you don’t have enough evidence (if you fail to reject the null hypothesis), then you should state that there is not enough evidence for the alternate hypothesis; this is not the same as showing that the null hypothesis is true. In either case, you need to say this in the context of the problem.

For example, if you are testing whether kids in your area are smarter than average (higher IQ), your hypotheses are H0: ${\mu }_{X}$ = 100 (kids in this area are average) vs. Ha: ${\mu }_{X}$ > 100 (kids in this area are smarter than average). Suppose you choose α = 0.05, but the p-value is 0.01. This is enough evidence to reject the null hypothesis; there is evidence that kids in your area are smarter than average.

Ultimately, what we are doing here is determining if our sample results are likely to have arisen by chance alone. If something is unlikely to have arisen by chance alone, then it is significant.

### A Test of Significance for the Population Mean

[1] Hypotheses. H0: ${\mu }_{X}$ = ${\mu }_{0}$. The "sub zero" indicates that this isn’t a variable; it’s a specific number that we just haven’t filled in yet (there is a difference!). Ha: ${\mu }_{X}$ ? ${\mu }_{0}$, where ? can be <, >, or ≠.

[2] Identify the Test (for now, Z-Test for Means).

[3] Requirements (State and Check): Random Sample; Normal Population (or at least a normal distribution of $\stackrel{_}{x}$ ).

[4] Choose/State Level of Significance

[5] Mechanics (show your calculations of the p-value).

[6] Conclusion.

### An Example

[2.] European officials have set a limit on the amount of cadmium (a heavy metal, which is toxic) that is permitted in mushrooms sold in the region; in particular, mushrooms with cadmium concentrations greater than 0.5 ppm (parts per million) are not allowed. A mushroom farmer is wondering if he can sell his crop. He tests a random sample of mushrooms and measures the amount of cadmium; the mean of his 12 observations is 0.5258 ppm. Assuming that cadmium levels vary normally with standard deviation 0.37 ppm, does the farmer have evidence (at the 5% level) that his mushrooms are safe to eat?

Let X represent the cadmium concentration in a single mushroom, and $\stackrel{_}{x}$ represent the mean concentration in 12 mushrooms.

H0: ${\mu }_{X}$ = 0.5 (the mushrooms are safe); Ha: ${\mu }_{X}$ > 0.5 (the mushrooms are unsafe).

This calls for a Z-Test for Means.

This test requires a random sample and a normal distribution in the population. We are told that the sample is random, and we are told that the population has a normal distribution. We may continue.

We are told to test this at the 5% level.

The p-value is P($\stackrel{_}{x}$ > 0.5258). X has a normal distribution, so $\stackrel{_}{x}$ does, too. Thus, we may standardize. $z=\frac{0.5258-0.5}{\frac{0.37}{\sqrt{12}}}$ ≈ 0.2416. So P($\stackrel{_}{x}$ > 0.5258) ≈ P(Z > 0.2416) ≈ 0.4046.

If the mean cadmium concentration is 0.5, then we can expect to get a sample mean concentration of 0.5258 or higher in 40.46% of samples. This happens often enough to attribute to chance at the 5% level; it is not significant, and we fail to reject the null hypothesis.

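There is not enough evidence to conclude that the mean cadmium concentration is above 0.5 ppm, so this sample gives the farmer no reason to think that his mushrooms are unsafe.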
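The mechanics of this example are easy to reproduce. The sketch below uses the numbers from the problem; the variable names are mine. It covers only the calculation and the comparison with α. Stating the hypotheses and checking the requirements still have to be done in words.

```python
from math import sqrt
from statistics import NormalDist

# Values from the mushroom example
mu_0 = 0.5      # hypothesized mean (ppm)
x_bar = 0.5258  # observed sample mean (ppm)
sigma = 0.37    # known population standard deviation (ppm)
n = 12          # sample size
alpha = 0.05    # level of significance

z = (x_bar - mu_0) / (sigma / sqrt(n))  # standardized sample mean
p_value = 1 - NormalDist().cdf(z)       # Ha uses >, so area to the right

print(f"z = {z:.4f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

The z and p-value printed here should match the hand calculation (about 0.2416 and 0.4046).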

## Errors in Tests of Significance

When you conduct a test of significance, you’re making an assumption, and then deciding whether or not that assumption appears to be true. Alas, we never know if our assumption is true or not—thus, we might be making an error when we reject/fail to reject the null hypothesis.

### Type I Errors

A Type I Error occurs when we reject a true hypothesis—in other words, we reject the null hypothesis, but the hypothesis is really (unbeknownst to us) true.

The probability of a Type I Error is easy to calculate—if you’ve decided that your level of significance is 5%, then you will reject the Null Hypothesis if the p-value of the sample mean is less than 5%. How often should you get sample means with p-values less than 5%? Why, in 5% of samples, of course! So the probability of a Type I Error is the same as the level of significance—α.
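You can watch this happen in a simulation. In the sketch below the null hypothesis really is true, and we count how often a one-sided (>) Z test rejects it anyway. Everything specific here is an assumption for illustration: the hypothesized mean of 100, the standard deviation of 15, the sample size of 25, and the name `type_i_error_rate`. To keep it short, each sample mean is drawn directly from its sampling distribution instead of averaging an actual sample.

```python
import random
from math import sqrt
from statistics import NormalDist


def type_i_error_rate(mu_0=100.0, sigma=15.0, n=25, alpha=0.05,
                      trials=10_000, seed=7):
    """Share of samples that reject H0: mu = mu_0 when H0 is actually true."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    sigma_xbar = sigma / sqrt(n)
    rejections = 0
    for _ in range(trials):
        # H0 is true: the sample mean comes from N(mu_0, sigma_xbar)
        x_bar = rng.gauss(mu_0, sigma_xbar)
        z = (x_bar - mu_0) / sigma_xbar
        p_value = 1 - std_normal.cdf(z)  # Ha: mu > mu_0
        if p_value < alpha:
            rejections += 1              # a Type I Error
    return rejections / trials


error_rate = type_i_error_rate()
print(f"Share of true null hypotheses rejected: {error_rate:.3f}")
```

The share should land near 0.05, the chosen level of significance.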

### Type II Errors

A Type II Error occurs when we fail to reject a false hypothesis; in other words, we fail to reject the null hypothesis when the value of the parameter (${\mu }_{X}$) is really some other value (not the one we assumed).

The probability of a Type II Error is more difficult to calculate—it depends on the actual value of ${\mu }_{X}$, which could be anything. Thus, we must calculate the probability of a Type II Error against a specific alternative, ${\mu }_{a}$. To do this, we must first convert our level of significance into an actual value of $\stackrel{_}{x}$. Then, we must find the area to one side of that $\stackrel{_}{x}$ (the side depends on what kinds of $\stackrel{_}{x}$ values would lead to rejection). The symbol for the probability of a Type II Error is β.

For Example: Let’s assume that our alternate hypothesis is >.

Step 1: Convert α to $\stackrel{_}{x}$.

Step 2: Assuming that the parameter actually has value μa, find the probability of failing to reject.
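For a > alternative, the two steps can be written out as equations. This is a sketch in notation of my own: $\bar{x}^{*}$ (the cutoff value of the sample mean) and $z_{\alpha}$ (the standard normal value with right tail area α) are labels introduced here for convenience, and $\bar{x}$ is the sample mean.

$$\bar{x}^{*} = \mu_{0} + z_{\alpha}\frac{\sigma_{X}}{\sqrt{n}}$$

$$\beta = P\left(\bar{x} \le \bar{x}^{*} \mid \mu_{X} = \mu_{a}\right) = P\left(Z \le \frac{\bar{x}^{*} - \mu_{a}}{\sigma_{X}/\sqrt{n}}\right)$$

The first line is Step 1: we reject the null hypothesis whenever the sample mean is above $\bar{x}^{*}$. The second line is Step 2: the chance of landing at or below that cutoff (failing to reject) when the mean is really $\mu_{a}$. For a < alternative, the sign in front of $z_{\alpha}$ flips and the inequality reverses.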

### The Power of a Test

The Power of a test is the probability that you reject a false hypothesis—in other words, you reject the null hypothesis when the null is false (the population mean is actually some other value). Since this is calculated when the null hypothesis is false, you need an alternate (μa) to measure against. In fact, Power is the complement of β.

### Examples

The manufacturer of a new car claims that it gets 26 miles per gallon (mpg). A consumer group is skeptical of this claim and plans to test H0: ${\mu }_{X}$ = 26 vs. Ha: ${\mu }_{X}$ < 26 at the 5% level of significance on a sample of 30 cars. Assume that the standard deviation of this kind of measurement is 1.4 mpg.

[3.] What is the probability of a Type I Error?

The probability of a Type I Error is the same as the level of significance—0.05.

[4.] What is the probability of a Type II Error vs. ${\mu }_{a}$ = 25.8?

This will be a bit more complicated. First, I need to convert α into a value of $\stackrel{_}{x}$. Since the alternate hypothesis uses <, I’m going to reject the null hypothesis only if $\stackrel{_}{x}$ is too low: below, or to the left of, 26. I need to find that value of $\stackrel{_}{x}$ that has a left-hand area (away from 26) of 0.05.

Of course, we did that before. Do an Inverse Normal calculation (area 0.05 to the left, mean 26, standard deviation $\frac{1.4}{\sqrt{30}}$) to find that the $\stackrel{_}{x}$ value is 25.5796.

Now we need to find out how often we’ll fail to reject (how often $\stackrel{_}{x}$ will be greater than 25.5796) if ${\mu }_{X}$ = ${\mu }_{a}$ = 25.8.

So, standardize 25.5796! $z=\frac{25.5796-25.8}{\frac{1.4}{\sqrt{30}}}$ = -0.8624. P($\stackrel{_}{x}$ > 25.5796) = P(Z > -0.8624) = 0.8058. There is an 80.58% chance of a Type II Error.

[5.] What is the power of the test vs. ${\mu }_{a}$ = 25.8?

The power of the test vs. ${\mu }_{a}$ = 25.8 is 1 - β = 1 - 0.8058 = 0.1942.
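Examples 4 and 5 can be checked together in code. The sketch below follows the same two steps (convert α to a cutoff, then find the area on the "fail to reject" side under the alternative); the function name `beta_and_power` is mine, and it handles only the < alternative used in this problem. The extra alternatives 25.5 and 25.2 in the loop are values I added to show the trend; they are not part of the original problem.

```python
from math import sqrt
from statistics import NormalDist


def beta_and_power(mu_0, mu_a, sigma, n, alpha):
    """Type II error probability and power for Ha: mu < mu_0 (sigma known)."""
    sigma_xbar = sigma / sqrt(n)
    # Step 1: the x_bar value with area alpha to its left when mu = mu_0
    cutoff = NormalDist(mu_0, sigma_xbar).inv_cdf(alpha)
    # Step 2: chance of failing to reject (x_bar above cutoff) when mu = mu_a
    beta = 1 - NormalDist(mu_a, sigma_xbar).cdf(cutoff)
    return cutoff, beta, 1 - beta


# Values from the mpg example
cutoff, beta, power = beta_and_power(mu_0=26, mu_a=25.8, sigma=1.4, n=30, alpha=0.05)
print(f"cutoff = {cutoff:.4f}, beta = {beta:.4f}, power = {power:.4f}")

# 25.5 and 25.2 are extra, hypothetical alternatives
for mu_a in (25.8, 25.5, 25.2):
    _, _, pw = beta_and_power(26, mu_a, 1.4, 30, 0.05)
    print(f"mu_a = {mu_a}: power = {pw:.4f}")
```

The first line of output should agree with the hand calculations above. The loop shows that power grows as the alternative moves farther from 26: a bigger difference is easier to detect.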

Page last updated 2012-04-01