Understanding Basic Statistics: Chapter 7

7 Normal Distributions

In Chapter 6, we looked at the distributions of discrete random variables, in particular the binomial. Now we turn our attention to continuous random variables, in particular the normal.

Since this concerns a continuous variable, we can't make a list of values for the distribution…we can only look at a graph that shows what values are possible (and how probable they are).

7.1 Graphs of Normal Probability Distributions

The Normal Curve

The most famous (and useful) continuous random variable is the normal. A typical picture of the distribution of a normal variable (often called a Normal Curve) will look like this…

An Image

…but all of the following are distributions of normal variables:

An Image
(taller!)

An Image
(flatter!)

Properties of the Normal Curve

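There are two values of interest for any normal curve: the mean (μ), and the standard deviation (σ). The mean will be in the center (under the peak). The standard deviation is a little harder to locate: it is the horizontal distance from the mean to the point on either side where the curve stops bending downward and starts bending upward.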

The distribution of a normal random variable will be symmetric around the mean.

The area underneath a normal curve is 1. At first, this might bother you—you've found areas under certain curves (circles), but not strange-looking ones like this. To do it from scratch, you'd need calculus. Naturally, we'll use an easier method.

On second thought, you might wonder "What does it matter? Who cares what area lies under that curve?" It turns out that the area under the curve tells us how often certain values of the variable occur—remember, that's what a probability distribution does: what values can occur; how often do these values actually occur?

The Empirical Rule

So, now for a non-calculus way to find areas under a normal curve—the Empirical Rule (note that these probabilities are approximate).

There is a 68% probability of obtaining a value between μ - σ and μ + σ.

There is a 95% probability of obtaining a value between μ - 2σ and μ + 2σ.

There is a 99.7% probability of obtaining a value between μ - 3σ and μ + 3σ.

 

Don't forget about symmetry! For example, there is a 34% probability of getting a value between μ and μ + σ. Take careful note of diagram 7-5!
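The Empirical Rule percentages are rounded. As a quick check, here is a small Python sketch that computes the same areas from the error function in the standard library. The function names standard_normal_cdf and area_within are made up for this illustration.

```python
import math


def standard_normal_cdf(z: float) -> float:
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def area_within(k: float) -> float:
    """Probability that a normal variable lands within k standard deviations of its mean."""
    return standard_normal_cdf(k) - standard_normal_cdf(-k)


for k in (1, 2, 3):
    # Compare with the rounded Empirical Rule values: 68%, 95%, 99.7%.
    print(f"within {k} sigma: {area_within(k):.4f}")
```

The computed areas are roughly 0.6827, 0.9545 and 0.9973, which is why the rule is only approximate.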

 

Examples:

[1.] Let X be a normal random variable with μ = 50 and σ = 10. What is the probability that X will take a value between 40 and 60?

40 = μ - σ and 60 = μ + σ. The empirical rule says that there is a 68% probability of obtaining a value between μ - σ and μ + σ—thus, the answer is 68%.

[2.] Let X be a normal random variable with μ = 100 and σ = 15. What is the probability that X will take a value between 115 and 130?

Perhaps a diagram might help with this.

An Image

The Empirical Rule tells us that 68% of the values should be between 85 (μ - σ) and 115 (μ + σ). Because of symmetry, that means that 34% of the values are between 100 (μ) and 115 (μ + σ).

We also know that 95% of the values are between 70 (μ - 2σ) and 130 (μ + 2σ). Symmetry gives us that 47.5% of values must be between 100 and 130.

The question asks for the percent of values between 115 and 130—that area is shaded in the diagram above. So, look carefully—from 100 to 130 has an area of 47.5%, and from 100 to 115 has an area of 34%. What's the area between 115 and 130?

47.5% - 34% = 13.5%.
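The subtraction above can be sketched in Python. This is only an illustration of the Empirical Rule arithmetic; the names EMPIRICAL_WITHIN and area_mean_to_k_sigma are invented here, and the sketch only works when the endpoints sit a whole number of standard deviations above the mean.

```python
# Approximate Empirical Rule areas, keyed by k (number of standard deviations).
EMPIRICAL_WITHIN = {1: 0.68, 2: 0.95, 3: 0.997}


def area_mean_to_k_sigma(k: int) -> float:
    """Approximate area between mu and mu + k*sigma (half of the symmetric area)."""
    return EMPIRICAL_WITHIN[k] / 2


mu, sigma = 100, 15
k_low = (115 - mu) // sigma    # 115 is 1 standard deviation above the mean
k_high = (130 - mu) // sigma   # 130 is 2 standard deviations above the mean

# Area from 100 to 130, minus area from 100 to 115.
band = area_mean_to_k_sigma(k_high) - area_mean_to_k_sigma(k_low)
print(f"{band:.1%}")  # about 13.5%
```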

 

[3.] One IQ test has scores with a normal distribution (μ = 100, σ = 15). What percent of IQ scores are between 115 and 130?

Hopefully, it is easy to see that this is really just the last problem, written as a word problem. The answer is 13.5%.

 

[4.] Human gestation periods (the time between conception and birth) are normally distributed with a mean of 266 days, and a standard deviation of 16 days. What percent of pregnancies last less than 218 days?

Note that 218 = μ - 3σ. The area between (μ - 3σ) and (μ + 3σ) is approximately 99.7%; thus, the area outside of that (less than μ - 3σ or greater than μ + 3σ) must be about 0.3%. Because of symmetry, each of these areas must be equal; thus, the area below μ - 3σ must be half of 0.3%, which is 0.15%—which is the answer to the question.
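To see how close the Empirical Rule gets here, the following sketch compares the 0.15% estimate with the area computed by Python's statistics.NormalDist (available in Python 3.8 and later). The variable names are chosen for this illustration only.

```python
from statistics import NormalDist

# Gestation periods: mean 266 days, standard deviation 16 days.
gestation = NormalDist(mu=266, sigma=16)

# Empirical Rule: 99.7% lies within 3 sigma, so each tail holds half of what is left.
empirical_tail = (1 - 0.997) / 2

# Area to the left of 218 days computed from the normal distribution itself.
computed_tail = gestation.cdf(218)

print(f"Empirical Rule estimate: {empirical_tail:.4%}")
print(f"Computed area:           {computed_tail:.4%}")
```

The two values agree to within a couple of hundredths of a percent, which is about as good as the rounded 99.7% allows.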

Control Charts

The Idea

Industrialization has resulted in many mass-produced items (cars, toilet paper, cell phones, microwave meals…). Part of a consumer's interest in a product is its consistency—does this hamburger taste like the last one? Does this stereo sound as good as that other one?

Unfortunately, perfect consistency is not possible—variation is unavoidable (this was one of the great scientific revolutions in the early 20th century). In other words, it is simply not possible to make a perfectly consistent product.

Since perfection is impossible, near perfection becomes the goal—in other words, we try to limit the amount of variation. Once people realized that statistics can measure variation, they began using statistical methods. One way of keeping track of variation is through control charts.

The Process

A Control Chart plots a variable on the y-axis, and time on the x-axis. The pattern of the points can tell if the process (the one which is resulting in these measurements) is in control—in other words, if the variation is consistent/acceptable.

The following patterns are not acceptable—they indicate that the variation is no longer in control.

[1] Any datum that is more than 3σ away from μ.

[2] Any nine or more consecutive data on the same side of μ.

[3] Any three consecutive data, where two of the data are more than 2σ away from μ.

Since we are looking at the position of the data in relation to μ and σ, we make five horizontal lines on the control chart: μ + 3σ, μ + 2σ, μ, μ - 2σ, μ - 3σ.
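As a small sketch, the five horizontal lines can be computed like this. The function name control_lines and the values μ = 50, σ = 4 are made up for the illustration.

```python
def control_lines(mu: float, sigma: float) -> dict[str, float]:
    """Return the five horizontal lines of a control chart, from top to bottom."""
    return {
        "mu + 3 sigma": mu + 3 * sigma,
        "mu + 2 sigma": mu + 2 * sigma,
        "mu": mu,
        "mu - 2 sigma": mu - 2 * sigma,
        "mu - 3 sigma": mu - 3 * sigma,
    }


# Hypothetical process with mean 50 and standard deviation 4.
for label, value in control_lines(50, 4).items():
    print(f"{label:>12}: {value}")
```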

 

Examples:

[5.] Behind the glass of computer monitors, there is a very fine wire mesh. The wires in this mesh must be pulled tight, but not too tight, for the screen to function properly. The tension in the wires is measured by running an electric current through the wire. Screens that have the proper amount of tension should measure 275 mV (millivolts). The standard deviation of this type of measurement is 21.5 mV. Here are the readings on the last 20 monitors:

Monitor    1      2      3      4      5      6      7      8      9      10
mV         269.5  297    269.6  283.3  304.8  280.4  233.5  257.4  317.5  327.4

Monitor    11     12     13     14     15     16     17     18     19     20
mV         264.7  307.7  310    343.3  328.1  342.6  338.8  340.1  374.6  336.1

Is this process in control?

 

So the process has mean μ = 275 and standard deviation σ = 21.5. That means that we need horizontal lines at 339.5, 318, 275, 232, and 210.5. Plot those, and the data, to get…

An Image

Clearly, this process is out of control. It loses control at monitor #14, when the measured voltage was more than 3 standard deviations away from the mean.
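The same check can be sketched in Python. This is one possible reading of the three out-of-control patterns listed earlier, not an official implementation; the function name first_out_of_control is invented here. The third rule is coded exactly as worded in these notes (two of three points more than 2σ from μ); many formulations also require those two points to be on the same side of μ.

```python
from typing import Optional


def first_out_of_control(data: list[float], mu: float, sigma: float) -> Optional[tuple[int, str]]:
    """Return (1-based position, reason) for the first point that triggers a rule, or None."""
    for i, x in enumerate(data):
        # Rule 1: a single point more than 3 sigma away from mu.
        if abs(x - mu) > 3 * sigma:
            return i + 1, "point beyond 3 sigma"
        # Rule 2: nine consecutive points on the same side of mu.
        if i >= 8:
            last_nine = data[i - 8 : i + 1]
            if all(v > mu for v in last_nine) or all(v < mu for v in last_nine):
                return i + 1, "nine in a row on one side"
        # Rule 3: two of three consecutive points more than 2 sigma away from mu.
        if i >= 2:
            last_three = data[i - 2 : i + 1]
            if sum(abs(v - mu) > 2 * sigma for v in last_three) >= 2:
                return i + 1, "two of three beyond 2 sigma"
    return None


# Readings (mV) for monitors 1 through 20, copied from the example.
readings = [269.5, 297, 269.6, 283.3, 304.8, 280.4, 233.5, 257.4, 317.5, 327.4,
            264.7, 307.7, 310, 343.3, 328.1, 342.6, 338.8, 340.1, 374.6, 336.1]

print(first_out_of_control(readings, mu=275, sigma=21.5))
```

For these readings the first signal is at monitor 14, because 343.3 is above μ + 3σ = 339.5.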

 

[6.] The diameter of a bearing deflector in an electric motor is supposed to be 2.205 cm. The standard deviation of this kind of measurement is about 0.0005 cm. Here are the measurements of the last twelve bearings:

n               1       2       3       4       5       6
diameter (cm)   2.2047  2.2047  2.205   2.2049  2.2053  2.2043

n               7       8       9       10      11      12
diameter (cm)   2.2036  2.2042  2.2038  2.2045  2.2026  2.204

Is this process in control?

 

The process has mean μ = 2.205 and standard deviation σ = 0.0005. So, plot the data with lines at 2.2065, 2.206, 2.205, 2.204, and 2.2035.

An Image

This is out of control. Look at bearings 7, 8 and 9: two of the three (7 and 9) are between μ - 2σ and μ - 3σ. Bearing 11 (2.2026 cm) is also more than 3σ below the mean.
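Here is a hedged Python sketch of that check. To avoid floating-point rounding right at the control lines, it stores each diameter in units of 0.0001 cm (so 2.2047 cm becomes 22047); that rescaling, and the variable names, are choices made for this illustration.

```python
# Diameters of bearings 1 through 12, in units of 0.0001 cm (exact integers).
diameters = [22047, 22047, 22050, 22049, 22053, 22043,
             22036, 22042, 22038, 22045, 22026, 22040]
mu, sigma = 22050, 5  # 2.205 cm and 0.0005 cm in the same units

# Bearings more than 2 sigma (or 3 sigma) away from the mean, numbered from 1.
beyond_2 = [n for n, d in enumerate(diameters, start=1) if abs(d - mu) > 2 * sigma]
beyond_3 = [n for n, d in enumerate(diameters, start=1) if abs(d - mu) > 3 * sigma]

# Every run of three consecutive bearings with at least two beyond 2 sigma.
flagged = [
    (n, n + 1, n + 2)
    for n in range(1, len(diameters) - 1)
    if sum(m in beyond_2 for m in (n, n + 1, n + 2)) >= 2
]

print("beyond 2 sigma:", beyond_2)
print("beyond 3 sigma:", beyond_3)
print("windows with two of three beyond 2 sigma:", flagged)
```

The first flagged window is bearings 7, 8 and 9. Bearing 12 sits exactly on the μ - 2σ line, so it is not counted as more than 2σ away.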

7.2 Standard Units and Areas Under the Standard Normal Distribution

So far, we've needed the Empirical Rule to determine how often certain values occur in a Normal distribution. This isn't good enough, though—we will often encounter measurements that aren't exactly one or two standard deviations away from the mean.

Read that again—there's something important there. The Empirical Rule can only be used if you know how many standard deviations away from the mean the value is.

How do we measure the number of standard deviations? With something called a standardized score (z-score):

Equation 1 - Standardized Score

z = (x - μ) / σ

Note that positive z-scores indicate that x is above the mean, and negative z-scores indicate that x is below the mean.

With z-scores, we can rewrite the Empirical Rule: 68% of the data lie between z = -1 and z = 1; 95% of the data lie between z = -2 and z = 2…

Also, note that with a little algebra, you can use this formula to take a given standardized score (z-score) and solve for x.

 

Examples:

[7.] If μ = 100 and σ = 15, then what is the z-score for x = 135?

z = (135 - 100) / 15 ≈ 2.33

 

[8.] If μ = 500 and σ = 100, then what z-interval corresponds to x > 610?

z = (610 - 500) / 100 = 1.1, so x > 610 is the same as z > 1.1.

 

[9.] If μ = 70 and σ = 2.5, then what is the value of x when z = -2.3?

-2.3 = (x - 70) / 2.5, so (-2.3)(2.5) = x - 70, and x = (-2.3)(2.5) + 70 = 64.25.
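The three conversions above can be sketched as a pair of small Python functions. The names z_score and x_from_z are made up for this illustration.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


def x_from_z(z: float, mu: float, sigma: float) -> float:
    """Solve the z-score formula for x."""
    return mu + z * sigma


print(round(z_score(135, 100, 15), 2))    # example 7
print(round(z_score(610, 500, 100), 2))   # example 8
print(round(x_from_z(-2.3, 70, 2.5), 2))  # example 9
```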

 

Of course, we don't need standardized scores just to make the Empirical Rule simpler…we need them in order to find probabilities (areas under normal curves). The way to use z-scores is called a Standard Normal Distribution Table (Z-Table).

To use the table, first write the z-score with two digits to the right of the decimal point (this means that you will normally write three digits for z). In the left column, find the first two digits of z (every digit except the last). Across the top row, find the last digit of z. At the intersection of that row and column is the area (probability) to the left of your z.

Warnings:

 

Examples:

[10.] Find the area to the left of z = 2.12.

Read this directly from the table—0.9830.

 

[11.] Find the area to the right of z = 0.10.

The table entry is 0.5398, but that's the area to the left. Subtract from one to find the area to the right: 1 - 0.5398 = 0.4602.

 

[12.] Find the area between z = -0.82 and z = 1.05.

This will be the difference in the table entries for these z-scores. z = -0.82 has an entry of 0.2061. z = 1.05 has an entry of 0.8531. The difference is 0.8531 - 0.2061 = 0.6470.
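If you have Python handy, the table lookups can be checked with statistics.NormalDist (Python 3.8 and later), whose cdf method returns the area to the left, just like the table. This is a sketch for checking answers, not a replacement for learning the table; the variable names are arbitrary.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

left = Z.cdf(2.12)                    # area to the left of z = 2.12
right = 1 - Z.cdf(0.10)               # area to the right of z = 0.10
between = Z.cdf(1.05) - Z.cdf(-0.82)  # area between z = -0.82 and z = 1.05

print(f"{left:.4f} {right:.4f} {between:.4f}")
```

The last digit can occasionally differ from a hand calculation, because the table rounds each entry before you subtract.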

 

7.3 Areas Under Any Normal Curve

If you're given x, and need to find an area/probability, then you're set: convert x to a z-score, look up the area in the table, and have at it!

If you're given an area, and need to find a z, then you actually know what to do—just backwards! In other words, start by finding the left-hand area in the table; now look to the left (and up) to find z. Once you know z, use the formula for z (and a little algebra) to find x (if you need to). Some examples, perhaps?

 

Examples:

[13.] Find P(x ≥ 135) if μ = 100 and σ = 15.

z = (135 - 100) / 15 ≈ 2.33. x ≥ 135 is the same as z ≥ 2.33. The table entry for z = 2.33 is 0.9901, but that's left-hand area; we want the area to the right! 1 - 0.9901 = 0.0099 is the answer.

 

[14.] Find z so that 10.93% of the standard normal curve lies to the left of z.

Look in the table—what value of z has a left-hand area of 0.1093? z = -1.23.

 

[15.] Find z so that 38.97% of the standard normal curve lies to the right of z.

0.3897 to the right means 0.6103 to the left—z = 0.28.
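Reading the table backwards corresponds to the inverse of the left-hand-area function. As a sketch, statistics.NormalDist offers inv_cdf for this (Python 3.8 and later); the variable names are chosen for this illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

# Example 14: 10.93% of the curve lies to the left of z.
z_left = Z.inv_cdf(0.1093)

# Example 15: 38.97% to the right means 1 - 0.3897 = 0.6103 to the left.
z_right = Z.inv_cdf(1 - 0.3897)

print(round(z_left, 2), round(z_right, 2))
```

Rounded to two decimal places, these match the z-values read from the table.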

 

[16.] Scores on the SAT-M are normally distributed with μ = 500 and σ = 100. What percent of scores are between 550 and 650?

z = (550 - 500) / 100 = 0.5, and z = (650 - 500) / 100 = 1.5. Scores between 550 and 650 have z-scores between 0.5 and 1.5. P(0.5 < z < 1.5) = 0.9332 - 0.6915 = 0.2417. About 24.17% of scores are between 550 and 650.
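Software can skip the conversion to z and work with the original normal curve directly. A minimal sketch, assuming Python 3.8 or later for statistics.NormalDist; the variable names are illustrative.

```python
from statistics import NormalDist

sat_math = NormalDist(mu=500, sigma=100)

# Area between 550 and 650 = (area left of 650) - (area left of 550).
p_between = sat_math.cdf(650) - sat_math.cdf(550)

# The same area, found by standardizing first and using the standard normal.
Z = NormalDist()
p_between_z = Z.cdf((650 - 500) / 100) - Z.cdf((550 - 500) / 100)

print(f"{p_between:.4f} {p_between_z:.4f}")
```

Both routes give about 0.2417, which is the point of standardizing: the area does not change when you switch from x to z.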

 

[17.] Scores on a particular IQ test are normally distributed with a mean of 100 and standard deviation of 15. What percent of scores are less than 80?

z = (80 - 100) / 15 ≈ -1.33. x < 80 is the same as z < -1.33. The table entry for this z is 0.0918. Thus, about 9.18% of IQ scores are less than 80.

 

[18.] The heights of U.S. adult males are approximately normally distributed with mean 68.5 inches and standard deviation 1.72 inches. How tall must a man be in order to be one of the tallest 1% of men?

If you are in the tallest 1%, then you are taller than 99% of other men, so the area to the left of this height must be 0.99. Look in the chart: how close does the left-hand area get to 0.99? z = 2.33 is closest. 2.33 = (x - 68.5) / 1.72, so x = (2.33)(1.72) + 68.5 = 72.5076. A man must be taller than about 72.5 inches to be one of the tallest 1%.
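As a cross-check, here is a sketch that finds the same cutoff two ways: once with the table value z = 2.33, and once by asking statistics.NormalDist (Python 3.8 and later) for the height with 0.99 of the area to its left. The variable names are made up for the illustration.

```python
from statistics import NormalDist

mu, sigma = 68.5, 1.72
heights = NormalDist(mu=mu, sigma=sigma)

# Table method: nearest table entry to 0.99 is at z = 2.33, then solve for x.
cutoff_table = mu + 2.33 * sigma

# Direct method: the height with 99% of the area to its left.
cutoff_direct = heights.inv_cdf(0.99)

print(f"{cutoff_table:.2f} {cutoff_direct:.2f}")
```

The two answers differ only in the second decimal place, because the table forces z to be rounded to two digits.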



Page last validated 2010-08-15