We are focusing on simple linear regression—however, not all bivariate relationships are linear. Some are curved…we will now look at how to straighten out two large families of curves.

Exponential functions are seen quite a bit out in the world—well, actually they only do a good job of modeling a part of what we see; in reality, nothing can follow an exactly exponential function. But I digress…

An exponential function is of the form $y={a}^{x}$. The constant *a* is called the base, and is typically restricted to be a positive real number (other than 1). Notice
that *x* is the exponent. Below are a few examples of exponential function graphs.

The essential feature of an exponential graph is that it rises quite sharply on one end, and flattens out completely on the other.

Every point on an exponential function is of the form (*x*, *a^x*). The simplest line would have points of the form (*x*, *ax*)—so how can we get the variable out of the exponent and into a product?

*Logarithms*! Specifically, take the logarithm of the y-coordinate (the response variable).

Now, when I say *logarithm*, I mean *natural logarithm*, since that’s the most important kind. It is probably the case that you have been
taught to use *common logarithms*—which will work, but they’re just so…*common*…

If we take the logarithm of the response variable, then plot the new *y*-coordinates against the original *x*-coordinates, the points will be of the form (*x*, *x*·ln(*a*)). This is just like a graph with coordinates (*x*, *ax*)—a line through the origin—except that the slope is ln(*a*) instead of *a*.

If this new plot—using the transformed response variable—looks linear, then we can perform linear regression on it. The resulting regression equation will be $\ln\left(\hat{y}\right)=a+bx$, which takes in a value of the explanatory variable and spits out the log of the predicted response. (Note that *a* and *b* here are the intercept and slope of the fitted line—not the base from before.)

BE CAREFUL! If you want to use this equation to make a prediction, don’t forget to
un-transform the result of the equation! (take the value that it spits out as an
exponent on *e*)
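To make the round trip concrete, here is a quick sketch in Python. The data and the base (1.5) are made up for illustration: generate points from an exponential function, fit a line to (*x*, ln *y*) using the ordinary least-squares formulas, then exponentiate the output to un-transform a prediction.

```python
import math

# Made-up data from an exact exponential function: y = 1.5^x
xs = list(range(1, 11))
ys = [1.5 ** x for x in xs]

# Transform the response variable: take the natural log of y
log_ys = [math.log(y) for y in ys]

def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope

a, b = fit_line(xs, log_ys)
print(a, b)    # slope b recovers ln(1.5) ~ 0.405; intercept a is essentially 0

# Predict at x = 12: the equation outputs ln(y-hat), so exponentiate!
y_hat = math.exp(a + b * 12)
print(y_hat)   # ~129.75, which is 1.5^12
```

Forgetting the final `math.exp` is exactly the mistake warned about above—the raw regression output here is about 4.87, the *log* of the prediction, not the prediction.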

The other common function out in the world is a power function—something of the form $y={x}^{a}$. Here are some examples:

The points on a graph like this look like (*x*, *x^a*). Once again, the question is—how can we turn this into something more like (*x*, *ax*)?

Logarithms to the rescue! Again!

That will turn the points into (*x*, *a*·ln(*x*)). Unfortunately, this isn’t exactly what we want. We’ve gotten rid of the exponent,
but now we’ve got a log of *x*! What can we do now?

How about take the logarithm of the explanatory variable?

That would change the points to (ln(*x*), *a*·ln(*x*)). You don't have to take my word that this is linear—let *u* = ln(*x*), and the points become (*u*, *au*), a line through the origin with slope *a*. Now we can perform linear regression.

Be careful with the equation! The regression equation will be
$\ln\left(\hat{y}\right)=a+b\cdot\ln\left(x\right)$, which takes in ln(*x*) as input, and produces the log of the predicted response.
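A sketch of the whole process, again with made-up numbers (data generated from *y* = *x*^2.5): take logs of *both* variables, fit a line, and note that prediction now requires two steps—plug in ln(*x*), then exponentiate the output.

```python
import math

# Made-up data from an exact power function: y = x^2.5
xs = list(range(1, 11))
ys = [x ** 2.5 for x in xs]

# Transform BOTH variables: log of x AND log of y
log_xs = [math.log(x) for x in xs]
log_ys = [math.log(y) for y in ys]

def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope

a, b = fit_line(log_xs, log_ys)
print(b)       # the slope recovers the exponent: 2.5

# Predict at x = 12: feed in ln(12), then exponentiate the output
y_hat = math.exp(a + b * math.log(12))
print(y_hat)   # ~498.83, which is 12^2.5
```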

For the AP exam, the only transformations that you must be able to apply to data are the exponential and power transformations described above. However, you should be aware that there are many more possibilities, and you must be able to handle the equations that these transformations generate.

For example, perhaps you are told that a regression of $\sqrt{y}$ vs. *x* was performed, and the least squares line is
$\sqrt{\hat{y}}=5.213-0.197x$. What is the prediction when *x* = 10?

Plugging in 10 for *x* produces a value of 3.243, but this isn't the predicted
response—it's the square root of the predicted response! You've got to square that to
get the actual prediction of about 10.517.
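The arithmetic, as a two-line sketch:

```python
# Un-transforming a square-root regression: square the line's output.
root_of_prediction = 5.213 - 0.197 * 10   # this is sqrt(y-hat), NOT y-hat
prediction = root_of_prediction ** 2      # square to undo the transformation
print(root_of_prediction, prediction)     # ~3.243 and ~10.517
```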

1. A classic example—the length of a year for a planet, based on its distance from the sun. Here are the data:

| Distance (millions of miles) | Year (# of Earth-years) |
|---|---|
| 36 | 0.24 |
| 67 | 0.61 |
| 93 | 1 |
| 142 | 1.88 |
| 484 | 11.86 |
| 887 | 29.46 |
| 1784 | 84.07 |
| 2796 | 164.82 |
| 3666 | 247.68 |

First of all, let’s see that the data have a curved relationship.

Definitely curved. Don’t believe me? Look at the residuals.

Yeah—that’s bad.

But what transformation should be applied?

This curve can't bottom out—flattening on the left end would force us to accept negative distances, which rules out the exponential shape. Thus, the power transformation (logs of *both* variables) is indicated. Here is the new plot:

Looks pretty good. Linear, with strong positive association. There appears to be a gap between 5 and 6 on the horizontal (log-distance) axis—the asteroid belt! The correlation rounds to 1! Essentially 100% of the variation in the natural log of year length can be explained by the linear regression on the natural log of distance. Check that fit!

OK, we’d better check the residuals. One residual plot, coming up.

Nice and random. Now for a check of normality in the residuals.

Hmmm…perhaps I’ll look at the normal probability plot, too.

Not bad. Looks like we’ve got a model!

$\ln\left(\hat{y}\right)=-6.8046+1.5008\,\ln\left(x\right)$, where
$\hat{y}$ is the predicted year length, and *x* is the distance from Sol.

So—let’s use this model to predict the year length of a planet that doesn’t exist. The halfway point between Mars and Jupiter is around 313 million miles from Sol. What will this model predict for a year length if a planet occupied this position?

Letting *x* = 313, we get

$\ln\left(\hat{y}\right)=-6.8046+1.5008\,\ln\left(313\right)$

$\ln\left(\hat{y}\right)=-6.8046+1.5008\left(5.7462\right)$

$\ln\left(\hat{y}\right)=-6.8046+8.6238$

$\ln\left(\hat{y}\right)=1.8192$

$\hat{y}={e}^{1.8192}=6.167$

So our model predicts a year length of 6.167 years.
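You can reproduce both the model and the prediction from the table above. Here's a sketch using the same hand-rolled least-squares formulas a calculator's LinReg applies (coefficients should match the fitted equation to rounding):

```python
import math

# Planet data from the table: distance (millions of miles), year length
distance = [36, 67, 93, 142, 484, 887, 1784, 2796, 3666]
years    = [0.24, 0.61, 1, 1.88, 11.86, 29.46, 84.07, 164.82, 247.68]

# Power transformation: natural log of BOTH variables
log_x = [math.log(d) for d in distance]
log_y = [math.log(y) for y in years]

def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope

a, b = fit_line(log_x, log_y)
print(a, b)    # ~ -6.8046 and ~1.5008 (the 3/2 from Kepler's third law)

# Prediction for a body 313 million miles out: exponentiate to un-transform
y_hat = math.exp(a + b * math.log(313))
print(y_hat)   # ~6.17 years
```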

(in fact, the asteroid Ceres orbits at a distance of about 257 million miles, and
has a period of about 4.6 years—plug 257 into the model and you get roughly 4.6. Not bad, even though Ceres isn't a *planet*).

Page last updated 2015-05-13