Scatterplots and Regressions (page 3 of 4)

The point of collecting data and plotting the collected values is usually to try to find a formula that can be used to model a (presumed) relationship. I say "presumed" because the researcher may end up concluding that there isn't really any relationship where he'd hoped there was one. For instance, you could run experiments timing a ball as it drops from various heights, and you would be able to find a definite relationship between "the height from which I dropped the ball" and "the time it took to hit the floor". On the other hand, you could collect reams of data on the colors of people's eyes and the colors of their cars, only to discover that there is no discernable connection between the two data sets.

The process of taking your data points and coming up with an equation is called "regression", and the graph of the "regression equation" is called "the regression line". If you're doing your scatterplots by hand, you may be told to find a regression equation by putting a ruler against the first and last dots in the plot, drawing a line, and guessing the line's equation from the picture. This is an incredibly clumsy way to proceed, and can give very wrong answers, especially since values at the ends often turn out to be outliers (numbers that don't quite fit with everything else).

For instance, suppose your dots look like this:

[Figure: scatterplot of the data points]

Connecting the first and last points, you would end up with this:

[Figure: the scatterplot with a line drawn through the first and last points]

On the other hand, you could ignore the outliers and instead just eyeball the cloud of dots to locate a general trend. Put the ruler about where you think a line ought to go (regardless of whether the ruler actually crosses any of the dots), draw the line, and guess the equation from that. You'll likely end up with a more sensible result. Your equation will still be guess-work, but it'll be better guess-work than using only the first and last points:

[Figure: the scatterplot with an eyeballed trend line]

If you're finding regression equations with a ruler, you'll need to work extremely neatly, of course, and using graph paper would probably be a really good idea. Once you've drawn in your line (and this will only work for linear, or straight-line, regressions), you will estimate two points on the line that seem to be close to where the gridlines intersect, and then find the line equation through those two points. From the above graph, I would guess that the line goes close to the points (3, 7) and (19, 1), so the regression equation would be y = (–3/8)x + 65/8.
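If you'd like to double-check the arithmetic for a line through two estimated points, here is a minimal sketch in Python. The function name line_through is just an illustrative choice, and the two points are the ones I guessed from the graph above; exact fractions are used so that the result can be compared directly with the hand computation.

```python
from fractions import Fraction

def line_through(p, q):
    """Return (slope, intercept) of the line through points p and q, as exact fractions."""
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        raise ValueError("vertical line: the slope is undefined")
    slope = Fraction(y2 - y1, x2 - x1)   # rise over run
    intercept = y1 - slope * x1          # solve y1 = slope*x1 + b for b
    return slope, intercept

# The two points estimated from the hand-drawn line
slope, intercept = line_through((3, 7), (19, 1))
print(f"y = ({slope})x + {intercept}")   # y = (-3/8)x + 65/8
```

Different people will pick slightly different points off the same drawing, so this only confirms the algebra, not the quality of the guess.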

Most likely, though, you'll be doing regressions on your calculator. Doing regressions properly is a difficult and technical process, but your graphing calculator has been programmed with the necessary formulas and has the memory to crunch the many numbers. The calculator will give you "the" regression line. If you're working by hand, you and your classmates will get slightly different answers; if you're using calculators, you'll all get the same answer. (Consult your owner's manual or calculator web sites for specific information on doing regressions with your particular calculator model.)

If you're supposed to report how "good" a given regression is, then figure out how to find the "r", "r²", and/or "R²" values on your calculator. These diagnostic tools measure the degree to which the regression equation matches the scatterplot. The closer these correlation values are to 1 (or to –1), the better a fit your regression equation is to the data values. If the correlation value is more than 0.8 or less than –0.8, the match is judged to be pretty good; if the value is between –0.5 and 0.5, the match is judged to be pretty poor; and a correlation value close to zero means you're kidding yourself if you think there's really a relationship of the type you're looking for. (There should be instructions, somewhere in your owner's manual, for finding this information.) When you're doing a regression, you're trying to find the "best fit" line to the data, and the correlation numbers help you to tell how good your "fit" is.
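To make the correlation value less of a black box, here is a rough sketch of how "r" can be computed for a linear fit, using the standard Pearson formula. This is an assumption about what a typical calculator reports, not a description of any particular model; the function names pearson_r and describe_fit and the sample lists are made up for illustration, and the cut-offs are the rules of thumb from the paragraph above.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def describe_fit(r):
    """Apply the rule-of-thumb cut-offs quoted in the lesson."""
    if abs(r) > 0.8:
        return "pretty good"
    if abs(r) < 0.5:
        return "pretty poor"
    return "in between (the rule of thumb gives no verdict)"

# Made-up sample data: one perfectly linear set, one with no linear trend
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))    # 1.0
print(pearson_r([1, 2, 3, 4], [1, -1, -1, 1]))  # 0.0
print(describe_fit(0.9), "/", describe_fit(0.1))
```

Squaring r gives the "r²" value for the linear model.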

• Given the following data values, find the linear and cubic regression equations. Say which regression is a better fit, and why.

    (2, 23), (3, 24), (8, 32), (10, 36), (13, 51), (14, 59), (17, 76), (20, 107), (22, 120), (23, 131), (27, 182)

After plugging these values into the STAT utility of my calculator, I can do both a linear regression and a cubic regression.

The trend of the dots looks a little curvy on the scatterplot, so it's reasonable that the curved cubic model, y = 0.000829x³ + 0.23x² – 1.09x + 24.60, is a better fit to the data points than the straight-line linear model, y = 6.03x – 10.64.

Since the correlation value is closer to 1 for the cubic and since the graph of the cubic model is closer to the dots, the cubic equation y = 0.000829x³ + 0.23x² – 1.09x + 24.60 is the better regression.
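If you don't have a graphing calculator handy, the comparison can be cross-checked with a short script. What follows is a minimal sketch, assuming an ordinary least-squares fit (which is what calculators typically use, though your model's manual is the authority). The helper names solve, polyfit, predict, and r_squared are my own, and exact fractions are used to avoid round-off in the intermediate sums.

```python
from fractions import Fraction

DATA = [(2, 23), (3, 24), (8, 32), (10, 36), (13, 51), (14, 59),
        (17, 76), (20, 107), (22, 120), (23, 131), (27, 182)]

def solve(matrix, rhs):
    """Solve a square linear system exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    rows = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]

def polyfit(points, degree):
    """Least-squares polynomial coefficients, constant term first."""
    size = degree + 1
    # Normal equations: sum over j of (sum of x^(i+j)) * c_j = sum of x^i * y
    matrix = [[sum(Fraction(x) ** (i + j) for x, _ in points)
               for j in range(size)] for i in range(size)]
    rhs = [sum(Fraction(x) ** i * y for x, y in points) for i in range(size)]
    return solve(matrix, rhs)

def predict(coeffs, x):
    """Evaluate the polynomial with the given coefficients at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))

def r_squared(points, coeffs):
    """Coefficient of determination: 1 - (residual SS) / (total SS)."""
    mean_y = Fraction(sum(y for _, y in points), len(points))
    ss_res = sum((y - predict(coeffs, x)) ** 2 for x, y in points)
    ss_tot = sum((y - mean_y) ** 2 for _, y in points)
    return 1 - ss_res / ss_tot

linear = polyfit(DATA, 1)
cubic = polyfit(DATA, 3)

# Linear: intercept is about -10.64 and slope about 6.03
print("linear:", [round(float(c), 4) for c in linear])
# Cubic: compare these with the calculator's cubic equation
print("cubic: ", [round(float(c), 6) for c in cubic])
print("R^2 linear:", float(r_squared(DATA, linear)))
print("R^2 cubic: ", float(r_squared(DATA, cubic)))
```

The coefficients are listed from the constant term up, so the linear result reads as (intercept, slope). Whichever model has the R² value closer to 1 is the better fit by this measure.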

You shouldn't expect, by the way, always to get correlation values that are close to "1". If they tell you to find, say, the linear regression equation for a data set, and the correlation value is close to zero, this doesn't mean that you've found the "wrong" linear equation; it only means that a linear equation probably wasn't a good model for the data. A quadratic model, for instance, might have been better.
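Here is a small illustration of that last point, using made-up data that sits exactly on the parabola y = x². The function name linear_r is hypothetical, and the calculation is the usual Pearson formula; treat it as a sketch, not as what any particular calculator does internally.

```python
import math

def linear_r(points):
    """Pearson correlation coefficient for a list of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return sxy / math.sqrt(sxx * syy)

# Made-up data lying exactly on y = x^2, symmetric about x = 0
points = [(x, x ** 2) for x in range(-3, 4)]

r = linear_r(points)
print("linear correlation:", r)   # 0.0

# The quadratic model y = x^2 reproduces every data point
print("quadratic is exact:", all(y == x ** 2 for x, y in points))   # True
```

The linear correlation comes out to zero even though the points follow a perfectly definite rule, which is why a poor linear fit says nothing about whether some other type of model would work.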


Cite this article as: Stapel, Elizabeth. "Scatterplots and Regressions." Purplemath. Available from http://www.purplemath.com/modules/scattreg3.htm. Accessed [Date] [Month] 2016

This lesson may be printed out for your personal use.