9.6 – Two-Variable Data
Key Terms
 Correlation – A measure of the strength of the relationship between two variables.
 Correlation coefficient – A number that expresses the strength of the correlation between two variables.
 It also shows whether the correlation is positive or negative.
 The correlation coefficient is called r.
 Dependent Variable – The variable plotted on the y-axis. It is also called the response variable.
 The dependent variable responds to changes in the explanatory variable.
 Explanatory Variable – The variable plotted on the x-axis, also called the independent variable.
 In an experiment, the explanatory variable is the variable that is being studied.
 Independent Variable – The variable plotted on the x-axis, also called the explanatory variable.
 In an experiment, the independent variable is the variable that is being studied.
 Line of Best Fit – A line drawn as near as possible to all the points in a scatterplot.
 The line of best fit helps you see the relationship shown in the scatterplot.
 It is also called a least squares regression line (LSRL).
 Residual – The difference between an observed value and the value predicted by the least squares regression line.
 Response Variable – The variable plotted on the y-axis; also called the dependent variable.
 The response variable responds to changes in the explanatory variable.
 Two-Variable Data – Data that can be measured in two different ways and graphed on a Cartesian plane.
Review
One-Variable Data 
 Visual displays for one-variable data sets
 dot plot
 stem-and-leaf plot
 box-and-whisker plot
 histogram
 frequency table

Notes
Correlations 
 A correlation is the measure of the strength of the relationship between two variables.
 It can be described with a number — the correlation coefficient (r)

 Examples
 Height and weight of 15 men
 Population and gross domestic product (GDP) of 15 European Union (EU) countries
 Runs scored and number of wins for 15 National League (NL) baseball teams

Correlation Coefficient (r) 
 The correlation coefficient is called r. It is sometimes called Pearson’s r because it was developed by a statistician named Karl Pearson.
 Defined
 A number that describes the relationship between two variables.
 Measures the strength of the relationship.
 Tells whether the relationship is positive or negative.

 Properties
 It is always between −1 and 1.
 When r is near 0, it indicates very little correlation.
 When r is near 1, it indicates a strong positive correlation.
 When r is near −1, it indicates a strong negative correlation.
 r is strongly affected by outliers.
 r applies only to linear correlations.
 Measuring Strength
 Perfect – The data points fall into a line.
 Strong – The data points form a tight cluster but do not quite fall into a line.
 Weak – The overall trend of the data is in one direction, but the points do not form a tight cluster.
 Estimating “r”
 r will be positive if there is a positive linear relationship (the values go up from left to right).
 r will be negative if there is a negative linear relationship (the values go down from left to right).
 r will be close to −1 or +1 when the points are all close to being on one line.
 r will be close to 0 when the points are not close to being on one line (there is no linear pattern).
 r will be a perfect +1.0 or −1.0 when one line contains all the points.
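
 The properties and estimating rules above can be checked numerically. The following is a minimal Python sketch, not part of the original notes: the function name `pearson_r` and the small data sets are made up for illustration, and the formula used is the standard one for Pearson's r (sum of the products of deviations divided by the square root of the product of the summed squared deviations).

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's correlation coefficient r for paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data sets, chosen only to illustrate each property.
xs = [1, 2, 3, 4, 5]
r_up = pearson_r(xs, [2, 4, 6, 8, 10])    # every point on one rising line
r_down = pearson_r(xs, [10, 8, 6, 4, 2])  # every point on one falling line

# A clear pattern that is not linear: y = x^2 on a symmetric range.
r_curve = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

# The same rising line with one outlier added at (6, 0).
r_outlier = pearson_r(xs + [6], [2, 4, 6, 8, 10, 0])

print(r_up, r_down, r_curve, round(r_outlier, 2))
```

 In this sketch the rising and falling lines give r of +1 and −1, the curved pattern gives r of 0 even though the points follow a clear rule, and the single outlier pulls r from 1 down to about 0.14.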

Correlation Does Not Imply Causation 
 For example, suppose a scatterplot shows that there is a strong positive correlation between the number of televisions owned and the number of well-fed people in a country.
 Does owning a TV cause a person to be well fed?
 What’s really happening is that in a country where most people can afford a TV, they can also afford food.
 TVs alone don’t cause people to be well fed.
 Predictions based on correlations are not necessarily true.
 They are only likely to occur, based on observed trends in past and present data.
 This is the way most weather predictions are made.

Scatterplots 
 The best way to display two-variable data.
 It plots the two variables as (x, y) pairs on the Cartesian plane.
 The suspected cause of the relationship is called the explanatory variable. It is plotted on the x-axis.
 The suspected effect is called the response variable. It is plotted on the y-axis.
 Look for patterns in a scatterplot by studying three features

 Measuring Direction
 Positive correlation: Data appear to go up from left to right across the scatterplot.
 Negative correlation: Data appear to go down from left to right across the scatterplot.
 No correlation: Data are spread out across the scatterplot with no visible pattern.

 Examples
 The population/GDP scatterplot shows a strong pattern, almost a perfect line! This suggests that GDP is closely related to population.
 The runs/wins scatterplot shows a weak pattern. This suggests that runs scored alone are a poor predictor of wins.

Least Squares Regression Line (LSRL) 
 A line drawn as near as possible to the points in a scatterplot
 Helps you see the linear relationship between the two variables on the scatterplot
 Also called the line of best fit

 Equation for Least Squares Regression Line
 $\hat{y} = a + bx$
 $\hat{y}$: response variable (predicted value)
 a: y-intercept of line
 b: slope of line
 x: explanatory variable
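
 To make the pieces of the equation concrete, here is a small Python sketch that uses a regression line to predict a response and then computes a residual (observed value minus predicted value). The function names and the values of a, b, x, and y are hypothetical, chosen only for illustration.

```python
def predict(a, b, x):
    """Predicted response for a line with y-intercept a and slope b."""
    return a + b * x

def residual(a, b, x, y_observed):
    """Observed value minus the value predicted by the line."""
    return y_observed - predict(a, b, x)

# Hypothetical line: y-intercept 2, slope 3.
a, b = 2.0, 3.0

y_hat = predict(a, b, 4.0)          # 2 + 3 * 4
res = residual(a, b, 4.0, 15.0)     # observed 15 minus predicted value

print(y_hat, res)
```

 A positive residual means the observed point lies above the line, and a negative residual means it lies below.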

Slope of a Regression Line 
 To find b, use the following formula, where r is the correlation coefficient, $s_y$ is the standard deviation of the y-values, and $s_x$ is the standard deviation of the x-values: $b = r \cdot \frac{s_y}{s_x}$
 You need to find the standard deviation to find the slope
 To find the standard deviation
 Find the mean
 Find each value’s deviation
 Square each deviation
 Add up the squares
 Divide the result by n – 1
 Take the square root
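
 The six steps above translate directly into code. The sketch below is an illustration rather than part of the original notes: the function name `sample_std_dev` is made up, the input is the list of x-values from the example points in this section, and the result is cross-checked against `statistics.stdev` from Python's standard library, which also divides by n – 1.

```python
import statistics
from math import sqrt

def sample_std_dev(values):
    """Sample standard deviation, following the six steps one at a time."""
    n = len(values)
    mean = sum(values) / n                    # 1. find the mean
    deviations = [v - mean for v in values]   # 2. find each value's deviation
    squares = [d ** 2 for d in deviations]    # 3. square each deviation
    total = sum(squares)                      # 4. add up the squares
    variance = total / (n - 1)                # 5. divide the result by n - 1
    return sqrt(variance)                     # 6. take the square root

x_values = [10, 20, 15, 5, 10, 25]
s_x = sample_std_dev(x_values)
check = statistics.stdev(x_values)  # library version of the same calculation

print(round(s_x, 2), round(check, 2))
```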
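 Example: r = −0.92 and the points plotted are: (10, 120), (20, 40), (15, 80), (5, 160), (10, 80), (25, 35)
 Mean of x and y
 Mean of x: $\bar{x} = (10 + 20 + 15 + 5 + 10 + 25)/6 = 85/6 \approx 14.17$
 Mean of y: $\bar{y} = (120 + 40 + 80 + 160 + 80 + 35)/6 = 515/6 \approx 85.83$
 Standard deviation of x and y
 Formula: $s_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$ (and likewise for $s_y$)
 Standard Deviation of x: $s_x = \sqrt{\frac{270.83}{5}}$
 Simplify: $s_x = \sqrt{54.17}$
 Answer: $s_x \approx 7.36$
 Standard Deviation of y: $s_y = \sqrt{\frac{11420.83}{5}}$
 Simplify: $s_y = \sqrt{2284.17}$
 Answer: $s_y \approx 47.79$
 Regression Line formula for Slope: $b = r \cdot \frac{s_y}{s_x} = (-0.92) \cdot \frac{47.79}{7.36} \approx -5.97$
 So far, the equation will be: $\hat{y} = a - 5.97x$
 To find the y-intercept, use this formula: $a = \bar{y} - b\bar{x}$
 We know $\bar{y} \approx 85.83$, $b \approx -5.97$, and $\bar{x} \approx 14.17$
 So, $a = 85.83 - (-5.97)(14.17) = 85.83 + 84.59$
 Answer: $a \approx 170.42$
 Answer: the regression line is: $\hat{y} = 170.42 - 5.97x$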
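
 As a cross-check on the worked example, the following Python sketch runs the same calculation on the six example points: the means, the two standard deviations, r, the slope from r times the ratio of the standard deviations, and the y-intercept from the means. The variable names are illustrative, and r is computed from the points with the standard formula for Pearson's r instead of being typed in.

```python
from math import sqrt

points = [(10, 120), (20, 40), (15, 80), (5, 160), (10, 80), (25, 35)]
xs = [x for x, _ in points]
ys = [y for _, y in points]
n = len(points)

mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Sums of squared deviations and of products of deviations.
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

s_x = sqrt(sxx / (n - 1))   # standard deviation of the x-values
s_y = sqrt(syy / (n - 1))   # standard deviation of the y-values
r = sxy / sqrt(sxx * syy)   # correlation coefficient

b = r * s_y / s_x           # slope of the regression line
a = mean_y - b * mean_x     # y-intercept of the regression line

print(f"r = {r:.2f}, s_x = {s_x:.2f}, s_y = {s_y:.2f}")
print(f"slope b = {b:.2f}, intercept a = {a:.2f}")
```

 Because the code keeps full precision at every step, it gives a slope near −5.98 and an intercept near 170.62. A hand calculation that first rounds r and the standard deviations to two decimal places will differ slightly in the last digits.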

How to Analyze Two-Variable Data 
 Collect data
 Display the data on a scatterplot
 Identify the correlation
 Consider factors of causation
 Find the correlation coefficient
 Write the equation of the line of best fit
 Use the equation to make predictions

All Three Formulas in One 

Important!
Practice (Apex Study 9.6)