9.6 – Two-Variable Data
- Correlation – A measure of the strength of the relationship between two variables.
- Correlation coefficient – A number that expresses the strength of the correlation between two variables.
- It also shows whether the correlation is positive or negative.
- The correlation coefficient is called r.
- Dependent Variable – The variable plotted on the y-axis. It is also called the response variable.
- The dependent variable responds to changes in the explanatory variable.
- Explanatory Variable – The variable plotted on the x-axis, also called the independent variable.
- In an experiment, the explanatory variable is the variable the experimenter changes to see how the response variable reacts.
- Independent Variable – The variable plotted on the x-axis, also called the explanatory variable.
- In an experiment, the independent variable is the variable the experimenter changes to see how the dependent variable reacts.
- Line of Best Fit – A line drawn as near as possible to all the points in a scatterplot.
- The line of best fit helps you see the relationship shown in the scatterplot.
- It is also called a least squares regression line (LSRL).
- Residual – The difference between an observed value and the value predicted by the least squares regression line.
- Response Variable – The variable plotted on the y-axis; also called the dependent variable.
- The response variable responds to changes in the explanatory variable.
- Two-Variable Data – Data in which two variables are measured for each individual, so that each pair of values can be graphed as a point on a Cartesian plane.
- Visual displays for one-variable data sets
- dot plot
- stem-and-leaf plot
- box-and-whisker plot
- frequency table
- A correlation is the measure of the strength of the relationship between two variables.
- It can be described with a number: the correlation coefficient (r).
- Height and weight of 15 men
- Population and gross domestic product (GDP) of 15 European Union (EU) countries
- Runs scored and number of wins for 15 national league (NL) baseball teams
|Correlation Coefficient (r)
- The correlation coefficient is called r. It is sometimes called Pearson’s r because it was developed by a statistician named Karl Pearson.
- A number that describes the relationship between two variables.
- Measures the strength of the relationship.
- Tells whether the relationship is positive or negative.
- It is always between -1 and 1.
- When r is near 0, it indicates very little correlation.
- When r is near 1, it indicates a strong positive correlation.
- When r is near -1, it indicates a strong negative correlation.
- r is strongly affected by outliers.
- r applies only to linear correlations.
- Measuring Strength
- Perfect – The data points fall into a line.
- Strong – The data points form a tight cluster but do not quite fall into a line.
- Weak – The overall trend of the data is in one direction, but the points do not form a tight cluster.
- Estimating “r”
- r will be positive if there is a positive linear relationship (the values go up from left to right).
- r will be negative if there is a negative linear relationship (the values go down from left to right).
- r will be close to -1 or +1 when the points are all close to being on one line.
- r will be close to 0 when the points are not close to being on one line (there is no linear pattern).
- r will be a perfect +1.0 or -1.0 when one line contains all the points.
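The properties of r listed above can be checked numerically. The following is a minimal sketch in plain Python, not a method taken from these notes: the function name `pearson_r` and the small data sets are made up for illustration. It computes r from the deviations of each value from its mean.

```python
import math

def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient r for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products of deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable never changes")
    return sxy / math.sqrt(sxx * syy)

# Illustrative (made-up) data sets.
xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]     # every point on one rising line
falling = [10 - 3 * x for x in xs]   # every point on one falling line
with_outlier = rising[:-1] + [-20]   # same rising line, last point replaced by an outlier

print(round(pearson_r(xs, rising), 2))        # all points on a rising line
print(round(pearson_r(xs, falling), 2))       # all points on a falling line
print(round(pearson_r(xs, with_outlier), 2))  # one outlier changes r sharply
```

The first two data sets lie exactly on a line, so r comes out as +1 and -1. In the third, a single outlier is enough to turn a perfect positive correlation into a negative one, which illustrates how strongly r is affected by outliers.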
|Correlation Does Not Imply Causation
- For example, suppose a scatterplot shows that there is a strong positive correlation between the number of televisions owned and the number of well-fed people in a country.
- Does owning a TV cause a person to be well fed?
- What’s more likely happening is that in a country where people can afford TVs, they can also afford food. Wealth is a third variable behind both.
- TVs alone don’t cause people to be well fed.
- Predictions based on correlations are not necessarily true.
- They are only likely to occur, based on observed trends in past and present data.
- This is the way most weather predictions are made.
- A scatterplot is the best way to display two-variable data.
- It plots the two variables as (x, y) pairs on the Cartesian plane.
- The suspected cause of the relationship between the two variables is called the explanatory variable. It is plotted on the x-axis.
- The suspected effect is called the response variable. It is plotted on the y-axis.
- Look for patterns in a scatterplot by studying three features
- Measuring Direction
- Positive correlation: Data appear to go up from left to right across the scatterplot.
- Negative correlation: Data appear to go down from left to right across the scatterplot.
- No correlation: Data are spread out across the scatterplot with no visible pattern.
- The population/GDP scatterplot shows a strong pattern, almost a perfect line. This suggests that GDP is closely related to population, although the pattern alone does not prove that population causes GDP.
- The runs/wins scatterplot shows a weak pattern. This suggests that runs scored is only loosely related to the number of wins.
|Least Squares Regression Line (LSRL)
- A line drawn as near as possible to the points in a scatterplot
- Helps you see the linear relationship between the two variables on the scatterplot
- Also called the line of best fit
- Equation for Least Squares Regression Line: $\hat{y} = a + bx$
- $\hat{y}$: response variable (the predicted y-value)
- a: y-intercept of line
- b: slope of line
- x: explanatory variable
|Slope of a Regression Line
- To find b, you use the following formula, where r is the correlation coefficient, $s_y$ is the standard deviation of the y-values, and $s_x$ is the standard deviation of the x-values: $b = r \cdot \frac{s_y}{s_x}$
- You need to find the standard deviation to find the slope.
- To find the standard deviation:
- Find the mean
- Find each value’s deviation
- Square each deviation
- Add up the squares
- Divide the result by n – 1
- Take the square root
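The six steps above map directly onto code. This is a minimal sketch in plain Python; the function name `sample_std_dev` and the sample values are illustrative assumptions, not part of the notes.

```python
import math

def sample_std_dev(values):
    """Sample standard deviation, following the six steps one at a time."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least 2 values")
    mean = sum(values) / n                    # 1. find the mean
    deviations = [v - mean for v in values]   # 2. find each value's deviation
    squares = [d ** 2 for d in deviations]    # 3. square each deviation
    total = sum(squares)                      # 4. add up the squares
    variance = total / (n - 1)                # 5. divide the result by n - 1
    return math.sqrt(variance)                # 6. take the square root

# Made-up values, used only to exercise the function.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(sample_std_dev(sample), 3))
```

For this sample the mean is 5 and the squared deviations add up to 32, so the result is the square root of 32/7.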
- Example: r = -0.92 and the points plotted are: (10, 120), (20, 40), (15, 80), (5, 160), (10, 80), (25, 35)
- Mean of x and y
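- Mean of x: $\bar{x} = \frac{10 + 20 + 15 + 5 + 10 + 25}{6} = \frac{85}{6} \approx 14.167$
- Mean of y: $\bar{y} = \frac{120 + 40 + 80 + 160 + 80 + 35}{6} = \frac{515}{6} \approx 85.833$
- Standard deviation of x and y
- Standard Deviation of x: $s_x = \sqrt{\frac{270.833}{5}} \approx 7.360$
- Standard Deviation of y: $s_y = \sqrt{\frac{11420.833}{5}} \approx 47.793$
- Regression Line formula for Slope: $b = r \cdot \frac{s_y}{s_x} = -0.92 \cdot \frac{47.793}{7.360} \approx -5.974$
- So far, the equation will be: $\hat{y} = a - 5.974x$
- To find the y-intercept, use this formula: $a = \bar{y} - b\bar{x}$
- We know $\bar{y} \approx 85.833$, $\bar{x} \approx 14.167$, and $b \approx -5.974$, so $a \approx 85.833 - (-5.974)(14.167) \approx 170.47$
- Answer: the regression line is: $\hat{y} \approx 170.47 - 5.97x$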
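The arithmetic in this example can be checked with a short script. This is a minimal sketch in plain Python; the helper name `sample_sd` and the variable names are illustrative, and r = -0.92 is taken as given from the example rather than recomputed.

```python
import math

# Data and correlation coefficient from the example.
points = [(10, 120), (20, 40), (15, 80), (5, 160), (10, 80), (25, 35)]
r = -0.92

xs = [x for x, _ in points]
ys = [y for _, y in points]
n = len(points)

mean_x = sum(xs) / n
mean_y = sum(ys) / n

def sample_sd(values, mean):
    """Sample standard deviation (divides by n - 1)."""
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))

s_x = sample_sd(xs, mean_x)
s_y = sample_sd(ys, mean_y)

b = r * s_y / s_x          # slope
a = mean_y - b * mean_x    # y-intercept

print(f"mean x = {mean_x:.2f}, mean y = {mean_y:.2f}")
print(f"s_x = {s_x:.2f}, s_y = {s_y:.2f}")
print(f"y-hat = {a:.2f} + ({b:.2f})x")
```

The script carries full precision through every step, so its results can differ in the last digit from a hand calculation that rounds at each stage.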
|How to Analyze Two-Variable Data
- Collect data
- Display the data on a scatterplot
- Identify the correlation
- Consider factors of causation
- Find the correlation coefficient
- Write the equation of the line of best fit
- Use the equation to make predictions
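The last three steps (find r, write the line, predict) can be sketched in code. This is a minimal illustration in plain Python under stated assumptions: the names `fit_lsrl` and `predict` are made up, the data are the six points from the slope example, and x = 12 is an arbitrary input chosen for the demonstration. It also computes each residual (observed value minus predicted value).

```python
import math

def fit_lsrl(points):
    """Return (r, a, b) for the least squares line y-hat = a + b*x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    r = sxy / math.sqrt(sxx * syy)
    # Slope is r * (s_y / s_x); the n - 1 divisors cancel in the ratio.
    b = r * math.sqrt(syy / sxx)
    a = mean_y - b * mean_x
    return r, a, b

points = [(10, 120), (20, 40), (15, 80), (5, 160), (10, 80), (25, 35)]
r, a, b = fit_lsrl(points)

def predict(x):
    """Predicted response for a given explanatory value."""
    return a + b * x

# Residual = observed value - value predicted by the line.
residuals = [y - predict(x) for x, y in points]

print(f"r = {r:.2f}")
print(f"line: y-hat = {a:.2f} + ({b:.2f})x")
print(f"predicted y at x = 12: {predict(12):.1f}")
print("residuals:", [round(e, 1) for e in residuals])
```

Because r is computed from the data here instead of being rounded to two decimals first, the slope and intercept may differ slightly from a hand calculation. The residuals of a least squares line always add up to zero (up to floating-point error), which is a quick check that the fit is right.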
|All Three Formulas in One
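Presumably these are the three formulas used in this section: the slope, the y-intercept, and the regression line itself. Collected in one place, with $\bar{x}$ and $\bar{y}$ as the means and $s_x$ and $s_y$ as the standard deviations, they read:

```latex
\begin{aligned}
b       &= r \cdot \frac{s_y}{s_x}   && \text{(slope)} \\
a       &= \bar{y} - b\,\bar{x}      && \text{(y-intercept)} \\
\hat{y} &= a + bx                    && \text{(least squares regression line)}
\end{aligned}
```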
Permanent link to this article: http://newvillagegirlsacademy.org/math/?page_id=4344