## [Stats] Results involving correlation

I have noticed that many of our students are not quite aware of mathematical proofs behind some very basic identities involving correlation (standard books on probability – including *that* *one* – of course, have the necessary proofs, but old-fashioned textbooks aren’t exactly all that popular, are they?).

I have in mind these three:

- The correlation coefficient always lies between $-1$ and $+1$
- Perfect correlation between two variables implies a linear relationship between them
- The relationship between $R^2$ in a simple linear regression and the correlation coefficient

**1. Proof that $-1 \le \rho \le 1$**

(This pretty much follows Feller, Vol. 1, Chapter 9)

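Normalize the random variables $X$ and $Y$ to have mean 0 and standard deviation 1, as:

$$X^* = \frac{X - \mu_X}{\sigma_X}, \qquad Y^* = \frac{Y - \mu_Y}{\sigma_Y}$$

Covariance, $\mathrm{Cov}(X^*, Y^*)$, between the standardized random variables $X^*$ and $Y^*$ is:

$$\mathrm{Cov}(X^*, Y^*) = E[X^* Y^*] - E[X^*]\,E[Y^*] = E[X^* Y^*]$$

$$E[X^* Y^*] = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} = \rho$$

i.e. the covariance between the standardized random variables equals the correlation between the original random variables (the product term $E[X^*]\,E[Y^*]$ is zero because $E[X^*] = 0$ and $E[Y^*] = 0$).

The trick now is to calculate the variance of the sum and of the difference of the standardized random variables:

$$\mathrm{Var}(X^* \pm Y^*) = \mathrm{Var}(X^*) + \mathrm{Var}(Y^*) \pm 2\,\mathrm{Cov}(X^*, Y^*) = 2(1 \pm \rho)$$

Since variance is always non-negative, we must have that $1 \pm \rho \ge 0$, or $-1 \le \rho \le 1$.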
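The identity behind this proof can be checked numerically. The following is a minimal sketch using only Python's standard library; the simulated data, the seed, and the helper name `standardize` are illustrative assumptions, not part of the original argument. It uses population (divide-by-n) moments so that the sample version of the identity holds up to floating-point error.

```python
import random
from statistics import fmean, pstdev, pvariance


def standardize(values):
    """Rescale a sample to mean 0 and population standard deviation 1."""
    mu = fmean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


random.seed(0)  # arbitrary seed, only for reproducibility
x = [random.gauss(0, 1) for _ in range(1000)]
y = [0.6 * xi + random.gauss(0, 1) for xi in x]  # made-up dependence on x

xs, ys = standardize(x), standardize(y)

# Covariance of the standardized samples (their means are zero),
# which equals the correlation of the original samples.
rho = fmean([a * b for a, b in zip(xs, ys)])

# Variance of the sum and of the difference of the standardized samples.
var_sum = pvariance([a + b for a, b in zip(xs, ys)])
var_diff = pvariance([a - b for a, b in zip(xs, ys)])

print("rho             :", rho)
print("Var(X* + Y*)    :", var_sum, " vs 2(1 + rho):", 2 * (1 + rho))
print("Var(X* - Y*)    :", var_diff, " vs 2(1 - rho):", 2 * (1 - rho))
```

Both variances are non-negative by construction, which is exactly what pins `rho` inside the interval from -1 to 1.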

**2. A special case: $\rho = \pm 1$**

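When $\rho = \pm 1$, the last equation, $\mathrm{Var}(X^* \mp Y^*) = 2(1 \mp \rho)$ for the standardized variables $X^*$ and $Y^*$, reduces to $\mathrm{Var}(X^* \mp Y^*) = 0$, meaning $X^* \mp Y^*$ = constant (with probability 1). The constant is in fact 0, because both standardized variables have mean 0. Converting back to the original variables, this implies:

$$\frac{Y - \mu_Y}{\sigma_Y} = \pm \frac{X - \mu_X}{\sigma_X}, \qquad \text{or} \qquad Y = \mu_Y \pm \frac{\sigma_Y}{\sigma_X}\,(X - \mu_X)$$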

i.e. when the two random variables are perfectly positively or negatively correlated, they can be written as linear functions of each other. Put differently, correlation measures the strength of the linear relationship between two random variables.
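As an illustration (not a proof), the sketch below builds data that is exactly linear, confirms that the correlation comes out as $-1$ up to rounding, and checks that the line through the means with slope $\pm\sigma_Y/\sigma_X$ reproduces $y$. The intercept 3, slope $-2$, seed, and the helper name `correlation` are arbitrary assumptions chosen for the example.

```python
import random
from statistics import fmean, pstdev


def correlation(x, y):
    """Population correlation: covariance divided by the product of std devs."""
    mx, my = fmean(x), fmean(y)
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (pstdev(x) * pstdev(y))


random.seed(1)  # arbitrary seed
x = [random.uniform(-5, 5) for _ in range(200)]
y = [3.0 - 2.0 * xi for xi in x]  # exact linear relation with negative slope

rho = correlation(x, y)

# Rebuild y from x using y = mu_y +/- (sigma_y / sigma_x) * (x - mu_x),
# taking the sign from the sign of the correlation.
sign = 1.0 if rho > 0 else -1.0
mx, my = fmean(x), fmean(y)
sx, sy = pstdev(x), pstdev(y)
y_rebuilt = [my + sign * (sy / sx) * (xi - mx) for xi in x]

max_gap = max(abs(a - b) for a, b in zip(y, y_rebuilt))
print("rho:", rho)
print("largest |y - rebuilt y|:", max_gap)
```

The largest gap should be at the level of floating-point noise, since the reconstructed line is the same line the data was generated from.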

**3. $R^2 = \rho^2$**

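Regression coefficient in a standard linear regression $y_i = \alpha + \beta x_i + \varepsilon_i$ is given by:

$$\hat{\beta} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$$

with the $R^2$ given as:

$$R^2 = \frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2}$$

The correlation coefficient between $x$ and $y$ is given by:

$$\rho = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}$$

Noting that $\hat{y}_i = \hat{\alpha} + \hat{\beta} x_i$ and $\bar{y} = \hat{\alpha} + \hat{\beta} \bar{x}$, so that $\hat{y}_i - \bar{y} = \hat{\beta}(x_i - \bar{x})$, it is a matter of simple algebra to verify that:

$$\sum_i (\hat{y}_i - \bar{y})^2 = \hat{\beta}^2 \sum_i (x_i - \bar{x})^2$$

Putting the two together, and using the formula for the regression coefficient, $\hat{\beta}$, we have that:

$$\sum_i (\hat{y}_i - \bar{y})^2 = \frac{\left[\sum_i (x_i - \bar{x})(y_i - \bar{y})\right]^2}{\left[\sum_i (x_i - \bar{x})^2\right]^2} \sum_i (x_i - \bar{x})^2 = \frac{\left[\sum_i (x_i - \bar{x})(y_i - \bar{y})\right]^2}{\sum_i (x_i - \bar{x})^2}$$

(where in the second step above the term $\sum_i (x_i - \bar{x})^2$ cancels off in numerator and denominator, simplifying the expression)

Finally, dividing both sides by $\sum_i (y_i - \bar{y})^2$ gives the definition of $R^2$ on the LHS and $\rho^2$ on the RHS, i.e.:

$$R^2 = \frac{\left[\sum_i (x_i - \bar{x})(y_i - \bar{y})\right]^2}{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2} = \rho^2$$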
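The result is easy to sanity-check on simulated data. The sketch below fits a simple linear regression by hand and compares $R^2$ with the squared correlation; the data-generating values (intercept 1.5, slope 0.8, noise level 2), the seed, and all variable names are assumptions made up for the example.

```python
import random
from statistics import fmean

random.seed(2)  # arbitrary seed
x = [random.uniform(0, 10) for _ in range(500)]
y = [1.5 + 0.8 * xi + random.gauss(0, 2) for xi in x]  # noisy linear data

x_bar, y_bar = fmean(x), fmean(y)

# Sums of squares and cross-products about the means.
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least-squares slope and intercept.
beta_hat = sxy / sxx
alpha_hat = y_bar - beta_hat * x_bar

# Fitted values and the explained sum of squares.
y_hat = [alpha_hat + beta_hat * xi for xi in x]
ess = sum((yh - y_bar) ** 2 for yh in y_hat)

r_squared = ess / syy
rho = sxy / (sxx * syy) ** 0.5

print("ESS             :", ess, " vs beta_hat^2 * Sxx:", beta_hat ** 2 * sxx)
print("R^2             :", r_squared)
print("rho^2           :", rho ** 2)
```

The two printed quantities on the last two lines should agree up to floating-point error, whatever seed or noise level is used, as long as the regression includes an intercept and a single regressor.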

(This result also has a more concise and general proof, but uses linear algebra. See, for example, here.)
