Covariance and correlation
Covariance and correlation are measures of dependence between two random variables based on their joint distribution. They quantify the tendency of values of the random variables to vary together, or to “co-vary”. They are signed measures, with the sign indicating whether the two variables tend to vary in the same direction (positive sign) or in opposite directions (negative sign).
Covariance
If $X$ and $Y$ are random variables defined on the same probability space, their covariance is defined as

$$\operatorname{Cov}(X, Y) = \mathbb{E}\big[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])\big] = \mathbb{E}[XY] - \mathbb{E}[X]\,\mathbb{E}[Y].$$
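To make the definition concrete, the identity $\operatorname{Cov}(X, Y) = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y]$ can be checked numerically. The joint distribution below is an illustrative assumption, not one taken from these notes:

```python
# Covariance of two discrete random variables from a (hypothetical) joint pmf.
# The distribution below is an illustrative assumption, not the one in the text.
pmf = {(0, 0): 0.2, (0, 1): 0.3, (1, 0): 0.1, (1, 1): 0.4}

def expectation(f):
    """E[f(X, Y)] under the joint pmf."""
    return sum(p * f(x, y) for (x, y), p in pmf.items())

ex  = expectation(lambda x, y: x)        # E[X]
ey  = expectation(lambda x, y: y)        # E[Y]
exy = expectation(lambda x, y: x * y)    # E[XY]

# Definition: Cov(X, Y) = E[(X - E X)(Y - E Y)]
cov_def = expectation(lambda x, y: (x - ex) * (y - ey))
# Shortcut: Cov(X, Y) = E[XY] - E[X] E[Y]
cov_short = exy - ex * ey

print(cov_def, cov_short)  # the two formulas agree
```

Both formulas give the same number, as the algebraic identity guarantees.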
Linearity of expectation also entails that covariance is “bilinear”, meaning it is linear in each argument:

$$\operatorname{Cov}(aX + bY, Z) = a\,\operatorname{Cov}(X, Z) + b\,\operatorname{Cov}(Y, Z),$$

and similarly in the second argument.
Use bilinearity of covariance to show that

$$\operatorname{Cov}(aX + b,\; cY + d) = ac\,\operatorname{Cov}(X, Y)$$

for any constants $a$, $b$, $c$, and $d$.
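The fact that constants shift out of the covariance and scale factors multiply it, i.e. $\operatorname{Cov}(aX + b, cY + d) = ac\,\operatorname{Cov}(X, Y)$, can be spot-checked numerically. The joint distribution here is again an assumed toy example:

```python
# Numerical check (on an assumed toy distribution) that affine shifts drop out
# of the covariance: Cov(aX + b, cY + d) = a * c * Cov(X, Y).
pmf = {(0, 0): 0.2, (0, 1): 0.3, (1, 0): 0.1, (1, 1): 0.4}

def cov(f, g):
    """Cov(f(X,Y), g(X,Y)) under the joint pmf, via E[fg] - E[f]E[g]."""
    E = lambda h: sum(p * h(x, y) for (x, y), p in pmf.items())
    return E(lambda x, y: f(x, y) * g(x, y)) - E(f) * E(g)

a, b, c, d = 2.0, -3.0, 5.0, 7.0
lhs = cov(lambda x, y: a * x + b, lambda x, y: c * y + d)
rhs = a * c * cov(lambda x, y: x, lambda x, y: y)
print(lhs, rhs)  # equal: the shifts b, d have no effect
```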
Let
To find the covariance, one needs the expectations
So:
- What is …?
- What is …?
- What is …?
Correlation
Observe that shifting a random vector by a constant will not change the covariance, but scaling will. For example, continuing the example immediately above, by bilinearity one has that
The correlation between $X$ and $Y$ is defined by

$$\rho(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)\,\operatorname{Var}(Y)}}.$$
Continuing the previous example, the marginal variances are obtained by the following calculation:
Then, the correlation is:
In addition to being scale-invariant, correlation is easier to interpret, since it must be a number between $-1$ and $1$.
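Scale invariance can be demonstrated directly: rescaling and shifting the variables (with positive scale factors) leaves the correlation unchanged. The joint distribution below is an assumed toy example:

```python
import math

# Correlation under an assumed toy joint pmf; illustrates scale invariance:
# corr(a*X + b, c*Y + d) equals corr(X, Y) when a*c > 0.
pmf = {(0, 0): 0.2, (0, 1): 0.3, (1, 0): 0.1, (1, 1): 0.4}

def E(h):
    return sum(p * h(x, y) for (x, y), p in pmf.items())

def corr(f, g):
    cov = E(lambda x, y: f(x, y) * g(x, y)) - E(f) * E(g)
    var_f = E(lambda x, y: f(x, y) ** 2) - E(f) ** 2
    var_g = E(lambda x, y: g(x, y) ** 2) - E(g) ** 2
    return cov / math.sqrt(var_f * var_g)

rho = corr(lambda x, y: x, lambda x, y: y)
rho_scaled = corr(lambda x, y: 2 * x + 1, lambda x, y: 3 * y - 4)
print(rho, rho_scaled)  # identical: shifting and (positively) scaling preserves correlation
```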
Lemma. Let $X$ and $Y$ be random variables with finite, nonzero variances. Then $-1 \le \rho(X, Y) \le 1$.

Denote the correlation by $\rho = \rho(X, Y)$, and write $\sigma_X = \sqrt{\operatorname{Var}(X)}$ and $\sigma_Y = \sqrt{\operatorname{Var}(Y)}$.

Then consider the expression

$$0 \le \operatorname{Var}\!\left(\frac{X}{\sigma_X} \pm \frac{Y}{\sigma_Y}\right) = \frac{\operatorname{Var}(X)}{\sigma_X^2} + \frac{\operatorname{Var}(Y)}{\sigma_Y^2} \pm 2\,\frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y} = 2(1 \pm \rho).$$

Since $2(1 \pm \rho) \ge 0$ for both choices of sign, it follows that $-1 \le \rho \le 1$.

This result establishes that the largest absolute value a correlation can attain is $1$.
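The key identity in the argument, $\operatorname{Var}(X/\sigma_X + Y/\sigma_Y) = 2(1 + \rho)$, can be spot-checked on randomly generated joint distributions (an assumed toy setup, supported on $\{0,1\}^2$):

```python
import math, random

# Spot-check of the proof identity: for standardized U = X/sd(X), V = Y/sd(Y),
# Var(U + V) = 2(1 + rho) >= 0, which forces rho >= -1
# (the minus-sign version forces rho <= 1).
random.seed(0)

def random_pmf():
    """A random joint pmf on {0,1} x {0,1} with strictly positive weights."""
    w = [random.random() for _ in range(4)]
    s = sum(w)
    pts = [(0, 0), (0, 1), (1, 0), (1, 1)]
    return {pt: wi / s for pt, wi in zip(pts, w)}

for _ in range(100):
    pmf = random_pmf()
    E = lambda h: sum(p * h(x, y) for (x, y), p in pmf.items())
    ex, ey = E(lambda x, y: x), E(lambda x, y: y)
    sx = math.sqrt(E(lambda x, y: x * x) - ex ** 2)
    sy = math.sqrt(E(lambda x, y: y * y) - ey ** 2)
    rho = (E(lambda x, y: x * y) - ex * ey) / (sx * sy)
    # Var(X/sx + Y/sy), computed directly from the joint pmf
    var_sum = E(lambda x, y: (x / sx + y / sy) ** 2) - (ex / sx + ey / sy) ** 2
    assert abs(var_sum - 2 * (1 + rho)) < 1e-9
    assert abs(rho) <= 1 + 1e-12
print("identity holds on 100 random joint pmfs")
```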
Consider the random vector defined by the joint distribution given in the table below:
|   |     |     |
|---|-----|-----|
|   | 0.1 | 0.5 |
|   | 0.3 | 0.1 |
First, consider whether you expect outcomes to be dependent, and if so, whether you expect a positive or negative covariance/correlation. Then compute the covariance and correlation.
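One way to carry out the computation, assuming (since the table's row and column labels did not survive) that the rows correspond to $x \in \{0, 1\}$ and the columns to $y \in \{0, 1\}$:

```python
import math

# Covariance and correlation for the exercise's joint pmf, under the
# ASSUMED labeling x in {0, 1} (rows) and y in {0, 1} (columns).
pmf = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.3, (1, 1): 0.1}

E = lambda h: sum(p * h(x, y) for (x, y), p in pmf.items())
ex, ey = E(lambda x, y: x), E(lambda x, y: y)
cov = E(lambda x, y: x * y) - ex * ey
var_x = E(lambda x, y: x * x) - ex ** 2
var_y = E(lambda x, y: y * y) - ey ** 2
rho = cov / math.sqrt(var_x * var_y)
print(cov, rho)  # negative: most mass sits where X and Y disagree
```

Under this labeling most of the probability mass sits on outcomes where the two coordinates disagree, so the covariance and correlation come out negative.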
Lastly, it is important to note that covariance and correlation do not capture every type of dependence, but rather only linear or approximately linear dependence. We will return to this later, but the classical counterexample is given below.
Let $X$ be a random variable whose distribution is symmetric about $0$ (for instance, uniform on $\{-1, 0, 1\}$), and let $Y = X^2$. Then

$$\operatorname{Cov}(X, Y) = \mathbb{E}[X^3] - \mathbb{E}[X]\,\mathbb{E}[X^2] = 0.$$

However, obviously $X$ and $Y$ are far from independent: $Y$ is completely determined by $X$.
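The counterexample can be computed exactly, using the symmetric distribution uniform on $\{-1, 0, 1\}$ as the concrete instance:

```python
from fractions import Fraction

# The classical counterexample, computed exactly: X uniform on {-1, 0, 1}
# and Y = X^2. Y is a deterministic function of X, yet Cov(X, Y) = 0.
support = [-1, 0, 1]
p = Fraction(1, 3)  # uniform weight on each point

E = lambda h: sum(p * h(x) for x in support)
ex  = E(lambda x: x)          # E[X]  = 0 by symmetry
exy = E(lambda x: x * x * x)  # E[XY] = E[X^3] = 0 by symmetry
ey  = E(lambda x: x * x)      # E[Y]  = E[X^2] = 2/3

cov = exy - ex * ey
print(cov)  # 0 — zero covariance despite total dependence
```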