Covariance and correlation

Course notes

STAT425, Fall 2023


December 15, 2023

Covariance and correlation are measures of dependence between two random variables based on their joint distribution. They quantify the tendency of values of the random variables to vary together, or to “co-vary”. They are signed measures, with the sign indicating whether they tend to vary in opposite directions (negative sign) or the same direction (positive sign).

Covariance

If $X_1, X_2$ are random variables, then the covariance between them is defined as the expectation:
$$\operatorname{cov}(X_1, X_2) = E\left[(X_1 - EX_1)(X_2 - EX_2)\right]$$
The expectation is computed from the joint distribution of $(X_1, X_2)$, so for instance if the random vector is discrete:
$$\operatorname{cov}(X_1, X_2) = \sum_{x_1}\sum_{x_2} (x_1 - EX_1)(x_2 - EX_2)\, P(X_1 = x_1, X_2 = x_2)$$
And if the random vector is continuous:
$$\operatorname{cov}(X_1, X_2) = \int\!\!\int (x_1 - EX_1)(x_2 - EX_2)\, f(x_1, x_2)\, dx_1\, dx_2$$
It is immediate that covariance is a symmetric operator, i.e., $\operatorname{cov}(X_1, X_2) = \operatorname{cov}(X_2, X_1)$. Additionally, by expanding the product and applying linearity of expectation one obtains the covariance formula:
$$\operatorname{cov}(X_1, X_2) = E(X_1 X_2) - EX_1\, EX_2$$
This provides a convenient way to calculate covariances, much as the variance formula simplifies the calculation of variances.
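Both expressions can be checked numerically. Below is a minimal Python sketch using a small, hypothetical joint pmf (the probabilities are invented purely for illustration) that computes the covariance from the definition and from the shortcut formula; the two agree.

```python
import numpy as np

# A small, hypothetical joint pmf for illustration (values are made up):
# rows index x1 in {0, 1}, columns index x2 in {0, 1, 2}.
pmf = np.array([[0.10, 0.20, 0.10],
                [0.25, 0.15, 0.20]])
x1_vals = np.array([0, 1])
x2_vals = np.array([0, 1, 2])

# Marginal expectations E[X1] and E[X2]
E_x1 = np.sum(x1_vals * pmf.sum(axis=1))
E_x2 = np.sum(x2_vals * pmf.sum(axis=0))

# Covariance from the definition: sum of (x1 - E[X1])(x2 - E[X2]) P(X1 = x1, X2 = x2)
cov_def = sum((x1 - E_x1) * (x2 - E_x2) * pmf[i, j]
              for i, x1 in enumerate(x1_vals)
              for j, x2 in enumerate(x2_vals))

# Covariance from the shortcut formula: E[X1 X2] - E[X1] E[X2]
E_x1x2 = sum(x1 * x2 * pmf[i, j]
             for i, x1 in enumerate(x1_vals)
             for j, x2 in enumerate(x2_vals))
cov_formula = E_x1x2 - E_x1 * E_x2

print(cov_def, cov_formula)  # the two should agree
```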

Linearity of expectation also entails that covariance is “bilinear”, meaning it is linear in each argument:
$$\operatorname{cov}(aX_1 + b, X_2) = a\operatorname{cov}(X_1, X_2) + \operatorname{cov}(b, X_2)$$
It is easy to show, however, that $\operatorname{cov}(b, X_2) = 0$:
$$\operatorname{cov}(b, X) = E\left[(b - Eb)(X - EX)\right] = E\big[\underbrace{(b - b)}_{0}(X - EX)\big] = 0$$
Intuitively, this makes sense, since constants don’t vary at all. Lastly, notice that $\operatorname{cov}(X, X) = \operatorname{var}(X)$.
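These properties are also easy to verify on simulated data. The brief sketch below (the simulated pair and the constants `a` and `b` are arbitrary choices, not part of the notes) checks that the constant shift drops out, the scale factor comes out front, and that $\operatorname{cov}(X, X) = \operatorname{var}(X)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated draws from an arbitrary dependent pair (purely illustrative).
x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(size=100_000)

def cov(u, v):
    """Sample covariance with population-style normalization."""
    return np.mean((u - u.mean()) * (v - v.mean()))

a, b = 3.0, 7.0
print(cov(a * x + b, y), a * cov(x, y))  # approximately equal: the shift drops out, the scale factors out
print(cov(x, x), np.var(x))              # cov(X, X) equals var(X)
```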

Exercise

Use bilinearity of covariance to show that:

  1. $\operatorname{var}(c) = 0$ for any constant $c$
  2. $\operatorname{var}(aX + b) = a^2 \operatorname{var}(X)$
Example: calculating a covariance

Let $(X_1, X_2)$ be a continuous random vector distributed on the unit square according to the density:
$$f(x_1, x_2) = x_1 + x_2, \qquad (x_1, x_2) \in (0, 1) \times (0, 1)$$

To find the covariance, one needs the expectations $E X_1 X_2$, $EX_1$, and $EX_2$. Marginally, $X_1$ and $X_2$ have the same distribution, so the calculation will be shown only for $X_1$:
$$f_1(x_1) = \int_0^1 (x_1 + x_2)\, dx_2 = x_1 + \tfrac{1}{2}, \qquad x_1 \in (0, 1)$$
$$EX_1 = \int_0^1 x_1\left(x_1 + \tfrac{1}{2}\right) dx_1 = \tfrac{7}{12}, \qquad EX_2 = EX_1 = \tfrac{7}{12}$$
Then (the two terms in the third line are equal after relabeling the variables of integration):
$$\begin{aligned}
E X_1 X_2 &= \int_0^1\!\!\int_0^1 x_1 x_2 (x_1 + x_2)\, dx_1\, dx_2 \\
&= \int_0^1\!\!\int_0^1 \left(x_1^2 x_2 + x_1 x_2^2\right) dx_1\, dx_2 \\
&= \int_0^1\!\!\int_0^1 x_1^2 x_2\, dx_1\, dx_2 + \int_0^1\!\!\int_0^1 x_1 x_2^2\, dx_1\, dx_2 \\
&= 2\int_0^1\!\!\int_0^1 x^2 y\, dx\, dy \\
&= 2\int_0^1 \tfrac{1}{2} x^2\, dx = \tfrac{1}{3}
\end{aligned}$$
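If you want to double-check these integrals, numerical integration works well. The following Python sketch uses `scipy.integrate.dblquad` to reproduce $EX_1 = \tfrac{7}{12}$ and $E X_1 X_2 = \tfrac{1}{3}$ for this density.

```python
from scipy.integrate import dblquad

# Joint density f(x1, x2) = x1 + x2 on the unit square.
# dblquad integrates func(y, x) with y as the inner variable, so pass x2 first.
def f(x2, x1):
    return x1 + x2

# E[X1] = integral of x1 * f(x1, x2) over the unit square
E_x1, _ = dblquad(lambda x2, x1: x1 * f(x2, x1), 0, 1, 0, 1)

# E[X1 X2] = integral of x1 * x2 * f(x1, x2) over the unit square
E_x1x2, _ = dblquad(lambda x2, x1: x1 * x2 * f(x2, x1), 0, 1, 0, 1)

print(E_x1, 7 / 12)    # both approximately 0.58333
print(E_x1x2, 1 / 3)   # both approximately 0.33333
```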

Combining these results:
$$\operatorname{cov}(X_1, X_2) = E X_1 X_2 - EX_1\, EX_2 = \tfrac{1}{3} - \left(\tfrac{7}{12}\right)^2 = -\tfrac{1}{144}$$

Check your understanding

  1. What is $\operatorname{cov}(X_1, X_2)$?
  2. What is $\operatorname{cov}(X_2, X_1)$?
  3. What is $\operatorname{cov}(3X_1 - 2,\, 5X_2 + 1)$?

Correlation

Observe that shifting a random vector by a constant will not change the covariance, but scaling will. For example, continuing the example immediately above, by bilinearity one has that $\operatorname{cov}(10X_1, 10X_2) = -\tfrac{100}{144}$. While this is a substantially larger number in magnitude, intuitively, the scale transformation shouldn’t alter the dependence between $X_1$ and $X_2$: if $X_1, X_2$ are only weakly dependent, then $10X_1, 10X_2$ should remain weakly dependent. Correlation is a standardized covariance measure that is scale-invariant.

The correlation between $X_1, X_2$ is the covariance scaled by the product of the standard deviations:
$$\operatorname{corr}(X_1, X_2) = \frac{\operatorname{cov}(X_1, X_2)}{\sqrt{\operatorname{var}(X_1)\operatorname{var}(X_2)}}$$
This measure is scale-invariant since covariance is bilinear and $\operatorname{var}(aX_1) = a^2 \operatorname{var}(X_1)$, so for any $a > 0$:
$$\operatorname{corr}(aX_1, X_2) = \frac{a\operatorname{cov}(X_1, X_2)}{\sqrt{a^2\operatorname{var}(X_1)\operatorname{var}(X_2)}} = \frac{\operatorname{cov}(X_1, X_2)}{\sqrt{\operatorname{var}(X_1)\operatorname{var}(X_2)}} = \operatorname{corr}(X_1, X_2)$$
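A quick empirical check of scale invariance, using an arbitrary simulated pair and scale factor (both chosen purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary simulated dependent pair (for illustration only).
x1 = rng.normal(size=50_000)
x2 = -0.3 * x1 + rng.normal(size=50_000)

a = 10.0  # any positive scale factor
print(np.corrcoef(x1, x2)[0, 1])       # correlation of the original pair
print(np.corrcoef(a * x1, x2)[0, 1])   # essentially unchanged after rescaling X1
```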

Example: computing correlation

Continuing the previous example, the marginal variances are obtained by the following calculation:
$$E X_1^2 = \int_0^1 x_1^2\left(x_1 + \tfrac{1}{2}\right) dx_1 = \tfrac{5}{12}$$
$$\operatorname{var}(X_1) = E X_1^2 - (EX_1)^2 = \tfrac{5}{12} - \left(\tfrac{7}{12}\right)^2 = \tfrac{11}{144}$$
By symmetry, $\operatorname{var}(X_2) = \operatorname{var}(X_1) = \tfrac{11}{144}$.

Then, the correlation is:
$$\operatorname{corr}(X_1, X_2) = \frac{-\tfrac{1}{144}}{\sqrt{\tfrac{11}{144}\cdot\tfrac{11}{144}}} = \frac{-\tfrac{1}{144}}{\tfrac{11}{144}} = -\tfrac{1}{11}$$
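As a sanity check, this correlation can also be estimated by simulation. The sketch below draws from the density by rejection sampling (the sample size and seed are arbitrary) and compares the sample correlation with $-\tfrac{1}{11} \approx -0.091$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Rejection sampling from f(x1, x2) = x1 + x2 on (0,1)^2; the density is bounded above by 2.
n_prop = 500_000
cand = rng.uniform(size=(n_prop, 2))                  # proposals from the uniform density on the square
accept = rng.uniform(0, 2, size=n_prop) < cand.sum(axis=1)
xy = cand[accept]                                     # roughly half of the proposals are kept

est = np.corrcoef(xy[:, 0], xy[:, 1])[0, 1]
print(est, -1 / 11)   # Monte Carlo estimate vs. the exact value of about -0.0909
```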

In addition to being scale-invariant, correlation is easier to interpret since it must be a number between $-1$ and $1$.

Lemma. Let $X_1, X_2$ be random variables with finite second moments. Then $-1 \leq \operatorname{corr}(X_1, X_2) \leq 1$.

Proof

Denote the correlation by $\rho = \operatorname{corr}(X_1, X_2)$, the means by $\mu_1, \mu_2$, and the variances by $\sigma_1^2, \sigma_2^2$. Note that $\operatorname{cov}(X_1, X_2) = \sigma_1 \sigma_2 \rho$.

Then consider the expectation of $\left[(X_1 - \mu_1) + t(X_2 - \mu_2)\right]^2$ as a polynomial in $t$. Since the squared quantity is nonnegative everywhere, expanding the square and taking expectations gives:
$$0 \leq E\left\{\left[(X_1 - \mu_1) + t(X_2 - \mu_2)\right]^2\right\} = \sigma_2^2\, t^2 + (2\sigma_1\sigma_2\rho)\, t + \sigma_1^2$$
A quadratic in $t$ that is nonnegative for every $t$ can have at most one real root, so its discriminant must be nonpositive. Therefore:
$$(2\sigma_1\sigma_2\rho)^2 - 4\sigma_1^2\sigma_2^2 \leq 0 \quad\Longrightarrow\quad \rho^2 \leq 1$$

This result establishes that the most extreme values a correlation can take are $-1$ and $1$; the smallest value in absolute terms is $0$. Thus, absolute values nearer to $1$ indicate stronger dependence, and absolute values nearer to zero indicate weaker dependence.

Exercise: contingency table

Consider the random vector defined by the joint distribution given in the table below:

|           | $X_1 = 0$ | $X_1 = 1$ |
|-----------|-----------|-----------|
| $X_2 = 0$ | 0.1       | 0.5       |
| $X_2 = 1$ | 0.3       | 0.1       |

First, consider whether you expect outcomes to be dependent, and if so, whether you expect a positive or negative covariance/correlation. Then compute the covariance and correlation.

Lastly, it is important to note that covariance and correlation do not capture every type of dependence, but rather only linear or approximately linear dependence. We will return to this later, but the classical counterexample is given below.

Perfectly dependent but uncorrelated

Let $U \sim \operatorname{uniform}(-1, 1)$, and define $X = U^2$. Then $EU = 0$, so:
$$\operatorname{cov}(U, X) = E(UX) - EU\, EX = E(U^3) = \frac{1}{2}\int_{-1}^{1} u^3\, du = 0$$
However, $X$ and $U$ are clearly dependent, because $X$ is a deterministic function of $U$.
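A short simulation makes this concrete (the sample size and seed are arbitrary): the sample covariance and correlation of $(U, X)$ are near zero, even though $X$ is determined exactly by $U$.

```python
import numpy as np

rng = np.random.default_rng(3)

u = rng.uniform(-1, 1, size=200_000)
x = u ** 2                      # X is a deterministic function of U, so they are dependent

print(np.cov(u, x)[0, 1])       # sample covariance is approximately 0
print(np.corrcoef(u, x)[0, 1])  # sample correlation is approximately 0
```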