Conditional distributions and independence

Course notes

STAT425, Fall 2023


December 15, 2023

The concept of conditional probability extends naturally to distributions of random variables. This extension is useful for describing probability distributions that depend on another variable. For example: data on health outcomes by age provides information about the conditional distribution of outcomes as it depends on age; the distribution of daytime temperatures depends on time of year and location; the distributions of measures of educational attainment depend on socioeconomic indicators. In all of these cases, we are considering the conditional distribution of a variable of interest (health outcomes, daytime temperatures, educational attainment) given other variables (age, time/location, socioeconomic variables); this may be of use in modeling the dependence, in combining conditional distributions with the distributions of the conditioning variables to obtain joint distributions, or for a variety of other purposes. Here we’ll focus on the relationship between joint probability distributions and conditional distributions, and illustrate some applications of conditional distributions.

If \((X_1, X_2)\) is a random variable with some joint distribution, consider the events \(\{X_1 \in A\}\) and \(\{X_2 \in B\}\) and assume \(P(X_2 \in B) > 0\). The conditional probability of the former event given the latter is: \[ P\left(X_1 \in A | X_2 \in B\right) = \frac{P(X_1 \in A, X_2 \in B)}{P(X_2 \in B)} \] The probability in the numerator can be computed from the joint distribution, and the probability in the denominator can be computed from the marginal distribution of \(X_2\). As a function of the set \(A\), this can be viewed as a probability measure, which suggests that the distribution function (PMF/PDF) that characterizes it can be obtained from the joint and marginal PMF/PDF.
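As a quick numerical sanity check of this definition, here is a minimal Monte Carlo sketch (assuming numpy; the dice example, the event choices, and the seed are made up for illustration):

```python
# Monte Carlo check of P(X1 in A | X2 in B) = P(X1 in A, X2 in B) / P(X2 in B)
# for a made-up discrete example: X1 = first die, X2 = sum of two fair dice,
# A = {X1 = 6}, B = {X2 >= 10}.
import numpy as np

rng = np.random.default_rng(425)               # arbitrary seed
d1 = rng.integers(1, 7, size=1_000_000)        # first die
d2 = rng.integers(1, 7, size=1_000_000)        # second die
x1, x2 = d1, d1 + d2

in_A, in_B = (x1 == 6), (x2 >= 10)
ratio = np.mean(in_A & in_B) / np.mean(in_B)   # joint probability over marginal probability
direct = np.mean(in_A[in_B])                   # relative frequency of A among draws with X2 in B
print(ratio, direct)                           # both approximately 0.5
```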

If \(X_1, X_2\) are discrete and \(x_2\) is any value such that \(P(X_2 = x_2) > 0\), then the conditional PMF of \(X_1\) given \(X_2 = x_2\) is: \[ P(X_1 = x_1 | X_2 = x_2) = \frac{P(X_1 = x_1, X_2 = x_2)}{P(X_2 = x_2)} \] It is easy to verify that this is a PMF: it takes values between 0 and 1, and summing over the support of \(X_1\) gives 1: \[ \sum_{x_1} P(X_1 = x_1 | X_2 = x_2) = \frac{\sum_{x_1}P(X_1 = x_1, X_2 = x_2)}{P(X_2 = x_2)} = \frac{P(X_2 = x_2)}{P(X_2 = x_2)} = 1 \] The functional form in any particular instance will be the same for every value of \(x_2\), so this is often referred to as the conditional distribution of \(X_1\) given \(X_2\), without reference to the specific value on which one is conditioning.
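To make the discrete formula concrete, here is a minimal sketch (assuming numpy; the joint PMF values are made up) that computes all conditional PMFs of \(X_1\) given \(X_2 = x_2\) at once by dividing each column of a joint PMF table by its column sum:

```python
# Conditional PMFs from a (made-up) joint PMF table: rows index values of X1,
# columns index values of X2, entries are P(X1 = x1, X2 = x2).
import numpy as np

joint = np.array([[0.10, 0.20],
                  [0.30, 0.40]])

p_x2 = joint.sum(axis=0)     # marginal PMF of X2 (column sums)
cond = joint / p_x2          # cond[i, j] = P(X1 = x1_i | X2 = x2_j)

print(cond)                  # [[0.25, 1/3], [0.75, 2/3]]
print(cond.sum(axis=0))      # each conditional PMF sums to 1
```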

In the continuous case, the conditional PDF is constructed in an analogous fashion: \[ f_{1|2}(x_1) = \frac{f(x_1, x_2)}{f_2 (x_2)} \] This is a well-defined PDF for every \(x_2\) in the marginal support of \(X_2\), since it is clearly nonnegative and: \[ \int_{-\infty}^\infty f_{1|2}(x_1)dx_1 = \frac{\int_{-\infty}^\infty f(x_1, x_2)dx_1}{f_2(x_2)} = \frac{f_2(x_2)}{f_2(x_2)} = 1 \]

Example: finding a conditional PDF

Let \(X_1, X_2\) be uniform on the triangle \(0 < x_1 < x_2 < 1\), so that: \[ f(x_1, x_2) = 2 \;,\qquad 0 < x_1 < x_2 < 1 \] The marginal distributions are then given by the PDFs: \[ \begin{align*} f_1(x_1) &= \int_{x_1}^1 2 dx_2 = 2(1 - x_1) \;,\quad 0 < x_1 < 1 \\ f_2(x_2) &= \int_0^{x_2} 2 dx_1 = 2x_2 \;,\quad 0 < x_2 < 1 \end{align*} \] So the conditional distribution of \(X_1\) given \(X_2\) is: \[ f_{1|2}(x_1) = \frac{f(x_1, x_2)}{f_2(x_2)} = \frac{2}{2x_2} = \frac{1}{x_2} \;,\quad 0 < x_1 < x_2 \] Notice here that the support set must be determined based on the joint distribution. This is a uniform distribution on the interval \((0, x_2)\). One may write: \[ (X_1 | X_2 = x_2) \sim \text{uniform}(0, x_2) \]
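The conditional uniformity can also be checked by simulation. Below is a minimal sketch (assuming numpy; the conditioning value 0.7, the window width, and the seed are arbitrary choices) that samples uniformly from the triangle by rejection from the unit square and then inspects \(X_1\) for draws with \(X_2\) near 0.7:

```python
# Sample uniformly from the triangle 0 < x1 < x2 < 1 by rejection from the unit square,
# then look at X1 among draws with X2 close to a fixed value.
import numpy as np

rng = np.random.default_rng(425)
x1, x2 = rng.uniform(size=(2, 200_000))
keep = x1 < x2                       # rejection step: keep points inside the triangle
x1, x2 = x1[keep], x2[keep]

x2_star = 0.7                        # condition on X2 close to 0.7
near = np.abs(x2 - x2_star) < 0.01
print(x1[near].mean())               # approximately x2_star / 2 = 0.35 (mean of uniform(0, 0.7))
print(x1[near].max())                # just below x2_star = 0.7 (upper end of the support)
```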

Check your understanding Show that \((X_2|X_1 = x_1) \sim \text{uniform}(x_1, 1)\).

Conditional expectation

The conditional expectation of \(g(X_1)\) given \(X_2\) is defined as: \[ \mathbb{E}[g(X_1) | X_2 = x_2] = \begin{cases} \sum_{x_1} g(x_1) P(X_1 = x_1 | X_2 = x_2) \quad\text{(discrete case)}\\ \int_{-\infty}^\infty g(x_1) f_{1 | 2}(x_1)dx_1 \quad\text{(continuous case)} \end{cases} \]

That is, conditional expectation is simply an expected value computed in the usual way but using the conditional mass or density function in place of the marginal. Similarly, the conditional variance is defined as:

\[ \text{var}[X_1 | X_2 = x_2] = \begin{cases} \sum_{x_1} (x_1 - \mathbb{E}(X_1|X_2 = x_2))^2 P(X_1 = x_1 | X_2 = x_2) \quad\text{(discrete case)}\\ \int_{-\infty}^\infty (x_1 - \mathbb{E}(X_1|X_2 = x_2))^2 f_{1 | 2}(x_1)dx_1 \quad\text{(continuous case)} \end{cases} \]
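As a numerical illustration of these definitions (a sketch assuming scipy is available; the conditioning value 0.7 is an arbitrary choice), the conditional mean and variance in the triangle example can be computed by integrating against the conditional density \(1/x_2\):

```python
# Conditional mean and variance of (X1 | X2 = x2) in the triangle example,
# computed directly from the definitions by numerical integration.
from scipy.integrate import quad

x2 = 0.7                                  # arbitrary conditioning value
f_cond = lambda x1: 1.0 / x2              # conditional density f_{1|2}(x1) on (0, x2)

mean, _ = quad(lambda x1: x1 * f_cond(x1), 0.0, x2)
var, _ = quad(lambda x1: (x1 - mean) ** 2 * f_cond(x1), 0.0, x2)

print(mean, x2 / 2)                       # both 0.35
print(var, x2 ** 2 / 12)                  # both approximately 0.0408
```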

Example: computing a conditional mean

In the previous example, the conditional expectations are easy to find based on the properties of the uniform distribution: \[ \begin{align*} \mathbb{E}[X_1 | X_2 = x_2] &= \frac{x_2}{2} \\ \mathbb{E}[X_2 | X_1 = x_1] &= \frac{x_1 + 1}{2} \end{align*} \] By comparison, the marginal expectations can be found to be: \[ \begin{align*} \mathbb{E}X_1 &= \int_0^1 x_1\cdot 2(1 - x_1) dx_1 = \frac{1}{3} \\ \mathbb{E}X_2 &= \int_0^1 x_2 \cdot 2x_2 dx_2 = \frac{2}{3} \end{align*} \]
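For instance, the first conditional expectation can be verified directly from the definition, using the conditional density \(1/x_2\) found earlier: \[ \mathbb{E}[X_1 | X_2 = x_2] = \int_0^{x_2} x_1 \cdot \frac{1}{x_2} dx_1 = \frac{x_2^2}{2x_2} = \frac{x_2}{2} \]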

Consider the example immediately above, and notice that the conditional means are functions of the value of the other “conditioning” variable. This is true in general, that is: \[ \mathbb{E}[X_1|X_2 = x_2] = h(x_2) \] Thus, the conditional expectation can be considered as a function of the random variable \(X_2\) and therefore as itself a random variable, that is, \[ \mathbb{E}(X_1|X_2) = h(X_2) \] So conditional expectations are themselves random variables with their own distributions, means, variances, and the like. The same is true of conditional variances. This leads, among other things, to two classic results regarding iterated expectations.

Theorem (Total expectation). For any random variables \(X, Y\): \[\mathbb{E}X = \mathbb{E}\left[\mathbb{E}(X|Y)\right]\]

We will review the proof in class. Check that the result holds in the example above.
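As a quick check in the triangle example: \[ \mathbb{E}\left[\mathbb{E}(X_1|X_2)\right] = \mathbb{E}\left[\frac{X_2}{2}\right] = \frac{1}{2}\cdot\frac{2}{3} = \frac{1}{3} = \mathbb{E}X_1 \]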

Theorem (Total variance). For any random variables \(X, Y\): \[\text{var}X = \text{var}\left[\mathbb{E}(X|Y)\right] + \mathbb{E}\left[\text{var}(X|Y)\right]\]

We will review the proof in class. Check that the result holds in the example above.
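As a quick check in the triangle example: \((X_1|X_2 = x_2) \sim \text{uniform}(0, x_2)\) has variance \(x_2^2/12\), \(\mathbb{E}X_2^2 = \int_0^1 x_2^2 \cdot 2x_2 dx_2 = \frac{1}{2}\), and so \(\text{var}X_2 = \frac{1}{2} - \frac{4}{9} = \frac{1}{18}\). Then: \[ \text{var}\left[\mathbb{E}(X_1|X_2)\right] + \mathbb{E}\left[\text{var}(X_1|X_2)\right] = \frac{1}{4}\text{var}X_2 + \frac{1}{12}\mathbb{E}X_2^2 = \frac{1}{72} + \frac{1}{24} = \frac{1}{18} \] which agrees with \(\text{var}X_1 = \mathbb{E}X_1^2 - (\mathbb{E}X_1)^2 = \frac{1}{6} - \frac{1}{9} = \frac{1}{18}\).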

Independence

Intuitively, random variables are independent if the value of one does not affect the distribution of the other. In other words, \(X, Y\) are independent if for every \(A\) and every \(B\) with \(P(Y \in B) > 0\): \[ P(X \in A | Y \in B) = P(X \in A) \] Just as with independent events, however, we do not define independent random variables according to whether conditional and marginal probabilities match, but rather according to whether joint probabilities factor. That is, \(X\) and \(Y\) are independent just in case for every \(A, B\): \[ P(X \in A, Y \in B) = P(X \in A)P(Y \in B) \] In terms of distribution functions, this is equivalent to the condition that, for every \(x, y\): \[ \begin{cases} f(x, y) = f_X(x)f_Y(y) \quad&\text{(continuous case)} \\ P(X = x, Y = y) = P(X = x)P(Y = y) \quad&\text{(discrete case)} \end{cases} \]

We write \(X \perp Y\) to indicate that the random variables are independent. Since the condition above involves knowing the marginal PDF/PMFs, the following theorem provides a useful criterion for checking independence without computing them.

Theorem (Factorization). \(X \perp Y\) if and only if there exist functions \(g, h\) such that, if \(f\) is the joint PMF/PDF: \[ f(x, y) = g(x)h(y) \quad\text{for all } x, y \]

Proof
Example

Let \(X_1, X_2\) be distributed according to the joint PDF: \[ f(x_1, x_2) = \frac{1}{2\pi}\exp\left\{-\frac{1}{2}\left[(x_1 - \mu_1)^2 + (x_2 - \mu_2)^2\right]\right\} \;,\quad (x_1, x_2) \in \mathbb{R}^2 \] The factorization theorem entails almost immediately that \(X_1 \perp X_2\), without knowing the marginal distributions, since the joint density can be written: \[ f(x_1, x_2) = \underbrace{\left[\frac{1}{2\pi}\exp\left\{-\frac{1}{2}(x_1 - \mu_1)^2\right\}\right]}_{g(x_1)} \underbrace{\left[\exp\left\{-\frac{1}{2} (x_2 - \mu_2)^2\right\}\right]}_{h(x_2)} \;,\quad x_1 \in \mathbb{R}, x_2 \in \mathbb{R} \] Notice that the support set also has to be expressible as a Cartesian product.

Lemma. If \(X\perp Y\) then the support of \((X, Y)\) is a Cartesian product.

Proof

The contrapositive tells us that if the joint support set of any random variables is not a Cartesian product, then they cannot be independent. For instance, the support \(0 < x_1 < x_2 < 1\) in the earlier triangle example is not a Cartesian product, so \(X_1\) and \(X_2\) there cannot be independent. Moreover, the lemma makes it rather straightforward to establish the following result.

Corollary. If \(X \perp Y\) and \(g, h\) are functions whose expectations exist then: \[ \mathbb{E}\left[g(X)h(Y)\right] = \mathbb{E}[g(X)]\mathbb{E}[h(Y)] \]

Proof
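As a quick Monte Carlo illustration of the corollary (not a proof), here is a minimal sketch using the independent normal example above; it assumes numpy, and the choices \(\mu_1 = \mu_2 = 0\), the functions \(g, h\), and the seed are arbitrary:

```python
# Monte Carlo check that E[g(X) h(Y)] matches E[g(X)] E[h(Y)] when X and Y are
# independent standard normals (mu1 = mu2 = 0 in the example above).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1_000_000)       # X ~ N(0, 1)
y = rng.normal(size=1_000_000)       # Y ~ N(0, 1), drawn independently of X

g = lambda t: np.exp(t / 2)          # arbitrary g with a finite expectation
h = lambda t: t ** 2                 # arbitrary h with a finite expectation

print(np.mean(g(x) * h(y)))          # approximately E[g(X)] E[h(Y)]
print(np.mean(g(x)) * np.mean(h(y)))
```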

Conditional probability models

Notice that the definitions of conditional PDF/PMFs entail that the joint PMF/PDF can be obtained from either conditional distribution together with the complementary marginal: \[ \begin{align*} P(X_1 = x_1, X_2 = x_2) &= P(X_1 = x_1 | X_2 = x_2) P(X_2 = x_2) \quad\text{(discrete case)} \\ f(x_1, x_2) &= f_{1 | 2}(x_1) f_2 (x_2) \quad\text{(continuous case)} \end{align*} \]

This allows one to construct models for multivariate processes from conditional distributions in a hierarchical fashion. For example: \[ \begin{align*} (X_1 | X_2 = x_2) &\sim f(x_1) \\ X_2 &\sim g(x_2) \end{align*} \] We’ll explore this idea a little in class.
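As a preview, here is a minimal simulation sketch of this hierarchical construction for the triangle example (assuming numpy; the sample size and seed are arbitrary): draw \(X_2\) from its marginal, then \(X_1\) from the conditional given the drawn value.

```python
# Hierarchical simulation for the triangle example: X2 ~ f_2 with density 2*x2 on (0, 1)
# (inverse CDF: F_2(x) = x^2, so X2 = sqrt(U)), then (X1 | X2 = x2) ~ uniform(0, x2).
import numpy as np

rng = np.random.default_rng(1)
n = 500_000
x2 = np.sqrt(rng.uniform(size=n))    # draw X2 from its marginal via the inverse CDF
x1 = rng.uniform(0.0, x2)            # draw X1 from the conditional given each X2

# The pairs (x1, x2) then follow the joint density f(x1, x2) = f_{1|2}(x1) f_2(x2) = 2.
print(x1.mean(), 1 / 3)              # marginal mean of X1
print(x2.mean(), 2 / 3)              # marginal mean of X2
```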