Random variables

Course notes

STAT425, Fall 2023

Archived

December 15, 2023

Informally, random variables are real-valued functions on probability spaces. Such functions induce probability measures on $\mathbb{R}$, which encompass all of the familiar probability distributions. These distributions are, essentially, probability measures on the set of real numbers; to formalize this, we need to be able to articulate probability spaces with $\mathbb{R}$ as the sample space.

The Borel sets comprise the smallest σ-algebra containing all intervals in $\mathbb{R}$. We will denote this collection by $\mathcal{B}$; while it is possible to generate $\mathcal{B}$ constructively from the open intervals as a closure under complements, countable unions, and countable intersections, doing so rigorously is beyond the scope of this class. Importantly, however, $\mathcal{B}$ contains all intervals and all sets that can be formed from intervals; these will comprise our “events” of interest going forward.

Concepts

Let $(S, \mathcal{S}, P)$ be a probability space. A random variable is a function $X: S \to \mathbb{R}$ such that for every $B \in \mathcal{B}$, $X^{-1}(B) \in \mathcal{S}$.

Example: coin toss

If $S = \{H, T\}$, and $\mathcal{S} = 2^S = \{\varnothing, \{H\}, \{T\}, \{H, T\}\}$, define:
$$X(s) = \begin{cases} 1, & s = H \\ 0, & s \neq H \end{cases}$$
Then $X$ is a random variable, since for any $B \in \mathcal{B}$:
$$X^{-1}(B) = \begin{cases} \{H, T\}, & 0 \in B, \, 1 \in B \\ \{H\}, & 1 \in B, \, 0 \notin B \\ \{T\}, & 0 \in B, \, 1 \notin B \\ \varnothing, & 0 \notin B, \, 1 \notin B \end{cases}$$
So $X^{-1}(B) \in \mathcal{S}$ in every case.

Not all functions are random variables. For example, take $S = \{1, 2, 3\}$ and the trivial σ-algebra $\mathcal{S} = \{\varnothing, S\}$, and consider $X(s) = s$; then $X^{-1}(\{2\}) = \{2\} \notin \mathcal{S}$.
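On a finite space, the measurability condition can be checked by brute force. Below is a minimal sketch of the counterexample above; the helper names (`preimage`, `is_random_variable`) are made up for illustration, and the σ-algebra is represented as a set of frozensets:

```python
# Check measurability on a finite space: X is a random variable iff
# every preimage X^{-1}(B) lies in the sigma-algebra.

def preimage(X, values, S):
    """Outcomes in S that X maps into the set `values`."""
    return frozenset(s for s in S if X(s) in values)

def is_random_variable(X, S, sigma_algebra, value_sets):
    """Check X^{-1}(B) is in the sigma-algebra for each candidate value set B."""
    return all(preimage(X, B, S) in sigma_algebra for B in value_sets)

S = {1, 2, 3}
trivial = {frozenset(), frozenset(S)}   # the trivial sigma-algebra {∅, S}
X = lambda s: s                         # the identity map

# The preimage of {2} is {2}, which is not in the trivial sigma-algebra:
print(is_random_variable(X, S, trivial, [{2}]))       # False
print(is_random_variable(X, S, trivial, [{1, 2, 3}])) # True
```

On a finite space it suffices to check preimages of singleton value sets, since every preimage is a finite union of these.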

The condition that preimages of Borel sets be events ensures that we can associate probabilities to statements such as $a < X(s) < b$ or, more generally, $X \in B$. Specifically, we can assign such statements probabilities according to the outcomes in the underlying probability space that map into $B$ under $X$. Thus, random variables induce probability measures on $\mathbb{R}$.

More precisely, if $(S, \mathcal{S}, P)$ is a probability space and $X: S \to \mathbb{R}$ is a random variable, then the induced probability measure (on $\mathbb{R}$) is defined for any Borel set $B \in \mathcal{B}$ as:
$$P_X(B) = P(X^{-1}(B))$$

The induced measure $P_X$ is known more commonly as a probability distribution or simply as a distribution: it describes how probability is distributed across the set of real numbers.
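For a finite space, the induced measure can be computed directly from the definition $P_X(B) = P(X^{-1}(B))$. A sketch using the coin-toss example above (the function name is illustrative):

```python
# Compute the induced measure P_X(B) = P(X^{-1}(B)) on a finite space.

def induced_measure(X, P, B):
    """P maps each outcome to its probability; B is a set of real values."""
    return sum(p for s, p in P.items() if X(s) in B)

P = {"H": 0.5, "T": 0.5}              # fair coin
X = {"H": 1, "T": 0}.__getitem__      # X(H) = 1, X(T) = 0

print(induced_measure(X, P, {1}))     # 0.5, i.e. P(X = 1)
print(induced_measure(X, P, {0, 1}))  # 1.0
```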

Example

You’ve already seen some simple random variables. For example, on the probability space representing two dice rolls with all outcomes equally likely, that is, $S = \{1, \dots, 6\}^2$ with $\mathcal{S} = 2^S$ and $P(E) = \frac{|E|}{36}$, the function $X((i, j)) = i + j$ is a random variable, because $X^{-1}(B) = \{(i, j) : i + j = x \text{ for some } x \in B\} \in 2^S$. Moreover, the probability distribution associated with $X$ is:

$$P_X(B) = \frac{1}{36} \sum_{x \in B} \left|\{(i, j) \in S : i + j = x\}\right|$$
As determined in a previous homework problem, $|\{(i, j) \in S : i + j = x\}| > 0$ only for $x = 2, 3, \dots, 12$, and the associated probabilities are:

| $x$ | $P_X(\{x\})$ |
|-----|--------------|
| 2   | $\frac{1}{36}$ |
| 3   | $\frac{2}{36}$ |
| 4   | $\frac{3}{36}$ |
| 5   | $\frac{4}{36}$ |
| 6   | $\frac{5}{36}$ |
| 7   | $\frac{6}{36}$ |
| 8   | $\frac{5}{36}$ |
| 9   | $\frac{4}{36}$ |
| 10  | $\frac{3}{36}$ |
| 11  | $\frac{2}{36}$ |
| 12  | $\frac{1}{36}$ |
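The table can be verified by enumerating all 36 outcomes and counting how many map to each sum. A sketch using Python's `fractions` module to keep the probabilities exact:

```python
from fractions import Fraction
from collections import Counter

# PMF of X((i, j)) = i + j on two fair dice: count outcomes per sum,
# then divide by |S| = 36.
S = [(i, j) for i in range(1, 7) for j in range(1, 7)]
counts = Counter(i + j for (i, j) in S)
pmf = {x: Fraction(counts[x], 36) for x in sorted(counts)}

for x, p in pmf.items():
    print(x, p)                 # reproduces the table, e.g. 7 maps to 1/6

assert sum(pmf.values()) == 1   # total probability is 1
```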

Cumulative distribution functions

We must remember that the concept of a distribution arises relative to some underlying probability space. However, it is not necessary to work with the underlying measure — distributions, luckily, can be characterized using any of several real-valued functions. Arguably, the most fundamental of these is the cumulative distribution function (CDF).

Given any random variable $X: S \to \mathbb{R}$, define the cumulative distribution function (CDF) of $X$ to be the function:
$$F_X(x) = P_X((-\infty, x]) = P(\{s \in S : X(s) \leq x\})$$

Check your understanding

Consider the dice roll example above with the random variable $X((i, j)) = i + j$. Fill in the table below, and then draw the CDF.

| $x$ | $P_X(\{x\})$ | $P(X \leq x)$ |
|-----|--------------|---------------|
| 2   | $\frac{1}{36}$ | |
| 3   | $\frac{2}{36}$ | |
| 4   | $\frac{3}{36}$ | |
| 5   | $\frac{4}{36}$ | |
| 6   | $\frac{5}{36}$ | |
| 7   | $\frac{6}{36}$ | |
| 8   | $\frac{5}{36}$ | |
| 9   | $\frac{4}{36}$ | |
| 10  | $\frac{3}{36}$ | |
| 11  | $\frac{2}{36}$ | |
| 12  | $\frac{1}{36}$ | |

If the random variable is evident from context, the subscript can be omitted and one can write $F$ instead of $F_X$.

Theorem. $F$ is the CDF of a random variable if and only if it satisfies the following four properties:

  1. $\lim_{x \to -\infty} F(x) = 0$
  2. $\lim_{x \to \infty} F(x) = 1$
  3. $F$ is monotone nondecreasing: $x \leq y \implies F(x) \leq F(y)$
  4. $F$ is right-continuous: $\lim_{x \downarrow x_0} F(x) = F(x_0)$
Proof

We will prove the ‘necessity’ part: that if $F$ is the CDF of a random variable, then it satisfies (i)–(iv) above. For sufficiency, one must construct a probability space and random variable $X$ such that $X$ has CDF $F$; we will skip this argument, as it is beyond the scope of this class.

For (i), observe that $E_n = \{s \in S : X(s) \leq -n\}$ is a nonincreasing sequence of sets for $n \in \mathbb{N}$ and $\lim_{n \to \infty} E_n = \bigcap_n E_n = \varnothing$. Then, by continuity of probability:
$$\lim_{x \to -\infty} F(x) = \lim_{n \to \infty} F(-n) = \lim_{n \to \infty} P(E_n) = P\left(\lim_{n \to \infty} E_n\right) = 0$$

For (ii), observe that $E_n = \{s \in S : X(s) \leq n\}$ is a nondecreasing sequence of sets for $n \in \mathbb{N}$ and $\lim_{n \to \infty} E_n = \bigcup_n E_n = S$. Then:
$$\lim_{x \to \infty} F(x) = \lim_{n \to \infty} F(n) = \lim_{n \to \infty} P(E_n) = P\left(\lim_{n \to \infty} E_n\right) = 1$$

For (iii), note that if $x \leq y$ then $\{s \in S : X(s) \leq x\} \subseteq \{s \in S : X(s) \leq y\}$, so by monotonicity of probability $F(x) \leq F(y)$.

For (iv), let $\{x_n\}$ be any decreasing sequence with $x_n \downarrow x_0$; for instance, $x_n = x_0 + \frac{1}{n}$. Then the sequence of events $E_n = \{s \in S : X(s) \leq x_n\}$ is nonincreasing, and $\lim_{n \to \infty} E_n = \bigcap_n E_n = \{s \in S : X(s) \leq x_0\}$, so:
$$\lim_{x \downarrow x_0} F(x) = \lim_{n \to \infty} F(x_n) = \lim_{n \to \infty} P(E_n) = P\left(\lim_{n \to \infty} E_n\right) = P(\{s \in S : X(s) \leq x_0\}) = F(x_0)$$

The portion of this theorem we didn’t prove is perhaps the more consequential part of the result, as it establishes that if $F$ is any function satisfying properties (i)–(iv), then there exists a probability space and random variable $X$ such that $F_X = F$. This means that we can omit reference to the underlying probability space $(S, \mathcal{S}, P)$, since some such space exists for any CDF. Thus, we will write probabilities simply as, e.g., $P(X \leq x)$ in place of $P_X((-\infty, x])$ or $P(\{s \in S : X(s) \leq x\})$. Consistent with this change in notation, we will speak directly about probability distributions as distributions “of” (rather than “induced by”) random variables.

It’s important to remember that distributions and random variables are distinct concepts: distributions are probability measures ($P_X$ above) and random variables are real-valued functions. Many random variables might have the same distribution, yet be distinct. Since CDFs are one class of functions that uniquely identify distributions, if two random variables $X, Y$ have the same CDF (that is, if $F_X = F_Y$), then they have the same distribution and we write $X \stackrel{d}{=} Y$.

Two convenient properties of CDFs are:

  • If $X$ is a random variable with CDF $F$, then $P(a < X \leq b) = F(b) - F(a)$.
  • If $X$ is a random variable with CDF $F$, then $P(X > a) = 1 - F(a)$.
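Both identities can be checked numerically on the dice-sum distribution from earlier. In this sketch the PMF formula $\frac{6 - |x - 7|}{36}$ simply reproduces the table above:

```python
from fractions import Fraction

# Check P(a < X <= b) = F(b) - F(a) and P(X > a) = 1 - F(a)
# for the dice-sum distribution.
pmf = {x: Fraction(6 - abs(x - 7), 36) for x in range(2, 13)}
F = lambda x: sum(p for v, p in pmf.items() if v <= x)   # CDF at x

a, b = 4, 9
assert sum(p for v, p in pmf.items() if a < v <= b) == F(b) - F(a)
assert sum(p for v, p in pmf.items() if v > a) == 1 - F(a)
```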

The proofs will be left as exercises. This section closes with a technical result that characterizes the probabilities of individual points $x \in \mathbb{R}$.

Lemma. Let $X$ be a random variable with CDF $F$. Then $P(X = x) = F(x) - F(x^-)$, where $F(x^-)$ denotes the left-hand limit $\lim_{z \uparrow x} F(z)$.

Proof

Define $E_n = \{x - \frac{1}{n} < X \leq x\}$; then $P(E_n) = F(x) - F(x - \frac{1}{n})$ and $\{E_n\}$ is a nonincreasing sequence with $\lim_{n \to \infty} E_n = \{X = x\}$. Then:
$$P(X = x) = P\left(\lim_{n \to \infty} E_n\right) = \lim_{n \to \infty} P(E_n) = \lim_{n \to \infty} \left[F(x) - F\left(x - \tfrac{1}{n}\right)\right] = F(x) - \lim_{n \to \infty} F\left(x - \tfrac{1}{n}\right) = F(x) - \lim_{z \uparrow x} F(z) = F(x) - F(x^-)$$

Notice the implication that if $F$ is continuous everywhere, then $P(X = x) = 0$ for every $x \in \mathbb{R}$.
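For a step CDF, the lemma is easy to observe numerically: the jump of $F$ at $x$ recovers $P(X = x)$. A sketch on the dice-sum distribution, where $F$ evaluated just below $x$ stands in for the left-hand limit $F(x^-)$:

```python
from fractions import Fraction

# The jump F(x) - F(x^-) of a step CDF equals the point mass P(X = x).
pmf = {x: Fraction(6 - abs(x - 7), 36) for x in range(2, 13)}  # dice-sum PMF
F = lambda x: sum(p for v, p in pmf.items() if v <= x)         # CDF at x

x = 7
F_left = F(x - Fraction(1, 10**9))    # stands in for lim_{z -> x^-} F(z)
print(F(x) - F_left == pmf[x])        # True: the jump at 7 is 6/36
```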

Discrete random variables

If a CDF takes on countably many values, then the corresponding random variable is said to be discrete. For discrete random variables, $P(X = x)$ is called the probability mass function (PMF), and the probability of any event $E \in \mathcal{B}$ is given by the summation:
$$P_X(E) = P(X \in E) = \sum_{x \in E} P(X = x)$$

It can be shown that discrete random variables take countably many values, i.e., $P(X = x) > 0$ for only countably many $x \in \mathbb{R}$. We call the set of points where the probability mass function is positive, $\{x \in \mathbb{R} : P(X = x) > 0\}$, the support set (or simply the support) of $X$ (or of its distribution).

Theorem. A function $f$ is the PMF of a discrete random variable if and only if:

  1. $0 \leq f(x) \leq 1$ for every $x \in \mathbb{R}$
  2. $\sum_{x \in \mathbb{R}} f(x) = 1$
Proof

If $f$ is the PMF of a discrete random variable $X$, then by definition $f(x) = F(x) - F(x^-)$. Since $F(x^-) \leq F(x)$ by monotonicity of $F$, we have $f(x) \geq 0$ for every $x$. Since $\lim_{x \to \infty} F(x) = 1$ and $F$ is monotonic, $F(x) \leq 1$ for every $x$; by construction $f(x) \leq F(x) \leq 1$. So $0 \leq f(x) \leq 1$. Condition (ii) holds because the jumps of the step function $F$ sum to $\lim_{x \to \infty} F(x) - \lim_{x \to -\infty} F(x) = 1$.

For the converse implication, if $f$ is a function satisfying (i)–(ii), then $f(x) > 0$ for only countably many values. To see this, consider $A_n = \{x \in \mathbb{R} : \frac{1}{n+1} \leq f(x) < \frac{1}{n}\}$ for $n = 1, 2, \dots$, with $A_0 = \{x \in \mathbb{R} : f(x) = 1\}$; by construction we must have $\sum_{x \in \mathbb{R}} f(x) \geq \sum_{x \in A_n} f(x)$ for every $n$, since $f$ is nonnegative per (i) and $A_n \subseteq \mathbb{R}$. Now $f(x) \geq \frac{1}{n+1}$ on $A_n$, so $\sum_{x \in A_n} f(x) \geq \sum_{x \in A_n} \frac{1}{n+1}$; but then if $A_n$ is infinite, the sum on the right diverges, and thus so does the sum over all $x \in \mathbb{R}$, contrary to (ii). So (ii) entails that $|A_n| < \infty$ for every $n$, and thus the union $\bigcup_{n=0}^{\infty} A_n = \{x \in \mathbb{R} : f(x) > 0\}$ is a countable set.

Let $S$ denote the set $\{x \in \mathbb{R} : f(x) > 0\}$. Let $x_1, x_2, \dots$ denote the elements of $S$ and $p_1, p_2, \dots$ denote the corresponding values of $f$, that is, $p_i = f(x_i)$. By hypothesis we have that $0 \leq p_i \leq 1$ (condition (i)) and $\sum_{i=1}^{\infty} p_i = 1$ (condition (ii)). Let $P(E) = \sum_{i : x_i \in E} p_i$ for $E \in 2^S$. Then $P$ is a probability measure on $(S, 2^S)$; it suffices to check that $P$ satisfies the probability axioms.

  • Axiom 1: $P(E) = \sum_{i : x_i \in E} p_i \geq 0$, since by hypothesis $p_i \geq 0$.
  • Axiom 2: $P(S) = \sum_{i : x_i \in S} p_i = \sum_{i=1}^{\infty} p_i = 1$.
  • Axiom 3: let $\{E_j\}$ be disjoint and define $I_j = \{i : x_i \in E_j\}$. Note that $\{i : x_i \in \bigcup_j E_j\} = \bigcup_j I_j$ and $I_j \cap I_k = \varnothing$ for $j \neq k$. Then:
$$P\left(\bigcup_j E_j\right) = \sum_{i \in \bigcup_j I_j} p_i = \sum_j \sum_{i \in I_j} p_i = \sum_j P(E_j)$$

So $(S, 2^S, P)$ is a probability space. Now let $X$ be the identity map $X(s) = s$. $X$ is a random variable, since $X^{-1}(B) = \{x_j\}_{j \in J} \in 2^S$ for every $B \in \mathcal{B}$ (where $J$ indexes the support points in $B$), and its CDF is given by $F(x) = \sum_{\{i : x_i \leq x\}} p_i$. This is a step function with countably many values, so $X$ is a discrete random variable. Finally, it is easy to check that:

$$P(X = x) = F(x) - F(x^-) = \begin{cases} 0, & x \notin S \\ p_i, & x = x_i \end{cases}$$

So $X$ has PMF $P(X = x_i) = p_i = f(x_i)$, as required.

This result shows that PMFs uniquely determine discrete distributions. It also establishes that the support set is countable, so the unique values can be enumerated as $\{x_1, x_2, \dots, x_i, \dots\}$. We can recover the CDF from the PMF as $F(x) = \sum_{x_i \leq x} P(X = x_i)$.
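Recovering the CDF this way amounts to a running sum of the PMF over the ordered support. A sketch using the dice-sum distribution:

```python
from fractions import Fraction
from itertools import accumulate

# F(x) = sum of P(X = x_i) over support points x_i <= x:
# a running total of the PMF over the ordered support.
support = list(range(2, 13))
probs = [Fraction(6 - abs(x - 7), 36) for x in support]   # dice-sum PMF
cdf_values = list(accumulate(probs))                      # F(x_1), F(x_2), ...

print(cdf_values[0], cdf_values[-1])   # F climbs from 1/36 at x = 2 to 1 at x = 12
```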

Continuous random variables

If a CDF is absolutely continuous everywhere, then the corresponding random variable is said to be continuous. In this case, $P(X = x) = 0$ for every $x \in \mathbb{R}$, and so we define instead the probability density function (PDF) to be the function $f$ such that:
$$F(x) = \int_{-\infty}^{x} f(z) \, dz$$

By the fundamental theorem of calculus, one has that $f(x) = \frac{d}{dx} F(x)$ (wherever the derivative exists). The probability of any event $E \in \mathcal{B}$ is given by the integral:
$$P_X(E) = P(X \in E) = \int_E f(x) \, dx$$

For continuous random variables, the support set is defined as the set of points with positive density, that is, $\{x \in \mathbb{R} : f(x) > 0\}$.

Similar to the theorem above for discrete distributions, it can be shown that a function $f$ is the PDF of a continuous random variable if and only if:

  1. $f(x) \geq 0$ for every $x \in \mathbb{R}$
  2. $\int_{\mathbb{R}} f(x) \, dx = 1$

The proof of this result is omitted.
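The two conditions can be sanity-checked numerically for a specific density. The sketch below uses the exponential density $f(x) = e^{-x}$ on $[0, \infty)$, a choice made purely for illustration, together with a simple midpoint-rule integrator:

```python
import math

# Numerically verify the PDF conditions for f(x) = e^{-x} on [0, ∞),
# whose CDF is F(x) = 1 - e^{-x}. Tolerances are loose on purpose.

def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

f = lambda x: math.exp(-x)

total = integrate(f, 0, 50)            # the tail beyond 50 is negligible
assert abs(total - 1) < 1e-6           # condition (ii): f integrates to 1

F_at_2 = integrate(f, 0, 2)            # F(2) recovered from the density
assert abs(F_at_2 - (1 - math.exp(-2))) < 1e-6
```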

Exercise: characterizing distributions

For each of the functions below, determine whether it is a CDF. Then, for the CDFs, identify whether a random variable with this distribution is discrete or continuous, and find or guesstimate the PMF/PDF if it exists. You can ignore the behavior at the endpoints, but to check your understanding, identify for each jump which value the function must take at the endpoint for it to be a valid CDF.

These results establish that all distributions (again, recall that technically distributions are probability measures on $\mathbb{R}$ induced by random variables) can be characterized by CDFs or PDFs/PMFs. If a random variable $X$ has the distribution given by the CDF $F$ or the PDF/PMF $f$, we write

$$X \sim F(x) \quad \text{or} \quad X \sim f(x)$$

respectively.