Intro

If @@@X@@@ is a RV and @@@g:\R \to \R@@@ is a nice function then we can define a new RV @@@Y@@@ by letting @@@Y = g(X)@@@. Then this @@@Y@@@ is a ‘‘transformation’’ of @@@X@@@, a synonym for a ``function of @@@X@@@’’. That’s it. Let’s look at an example.

The radius of a round tumor grows at a rate of @@@1\%@@@ per day until detected. On day @@@0@@@, the radius of the tumor is @@@1/10@@@ of an inch. Suppose that the number of days until the tumor is detected is random variable @@@T@@@ with some known distribution.

The radius of the tumor on the day it is detected is @@@0.1 * (1.01)^T@@@, a function of the RV @@@T@@@, and the area covered by the tumor that day is @@@\pi *(0.1 * (1.01)^T)^2@@@. Both quantities are RVs which are functions of @@@T@@@. These are examples of transformations of the RV @@@T@@@, nothing but a fancy name for a function of a RV.

In this section we discuss how to compute distribution and expectation of some easier classes of transformations of RVs. I’d like to emphasize the following:

  1. Transformations of discrete RVs are always discrete RVs, while any distribution (discrete, continuous, or mixed) can be obtained as a transformation of any continuous RV. See Theorem 4.
  2. Finding the distribution of transformations of RVs can be a bit tricky (not always!).
  3. The formulas for calculating expectations of transformations are fairly straightforward, and do not require finding the distribution of the transformation.

This video illustrates the idea of a transformation of some random variable. Can you identify which?

https://www.youtube.com/watch?v=ZFNstNKgEDI

In what follows we will limit the discussion to transformations of discrete RVs and of RVs with densities, and discuss each of the cases in a separate section.

Transformations of Discrete RVs

Let’s begin with the simplest case, when @@@X@@@ is a discrete RV. Let’s suppose that @@@g:\R \to \R@@@ is some function. Then @@@Y=g(X)@@@ is a discrete RV, because @@@X@@@ can only take countably many values, and hence so can @@@Y@@@. We know then that

$$E [Y] = \sum_{y} y P(Y=y) = \sum_y y P(g(X) =y).$$

For each @@@y@@@, @@@P(g(X)=y) = \sum_{x:g(x)=y} P(X=x)@@@, which gives us the PMF of @@@Y@@@. Plugging this into the definition of expectation, we obtain

$$ E[Y] = \sum_{y} \sum_{x:g(x)=y} y P(X=x)= \sum_y \sum_{x:g(x)=y} g(x) P(X=x).$$

The iterated summation guarantees that we sum over all @@@x@@@ in the support of @@@X@@@, each exactly once. We record the result.

Proposition 1.

Let @@@X@@@ be a discrete RV and let @@@g:\R\to \R@@@ be a function. Let @@@Y=g(X)@@@. Then

  1. The PMF of @@@Y@@@ is given by
$$p_Y(y) = \sum_{x:g(x)=y} P(X=x).$$
  2. The expectation of the RV @@@g(X)@@@ is
$$E[g(X)] = \sum_x g(x) p_X(x),$$

provided @@@\sum_x | g(x) | p_X(x)@@@ is finite.

What’s important? We don’t really need to calculate the PMF of the transformed RV.
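
To see Proposition 1 in action computationally, here is a short Python sketch (the PMF and the function @@@g@@@ are made up purely for illustration). It builds the PMF of @@@Y=g(X)@@@ by grouping the @@@x@@@'s with the same value of @@@g(x)@@@, and checks that computing @@@E[Y]@@@ from @@@p_Y@@@ and computing @@@\sum_x g(x)p_X(x)@@@ give the same answer.

```python
from collections import defaultdict

# A made-up PMF for a discrete RV X (values chosen only for illustration).
p_X = {-2: 0.2, -1: 0.1, 0: 0.3, 1: 0.1, 2: 0.3}

def g(x):
    return x ** 2          # the transformation, Y = g(X)

# PMF of Y: p_Y(y) is the sum of p_X(x) over all x with g(x) = y.
p_Y = defaultdict(float)
for x, p in p_X.items():
    p_Y[g(x)] += p

# Two ways to compute E[Y] = E[g(X)]; Proposition 1 says they must agree.
E_via_pY = sum(y * p for y, p in p_Y.items())
E_via_pX = sum(g(x) * p for x, p in p_X.items())

print(dict(p_Y))           # {4: 0.5, 1: 0.2, 0: 0.3}
print(E_via_pY, E_via_pX)  # both approximately 2.2
```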

Example 1.

Let @@@X\sim\mbox{Bin}(n,p)@@@. We will show that

$$E[e^{t X} ] = (pe^t +(1-p))^n.$$

In our case @@@g(x) = e^{t x}@@@, so we need to figure out

$$\begin{align} \label{eq:mom_gen} E [ e^{t X} ] & = \sum_{x=0}^n e^{t x} P(X=x)\\ \nonumber & = \sum_{x=0}^n e^{t x} \binom{n}{x} p^x (1-p)^{n-x} \\ \nonumber & = \sum_{x=0}^n \binom{n}{x} (pe^t)^x (1-p)^{n-x}\\ \nonumber & = (pe^t +(1-p))^n, \end{align}$$

where in the last equality, the binomial formula was used.
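
A quick numerical sanity check of the closed form (a sketch; the values of @@@n@@@, @@@p@@@ and @@@t@@@ below are arbitrary):

```python
from math import comb, exp, isclose

n, p, t = 10, 0.3, 0.7   # arbitrary illustration values

# E[e^{tX}] computed directly from the Binomial PMF ...
direct = sum(exp(t * x) * comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1))

# ... and from the closed form derived above.
closed_form = (p * exp(t) + (1 - p)) ** n

print(direct, closed_form, isclose(direct, closed_form))  # True
```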

Example 2.

A computer virus spreads at a rate of @@@25\%@@@ per day, and the number of days until it is stopped is Geometric with expectation of 5 days. What is the expected number of computers affected until it is stopped, if on day zero @@@4000@@@ computers were affected?

Let @@@T@@@ denote the number of days until the virus is stopped. Then @@@T\sim \mbox{Geom}(1/5)@@@. The number of computers affected by the time the virus is stopped is @@@X=4000* 1.25^T@@@, a transformation of @@@T@@@. Note that @@@X@@@ is finite, because @@@T@@@ is finite. To compute the expectation of @@@X@@@ we write

$$ E[X] = \sum_{t\in\N} 4000*1.25^t P(T=t) = \sum_{t\in \N} 4000 *(\frac{5}{4})^t (\frac{4}{5})^{t-1} \frac 15 =\infty.$$

Therefore the expected number of computers affected is infinite, although the actual number is always finite.
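
The divergence is easy to see numerically: since @@@1.25 * 0.8 = 1@@@, every term of the series equals @@@1000@@@, so the partial sums grow without bound. A minimal sketch:

```python
# Each term 4000 * 1.25**t * 0.8**(t-1) * 0.2 equals 1000, because 1.25 * 0.8 = 1.
terms = [4000 * 1.25**t * 0.8**(t - 1) * 0.2 for t in range(1, 11)]
print(terms)       # every term is (up to rounding) 1000.0
print(sum(terms))  # partial sums grow without bound, so the series diverges
```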

Exercise 1.

Let @@@X@@@ be a uniform RV on @@@\{1,2,3,4,5,6\}@@@ and let @@@Y = |X-3.5 |@@@. What is the expectation of @@@Y@@@?

Exercise 2.

What is the expectation of @@@X@@@ in Example 2 if the number of days until the virus is stopped is Geometric with expectation of 4 days?

Example 3.

Suppose that @@@X@@@ is a discrete, integer valued RV. What is the probability that @@@X@@@ is even?

We are going to solve this with transformations, using the following simple fact: for an integer @@@k@@@, @@@(-1)^k@@@ is @@@1@@@ if @@@k@@@ is even and @@@-1@@@ if @@@k@@@ is odd. Now

$$E[ (-1)^X ]=E[ {\bf 1}_{\{X\mbox{ even}\}} - {\bf 1}_{\{X \mbox{ odd}\}}] = 2 P(X\mbox{ even})-1.$$

In other words,

$$ P( X\mbox{ even}) = \frac 12 +\frac12 E[(-1)^X].$$

Let’s compute this for two particular cases.

  1. @@@X\sim \mbox{Bin}(n,p)@@@. Then
$$E[(-1)^X]=\sum_{k=0}^n \binom{n}{k} (-1)^k p^k (1-p)^{n-k} = (1-2p)^n,$$

where the last equality is due to the binomial theorem. Therefore,

$$ P(X \mbox{ even}) = 1/2+ 1/2(1-2p)^{n}.$$
  2. @@@X \sim \mbox{Pois}(\lambda)@@@. In this case
$$E[(-1)^X]=e^{-\lambda} \sum_{k=0}^\infty (-1)^k \lambda^k /k! =e^{-2\lambda}.$$

Therefore

$$ P(X\mbox{ even})= \frac 12 + \frac 12 e^{-2\lambda}.$$
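
Here is a short numerical check of both formulas (a sketch; the parameters @@@n@@@, @@@p@@@ and @@@\lambda@@@ are arbitrary, and the Poisson series is truncated):

```python
from math import comb, exp, factorial

# P(X even) = 1/2 + (1/2)(1-2p)^n for X ~ Bin(n, p); n and p are illustrative.
n, p = 12, 0.35
direct_bin = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(0, n + 1, 2))
print(direct_bin, 0.5 + 0.5 * (1 - 2 * p)**n)

# P(X even) = 1/2 + (1/2)e^{-2*lam} for X ~ Pois(lam); the series is truncated at k = 80.
lam = 3.0
direct_pois = sum(exp(-lam) * lam**k / factorial(k) for k in range(0, 80, 2))
print(direct_pois, 0.5 + 0.5 * exp(-2 * lam))
```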

Transformations of Continuous RVs

It may be useful to review the chain rule and the substitution formula for integrals from calculus.

It’s worth beginning with a discussion on a common error. In an exam I gave several times I asked something like this:

  • ‘‘Suppose @@@X\sim U[0,1]@@@ and let @@@Y=X^2@@@. Find the density of @@@Y@@@.’’

The most common answer I received was something like this:

  • ‘‘The density of @@@X@@@, @@@f_X@@@, is equal to @@@1@@@ on @@@[0,1]@@@ and to zero elsewhere. The density of @@@Y=X^2@@@ is therefore equal to @@@(f_X(x))^2@@@, that is, equal to @@@1@@@ on @@@[0,1]@@@ and zero elsewhere.’’

This answer is completely wrong.

  • What is the probability that @@@X\le\frac 14@@@? It is @@@\frac 14@@@, the integral @@@\int_{0}^{\frac 14} f_X (x) dx@@@.
  • What is the probability that @@@Y\le\frac 14@@@? Well, this is the event @@@\{X^2 \le \frac 14\}=\{X\le\frac 12\}@@@, which has probability @@@\frac 12@@@. This probability is of course @@@\int_0^{\frac 14} f_Y(y) dy@@@, so clearly @@@f_Y@@@ cannot be equal to the constant @@@1@@@ on @@@[0,1]@@@!

This simple analysis can be used to derive the CDF of @@@Y@@@ and then its density, if it exists, through a straightforward approach that is often all you need. The recipe is the following:

  1. For a number @@@y@@@, identify the event @@@\{Y \le y\}@@@ in terms of @@@X@@@.
    • In our particular case, since @@@Y=X^2@@@, we have @@@\{Y\le y\} = \{X^2 \le y\}@@@.
  2. Manipulate the resulting event and express it in terms of @@@X@@@ so you can identify its probability through the CDF of @@@X@@@. This step involves undoing the transformation, and you need to be extra careful and attentive to details.
    • For any @@@y<0@@@ the event @@@\{X^2 \le y\}@@@ is empty, and for @@@y \ge 0@@@, we can take square roots on both sides to identify the event as @@@\{ |X| \le \sqrt{y}\}@@@. Of course, since in our case @@@|X|=X@@@, this event is simply @@@\{X\le \sqrt{y}\}@@@. Its probability is, by definition, @@@F_X (\sqrt{y})@@@, which equals @@@\sqrt{y}@@@ for @@@y\in[0,1]@@@ and @@@1@@@ for @@@y>1@@@; for @@@y<0@@@ the probability is @@@0@@@, since the event is empty.
  3. Determine what type of distribution you get. Unlike the discrete case, where any transformation of a discrete RV is discrete, in the continuous setting you can get anything.
    • In our case, @@@F_Y@@@ is continuous, and therefore @@@Y@@@ is continuous. Its derivative is @@@\frac{1}{2\sqrt{y}}@@@ in the interval @@@(0,1)@@@, which integrates to @@@1@@@. Therefore @@@Y@@@ has density equal to @@@\frac{1}{2\sqrt{y}}@@@ on @@@(0,1)@@@ and zero elsewhere. Definitely not a constant density! Nothing like the density of @@@X@@@.
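
A quick simulation supports the conclusion of the recipe (a sketch; the sample size is arbitrary): the empirical CDF of @@@Y=X^2@@@ matches @@@\sqrt{y}@@@, not @@@y@@@.

```python
import random

random.seed(0)
N = 200_000
ys = [random.random() ** 2 for _ in range(N)]   # samples of Y = X^2 with X ~ U[0,1]

for y in (0.04, 0.25, 0.64):
    empirical = sum(v <= y for v in ys) / N     # empirical P(Y <= y)
    print(y, round(empirical, 3), round(y ** 0.5, 3))   # compare with sqrt(y)
```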

As in the discrete case, computing expectations of transformations of RVs with densities does not require finding the distribution of the transformed RV. The formula is basically the same as in the discrete case.

Proposition 2.

Suppose that @@@X@@@ has density @@@f_X@@@. If @@@g@@@ is a piecewise continuous function, then

$$E[g(X)] = \int_{-\infty}^\infty g(s) f_X (s) ds,$$

provided

$$\int | g(s) | f_X (s) ds < \infty.$$
Example 4.

Suppose that @@@X@@@ is exponential with parameter @@@\lambda@@@. We show that @@@E[X^2]=\frac{2}{\lambda^2}@@@. To solve this, write

$$\begin{align*} E[X^2] & = \int_0^\infty t^2 f_X (t) dt \\ & =\lambda \int_0^\infty t^2 e^{-\lambda t} dt \\ & = \lambda \int_0^\infty {\frac{\partial^2 }{\partial \lambda^2} } e^{-\lambda t} dt \\ & = \lambda {\frac{\partial^2 }{\partial \lambda^2} } \int_0^\infty e^{-\lambda t} dt \\ & = \lambda {\frac{\partial^2 }{\partial \lambda^2} } \lambda^{-1} \\ & = \lambda 2 \lambda^{-3} = \frac{2}{\lambda^2}. \end{align*}$$

We get the third line by observing that @@@t^2 e^{-\lambda t}@@@ is the second derivative of @@@e^{-\lambda t}@@@, as a function of @@@\lambda@@@.

Of course, you can also integrate by parts, as we did in the calculation we cited above.
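
A minimal Monte Carlo check of @@@E[X^2]=\frac{2}{\lambda^2}@@@ (a sketch; the value of @@@\lambda@@@ and the sample size are arbitrary):

```python
import random

random.seed(1)
lam, N = 2.5, 500_000                       # arbitrary rate and sample size
samples = [random.expovariate(lam) for _ in range(N)]

print(sum(x * x for x in samples) / N)      # Monte Carlo estimate of E[X^2]
print(2 / lam**2)                           # exact value, 0.32
```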

Exercise 3.

Let @@@X\sim\mbox{Exp}(\lambda)@@@ for some @@@\lambda>0@@@. What is the probability that @@@X>E[X]@@@?

Let’s try something harder.

Example 5.

A point is sampled at random from the unit circle @@@x^2+y^2=1@@@. A line segment is drawn from the point to @@@(1,0)@@@. What is the expected length of the line segment @@@L@@@?

We will interpret the question as follows: the sampled point makes an angle @@@X@@@ with the positive @@@x@@@-axis, where @@@X@@@ is uniformly distributed on @@@[0,2\pi]@@@ and therefore has density @@@\frac{1}{2\pi}@@@ on @@@[0,2\pi]@@@. The @@@xy@@@-coordinates of the sampled point are then @@@(\cos X,\sin X)@@@. Therefore

$$ L=\sqrt{ (1-\cos X)^2+ \sin^2 X } = \sqrt{ 2(1-\cos X)}.$$

Now use the identity @@@\cos (2\theta ) = 1-2\sin^2 \theta@@@, which gives @@@2(1-\cos X)=4\sin^2(X/2)@@@ and therefore @@@L=2|\sin(X/2)|=2\sin(X/2)@@@, because @@@X/2\in[0,\pi]@@@. Okay. Let’s compute the expectation.

$$ E[ L] = \frac{1}{2\pi} \int_0^{2\pi} 2 \sin (t/2) dt.$$

Substituting @@@u=t/2@@@, we see that

$$E[L] = \frac{1}{2\pi} \int_0^{\pi} 4\sin (u) du = \frac{2}{\pi} \int_0^{\pi} \sin(u)du=\frac{4}{\pi}.$$
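
Since it is easy to drop a constant in this kind of computation, here is a quick simulation (a sketch; the sample size is arbitrary) confirming that the average length is close to @@@4/\pi \approx 1.27@@@:

```python
import math
import random

random.seed(2)
N = 500_000
total = 0.0
for _ in range(N):
    x = random.uniform(0.0, 2.0 * math.pi)                 # the random angle X
    total += math.hypot(1.0 - math.cos(x), math.sin(x))    # distance from (cos X, sin X) to (1, 0)

print(total / N, 4 / math.pi)   # both close to 1.2732
```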

A transformation of a continuous RV need not be continuous. If the function @@@g@@@ takes only finitely many or countably many values, then @@@g(X)@@@ is necessarily discrete, and this explains the absence of a density in the statement of Proposition 2. Suppose now that @@@X@@@ is continuous with density @@@f_X@@@ and CDF @@@F_X@@@, and that @@@g@@@ is a strictly increasing function with a continuous, strictly positive derivative. Let @@@Y=g(X)@@@. Note that the conditions on @@@g@@@ imply it has an inverse @@@g^{-1}@@@. We find that the CDF of @@@Y@@@, @@@F_Y@@@, is given by

$$\begin{align*} F_Y(y)& = P(Y\le y) = P(g(X) \le y) \\ & = P(X \le g^{-1} (y))\\ & = \int_{-\infty}^{g^{-1}(y)} f_X (t) dt\\ & \underset{s=g(t)} {=} \int_{-\infty}^y f_X(g^{-1} (s))(g^{-1})'(s) ds. \end{align*}$$

From the definition of density we then conclude the following result:

Proposition 3.

Suppose that @@@X@@@ is continuous with density @@@f_X@@@ and @@@g@@@ is continuous, strictly increasing or strictly decreasing with continuous, nonzero derivative. Then the RV @@@Y=g(X)@@@ is continuous with density given by \begin{equation}\label{eq:transformed_cts} f_Y(y) =f_X(g^{-1}(y))\, \left| (g^{-1})'(y)\right|. \end{equation}

Let’s put this into action.

Example 6.

Let @@@X\sim \mbox{U}[0,\pi/2]@@@. Find the density of @@@Y=\tan (X)@@@.

Here @@@f_X= \frac{2}{\pi}@@@ on @@@(0,\pi/2)@@@, @@@g(t) =\tan (t)@@@ and so @@@g^{-1}(s)=\arctan(s)@@@ for @@@s\in (0,\infty)@@@. Therefore @@@Y@@@ has density

$$f_Y(y) = \frac{2}{\pi}\frac{d}{dy} \arctan y= \frac{2}{\pi} \frac{1}{1+y^2}.$$
Example 7.

Let @@@X\sim \mbox{U}[0,1]@@@. The function @@@g(t) = -\frac{1}{\lambda} \ln (1-t )@@@, defined on @@@(0,1)@@@, has range @@@(0,\infty)@@@ and inverse @@@g^{-1}(s) = 1- e^{-\lambda s}@@@, defined on @@@(0,\infty)@@@ with range @@@(0,1)@@@. Since the density of @@@X@@@ is @@@1@@@ on @@@[0,1]@@@, the RV @@@Y=g(X)@@@ has density @@@(g^{-1})'(y)=\lambda e^{-\lambda y}@@@ for @@@y \in (0,\infty)@@@, that is, @@@Y\sim \mbox{Exp}(\lambda)@@@.
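
This is precisely the inverse transform method for generating exponential samples from uniform ones. A minimal sketch (the value of @@@\lambda@@@ is arbitrary), comparing the empirical CDF of @@@g(X)@@@ with @@@1-e^{-\lambda y}@@@:

```python
import math
import random

random.seed(3)
lam, N = 1.5, 200_000                       # arbitrary rate and sample size

# Apply g(t) = -(1/lam) * ln(1 - t) to uniform samples.
samples = [-math.log(1.0 - random.random()) / lam for _ in range(N)]

for y in (0.5, 1.0, 2.0):
    empirical = sum(s <= y for s in samples) / N
    print(y, round(empirical, 3), round(1 - math.exp(-lam * y), 3))   # should match
```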

Exercise 4.

Let @@@X\sim\mbox{Exp}(1)@@@. Find a function @@@g@@@ such that @@@g(X)\sim\mbox{U}[0,1]@@@.

In fact, this example is a special case of a more general result: every, but every, distribution can be obtained by transforming a @@@\mbox{U}[0,1]@@@ RV. More precisely:

Theorem 4.

Let @@@F@@@ be a distribution function and let @@@U\sim \mbox{U}[0,1]@@@. For @@@u\in (0,1)@@@, let

$$\begin{equation} \label{eq:inversefunction} G(u) = \inf \{x: F(x)\ge u\}. \end{equation}$$

Then @@@Y=G(U)@@@ is a random variable with distribution function @@@F@@@.

Before the proof, let’s do a simple case, the case where @@@F@@@ is continuous and strictly increasing. It then has an inverse @@@G@@@; that is, @@@F(G(u))=u@@@. Suppose that @@@U\sim \mbox{U}[0,1]@@@ and let @@@Y=G(U)@@@; then \begin{equation} \label{eq:inversion} P(Y \le y) = P(G(U)\le y)=P( U \le F(y)) = F(y). \end{equation}

Proof.

Inspection of \eqref{eq:inversion} shows that it is enough to show that @@@\{G(U)\le y\}=\{U\le F(y)\}@@@ for all @@@y@@@. By taking complements, we need to show that @@@\{G(U) > y\} = \{U>F(y)\}@@@. Let’s do this. The infimum in the definition of @@@G(u)@@@ is attained because @@@u>0@@@ and @@@F@@@ is right-continuous. Observe that @@@G(u)>y@@@ implies @@@F(y)<u@@@, and conversely if @@@F(y)<u@@@ we necessarily have @@@G(u)>y@@@. Therefore @@@\{G(U)>y\}= \{U>F(y)\}@@@, except possibly on the event @@@\{U=0\}@@@. Since this event has probability zero, it follows that @@@P(G(U)\le y)=P( U \le F(y) ) = F(y)@@@, therefore @@@G(U)@@@ has the desired distribution.
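
The point of the generalized inverse @@@G@@@ is that the recipe works even when @@@F@@@ has jumps, in which case @@@F@@@ has no true inverse. Here is a sketch for a made-up discrete distribution on @@@\{1,2,3\}@@@ with probabilities @@@0.2, 0.5, 0.3@@@; the empirical frequencies of @@@G(U)@@@ recover these probabilities.

```python
import random
from collections import Counter

random.seed(4)

values = [1, 2, 3]            # a made-up discrete distribution
probs = [0.2, 0.5, 0.3]
cdf = [0.2, 0.7, 1.0]         # F evaluated at 1, 2, 3

def G(u):
    """Generalized inverse: the smallest value x with F(x) >= u."""
    for x, F_x in zip(values, cdf):
        if F_x >= u:
            return x
    return values[-1]

N = 200_000
counts = Counter(G(random.random()) for _ in range(N))
print({x: counts[x] / N for x in values})   # close to {1: 0.2, 2: 0.5, 3: 0.3}
```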

So we can generate any distribution from a uniform. What about the converse? Let’s try to undo the process. Take our favorite distribution with CDF @@@F@@@, generated by applying an “inverse” of @@@F@@@ to a uniformly distributed RV on @@@[0,1]@@@. Intuitively, we try to “peel off” @@@G@@@ by applying @@@F@@@ to the resulting RV, hoping to end up with a uniform. The problem is that this cannot always work. A Bernoulli RV takes only two values, and there is no way to apply a function to it to get a continuous RV or any RV taking more than two values… So one needs some additional condition.

Theorem 5.

Let @@@X@@@ be a continuous RV with CDF @@@F@@@. Then the random variable @@@F(X)@@@ is uniformly distributed on @@@[0,1]@@@.

Proof.

Let @@@G@@@ be as in \eqref{eq:inversefunction}. Specifically, for @@@u\in (0,1)@@@, @@@G(u)@@@ is the infimum of all @@@x@@@ such that @@@F(x) \ge u@@@. Since @@@F@@@ is a continuous CDF, the infimum is attained and @@@F(G(u))=u@@@. As a result, @@@P( F(X) \ge u) = P( X \ge G(u)) = 1- F(G(u)) = 1-u@@@, where we have used the continuity of @@@X@@@ for the second equality. Since @@@P(F(X)\ge u)=1-u@@@ for all @@@u\in (0,1)@@@, the RV @@@F(X)@@@ is indeed uniformly distributed on @@@[0,1]@@@.
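
A short check of Theorem 5 (a sketch, taking @@@X\sim\mbox{Exp}(1)@@@ so that @@@F(x)=1-e^{-x}@@@): the empirical CDF of @@@F(X)@@@ should be close to that of a @@@\mbox{U}[0,1]@@@ RV.

```python
import math
import random

random.seed(5)
N = 200_000
# Samples of F(X) with X ~ Exp(1), F(x) = 1 - e^{-x}.
u_samples = [1.0 - math.exp(-random.expovariate(1.0)) for _ in range(N)]

for u in (0.1, 0.5, 0.9):
    print(u, round(sum(s <= u for s in u_samples) / N, 3))   # empirical CDF, close to u
```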

General Case

To discuss a more general setting, let’s consider the following scenario. Let @@@X@@@ be a nonnegative RV and let @@@g@@@ be a continuous strictly increasing nonnegative function. Let @@@Y=g(X)@@@. How do we compute @@@E[Y]@@@? Recall that @@@g@@@ has an inverse function @@@g^{-1}@@@ defined on the range of @@@g@@@. Namely: if @@@t=g(s)@@@, then @@@g^{-1}(t)=s@@@. Now let’s use our formula for expectation:

$$\begin{align*} E[Y] & = \int_0^\infty P(Y>t) dt\\ & = \int_0^\infty P(g(X)>t) dt\\ & = \int_0^\infty P(X>g^{-1}(t)) dt \end{align*}$$

Next stage is substitution: @@@s=g^{-1}(t)@@@, and since @@@t=g(s)@@@ we have @@@dt = g'(s)ds@@@, resulting in the following formula \begin{equation} \label{eq:exp_transform} E[Y] = \int_{g^{-1}(0)}^{g^{-1}(\infty)} P(X>s) g’(s) ds. \end{equation} A (first or) second inspection of \eqref{eq:exp_transform}, suggests integration by parts. We can definitely do this if @@@P(X>t)@@@ is continuously differentiable, in which case @@@X@@@ has a density @@@f_X@@@. To make things work, let’s also assume for the time being that @@@g@@@ is bounded. This gives us

$$\begin{align*} E[Y] & = \int_{g^{-1}(0)}^{g^{-1}(\infty)} P(X>s) g'(s) ds \\ & = P(X>s)g(s)\Big|_{g^{-1}(0)}^{g^{-1}(\infty)} - \int_{g^{-1}(0)}^{g^{-1}(\infty)} \frac{d}{ds} P(X>s)\, g(s) ds \\ & = 0-0+ \int_0^\infty f_X(s) g(s) ds. \end{align*}$$

This is a derivation of the formula in Proposition 2 in a special case. The general case is done through discretization: approximating @@@X@@@ by discrete RVs.
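
Here is a quick numerical sanity check of \eqref{eq:exp_transform} (a sketch, taking @@@X\sim\mbox{Exp}(1)@@@ and @@@g(s)=s^2@@@, so that both sides should equal @@@E[X^2]=2@@@ by Example 4):

```python
import math

# Riemann-sum approximation of  int P(X > s) g'(s) ds  for X ~ Exp(1), g(s) = s^2,
# so P(X > s) = e^{-s} and g'(s) = 2s; the integral is truncated at s = 50.
ds = 1e-4
total = sum(math.exp(-s) * 2 * s * ds for s in (i * ds for i in range(1, int(50 / ds))))
print(total)   # close to E[X^2] = 2
```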

Variance

Definition and properties

Expectation provides us a measure of central tendency, or in fewer words, a notion of a center, summarized by a single number. It does not tell us anything about how ``random’’ @@@X@@@ is (for all we know the RV could be constant). The next level is to try to describe the deviation or dispersion from this center, which we will attempt, again, to summarize with a single number. To do that, suppose that @@@X@@@ has finite expectation @@@E[X]@@@, and let @@@Y=X-E[X]@@@. Well, this is the actual deviation from the expectation, but it is still an RV. It has a distribution. It’s not a single number (unless what?). Also, by linearity of the expectation, @@@E[Y]=0@@@: when taking the expectation of @@@Y@@@, the deviations above @@@E[X]@@@ are canceled by the deviations below @@@E[X]@@@. In order to get something that is not trivial, we can consider @@@E[Y^2]@@@, the expectation of the square of the deviation from the center. You may ask why the square and not the absolute value, or the fourth power, or something crazier. Well, for now, I hope you will be satisfied by the simplicity and the beauty of the resulting expression. A deeper reason will appear later when we discuss the central limit theorem.

Definition 1.

Suppose @@@X@@@ has finite expectation. The variance of @@@X@@@, @@@\sigma^2_X@@@, is defined as

$$ \sigma^2_X = E[(X-E[X])^2],$$

and @@@\sigma_X@@@, the nonnegative square root of @@@\sigma^2_X@@@, is called the standard deviation of @@@X@@@.

The importance of the standard deviation is that it provides the right order of the deviation from the mean. If @@@X@@@ is measured in some units, say meters, it is only natural to measure the deviation from the expectation also in meters, but the variance gives us a measurement in square meters. By taking the root, the standard deviation gives us a measurement of deviation on the same scale as @@@X@@@ and the values it measures.

Let’s expand the expression for @@@\sigma_X^2@@@.

$$\begin{align} \label{eq:variance_expanded} \sigma^2_X & = E[(X- E[X])^2]\\ \nonumber & = E [ X^2 - 2 X E[X] + E[X]^2]\\ \nonumber & = E[X^2] - 2 E[X]E[X] + E[X]^2 \\ \nonumber & = E[X^2] - E[X]^2 \end{align}$$

We have the following

Corollary 6.

Suppose the RV @@@X@@@ has a finite mean. Then

$$0\le \sigma_X^2 = E[X^2] - E[X]^2,$$

and equality holds if and only if @@@P(X=E[X])=1@@@.

  1. For any @@@a \in \R@@@, @@@\sigma^2_{X+a} = \sigma^2_X@@@.
  2. If @@@\sigma_X^2@@@ is finite, @@@\sigma_{cX}^2 = c^2\sigma_X^2@@@ for any @@@c\in \R@@@.
Proof.
  1. This follows by applying Corollary 1 to the LHS of \eqref{eq:variance_expanded}.
  2. From the definition,
$$ \sigma_{X+a} ^2 = E[ ((X+a) -E [X+a]) ^2 ] = E[ (X - E[X])^2] = \sigma_X^2.$$
  3. This follows from the linearity of the expectation: @@@E[(cX)^2] = E[c^2 X^2] = c^2 E[X^2]@@@, and similarly @@@E[(cX)]^2 = (cE[X])^2@@@.

Let’s see an application of the corollary.

Example 8.

Let @@@X \sim \mbox{U}[a,b]@@@. We will now show that @@@\sigma_X^2 = \frac{(b-a)^2 }{12}@@@.

Let @@@U \sim \mbox{U}[0,1]@@@. When introducing @@@\mbox{U}[a,b]@@@ we showed that the random variable @@@a+(b-a)U@@@ is uniform on @@@[a,b]@@@ and therefore has the same distribution as @@@X@@@. From Corollary 6, we have

$$ \sigma_X^2 = \sigma^2_{(b-a)U} = (b-a)^2 \sigma^2_U.$$

It remains to compute @@@\sigma^2_U@@@. By Corollary 6,

$$ \sigma^2_U = E[U^2] - E[U]^2 = \int_0^1 t^2 dt -( \int_0^1 t dt )^2 = \frac{1}{3} - (\frac 12)^2 = \frac {1}{12}.$$

Thus,

$$\sigma^2_X = \frac{(b-a)^2}{12}.$$
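
A quick simulation check of the formula (a sketch; the interval @@@[2,7]@@@ and the sample size are arbitrary):

```python
import random

random.seed(6)
a, b, N = 2.0, 7.0, 500_000                   # arbitrary interval and sample size
xs = [random.uniform(a, b) for _ in range(N)]

mean = sum(xs) / N
var = sum((x - mean) ** 2 for x in xs) / N    # empirical variance
print(var, (b - a) ** 2 / 12)                 # both close to 25/12 = 2.083...
```
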
Exercise 5.

In Squareville, all yards are square. The expected area of a yard is @@@144@@@ square yards. The mayor claims that the expected yard perimeter is @@@52@@@ yards. She is lying. Why? (don’t get philosophical. Here, this is a purely mathematical question).

Exercise 6.

Suppose that @@@X@@@ is a RV taking only values in @@@\{-1,0,1\}@@@.

  1. Show that the variance of @@@X@@@ cannot exceed @@@1@@@.
  2. Suppose that the expectation of @@@X@@@ is zero and its variance is @@@0.4@@@. What is the probability that @@@X@@@ is equal to @@@0@@@?

Variance Calculations

Example 9.

Let @@@X\sim \mbox{Bern}(p)@@@. We know that @@@E[X]= p@@@. Let’s compute @@@\sigma^2_X@@@. For this we need to find @@@E[X^2]@@@. Since @@@X^2=X@@@, it follows that @@@E[X^2] =p@@@. Therefore

$$\sigma_X^2 = E[X^2] - E[X]^2 = p - p^2 = p(1-p).$$
Example 10.

Suppose @@@X\sim \mbox{Bin}(n,p)@@@. We show @@@\sigma^2_X = np(1-p)@@@. We know that @@@E[X] = n p@@@. Let’s compute @@@\sigma_X^2@@@. We need to find @@@E[X^2]@@@. We will do this using linearity only without relying on the formula for transformations. Write @@@X=X_1+\dots +X_n@@@, where @@@X_1,\dots,X_n@@@ are the indicators of the @@@n@@@ independent Bernoulli events. Then @@@X^2 = X_1^2+ \dots + X_n^2+ 2 \sum_{i<j} X_i X_j@@@. Since all RVs are indicators, @@@X_1^2=X_1,X_2^2=X_2,\dots,X_n^2=X_n@@@. Therefore @@@E[X_i^2] = E[X_i] =p@@@ for all @@@i@@@. Also, @@@X_iX_j@@@ is @@@1@@@ when both @@@i@@@-th and @@@j@@@-th experiments are successful and is zero otherwise. Since the experiments are independent, the probability they are both successful is the product of the probability of each being successful, and is therefore @@@p^2@@@. Adding it all up we have

$$ E[X^2] = n \times p + 2 \binom{n}{2} p^2 = np +n(n-1)p^2.$$

Therefore,

$$\sigma_X^2 = np+ n(n-1)p^2 - n^2p^2 = np(1-p).$$
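
As a sanity check, one can compute @@@E[X]@@@ and @@@E[X^2]@@@ exactly from the Binomial PMF and compare with @@@np(1-p)@@@ (a sketch; @@@n@@@ and @@@p@@@ are arbitrary):

```python
from math import comb

n, p = 20, 0.3                          # arbitrary illustration values
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

EX = sum(k * q for k, q in enumerate(pmf))
EX2 = sum(k * k * q for k, q in enumerate(pmf))
print(EX2 - EX**2, n * p * (1 - p))     # both approximately 4.2
```
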
Example 11.

Let @@@X\sim \mbox{Geom}(p)@@@. Then @@@E[X] =\frac 1p@@@. We can repeat the differentiation trick we used in the calculation of @@@E[X]@@@, but let’s make it simpler for ourselves and take a shortcut using the third method employed there.

  • If the first experiment is successful, we have one experiment and done: this happens with probability @@@p@@@.
  • If not, we have used one experiment, and the additional number of experiments needed until the first success has the same distribution as @@@X@@@. This gives
$$ E[X^2] = p*1 + (1-p) E[ (1+X)^2] = p + (1-p) (1+\frac{2}{p} + E[X^2]),$$

therefore

$$E[X^2] =\frac{1}{p} + \frac{2(1-p)}{p^2}=\frac{2}{p^2} - \frac{1}{p}.$$

Therefore

$$\sigma_X^2 = \frac{1}{p^2} - \frac{1}{p} = \frac{1-p}{p^2}.$$
Exercise 7.

Suppose that @@@X@@@ counts the number of successes in @@@n@@@ Bernoulli trials, each with probability of success @@@p@@@. Show that @@@\sigma^2_X \le np(1-p)@@@, and that equality holds if and only if every two experiments are independent.

Example 12.

Let @@@X@@@ be hypergeometric with the same parameters and notation as in our calculation of @@@E[X]@@@. Then we know that @@@X= X_1+\dots+X_n@@@. It follows that

$$\begin{align*} E[X^2] & = E [(\sum_{j=1}^n X_j)^2]\\ & =\sum_{j=1}^n E[X_j^2] + 2 \sum_{j<k} E[X_j X_k]\\ & = n \frac{K}{N} + 2 \binom{n}{2} \frac{K-1}{N-1}\frac{K}{N}, \end{align*}$$

because @@@X_j X_k@@@ is the indicator that both @@@k@@@-th and @@@j@@@-th balls were white, and is therefore a Bernoulli RV with parameter @@@\frac{K-1}{N-1}\frac{K}{N}@@@, as we computed earlier. Hence,

$$\begin{align*} \sigma^2_X & = \frac{nK}{N} + n(n-1) \frac{K (K-1)}{N(N-1)} - \frac{n^2 K^2}{N^2}\\ & = \frac{nK}{N}\left (1 + \frac{(n-1)(K-1)}{N-1}- \frac{n K}{N}\right)\\ & = \frac{nK}{N}\left( \frac{N-K+K}{N} +\frac{(n-1)(K-1)}{N-1}- \frac{nK}{N}\right)\\ & = \frac{nK}{N} \left( \frac{N-K}{N} + \frac{(n-1)(K-1)}{N-1}- \frac{(n-1)K}{N}\right)\\ & = \frac{nK}{N}\left( \frac{N-K}{N} + (n-1) \frac{((K-1)N - (N-1) K)}{N(N-1)}\right) \\ & = \frac{nK}{N^2} \left ( (N-K)-\frac{(n-1) (N-K)}{N-1}\right)\\ & = \frac{n K(N-K)}{N^2}\cdot \frac{(N-1)-(n-1)}{N-1} \\ & = \frac{n K(N-K)(N-n)}{N^2(N-1)}. \end{align*}$$
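
Since the algebra above is easy to botch, here is a short exact check (a sketch; the parameters @@@N=20@@@, @@@K=7@@@, @@@n=5@@@ are arbitrary) that the variance computed directly from the hypergeometric PMF agrees with @@@\frac{nK(N-K)(N-n)}{N^2(N-1)}@@@:

```python
from math import comb

N_, K, n = 20, 7, 5                     # arbitrary illustration values
pmf = {k: comb(K, k) * comb(N_ - K, n - k) / comb(N_, n)
       for k in range(max(0, n - (N_ - K)), min(n, K) + 1)}

EX = sum(k * q for k, q in pmf.items())
EX2 = sum(k * k * q for k, q in pmf.items())
print(EX2 - EX**2)                                            # variance from the PMF
print(n * K * (N_ - K) * (N_ - n) / (N_**2 * (N_ - 1)))       # the closed form
```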

If you’re looking for the variance of the negative binomial you’ll have to wait a little. We’ll derive it with no effort in Example 6.

Example 13.

Suppose @@@X\sim \mbox{Pois} (\lambda)@@@. We show that @@@\sigma_X^2= \lambda@@@. We know that @@@E[X]=\lambda@@@. We have

$$\begin{align*} E[X^2] & = e^{-\lambda} \sum_{k=0}^\infty k^2 \frac{\lambda^k} {k!}\\ & =e^{-\lambda}\left ( \sum_{k=0}^\infty k(k-1) \frac{\lambda^k} {k!} + \sum_{k=0}^\infty k \frac{\lambda^k} {k!} \right)\\ & = e^{-\lambda} \sum_{k=2}^\infty \frac{ \lambda^k}{(k-2)!}+ \lambda \\ & = \lambda^2 \sum_{j=0}^\infty e^{-\lambda} \frac{ \lambda^j}{j!}+\lambda \\ & = \lambda^2 + \lambda. \end{align*}$$

Therefore @@@\sigma_X^2 = \lambda^2 + \lambda -\lambda^2 = \lambda@@@.

Problems

Problem 1.

The RV @@@X@@@ has PMF given by the following table

| @@@x@@@ | @@@1@@@ | @@@1.5@@@ | @@@3@@@ | @@@4@@@ |
|---|---|---|---|---|
| @@@p_X(x)@@@ | ? | @@@\frac{1}{3}@@@ | @@@\frac{1}{12}@@@ | ? |

It is also known that @@@E[X] = \frac{19}{12}@@@.

  1. Complete the missing numbers.
  2. Sketch the CDF of @@@X@@@.
  3. Find the PMF for @@@2X+10@@@.
Problem 2.

Let @@@X@@@ be the RV from Problem 1. For each of the following cases, find the PMF.

  1. @@@Y=X^2@@@.
  2. @@@Y=\min (X,3)-3@@@. In this case, also sketch the CDF.
Problem 3.
  1. Suppose that an RV @@@X@@@ has a CDF that takes only two values. Show that there exists @@@c@@@ such that @@@P(X=c)=1@@@.
  2. Suppose that an RV @@@X@@@ has a CDF that takes exactly three values. Show that there exist @@@a,b@@@ and a Bernoulli RV @@@Y\sim \mbox{Bern}(p)@@@ with @@@p\in (0,1)@@@ such that @@@X = a+ bY@@@.
Problem 4.

A CDF @@@F@@@ is said to be stochastically dominated by a CDF @@@G@@@ (equivalently, @@@G@@@ stochastically dominates @@@F@@@) if @@@G(x)\le F(x)@@@ for all @@@x\in \R@@@. We also say that the RV @@@Y@@@ stochastically dominates @@@X@@@ if @@@F_Y@@@ stochastically dominates @@@F_X@@@.

  1. Show that if @@@X\le Y@@@, then @@@Y@@@ stochastically dominates @@@X@@@.
  2. Find an example of two RVs @@@X@@@ and @@@Y@@@ such that @@@Y@@@ stochastically dominates @@@X@@@ yet the condition in part a. fails (that is, @@@P(X>Y)>0@@@).
Problem 5.

Suppose that @@@X@@@ is a nonnegative RV with the property that for any @@@x>0@@@, @@@P(X>2x)= (P(X>x))^2@@@. What is the distribution of @@@X@@@?

Problem 6.

I am waiting to be sentenced on a traffic violation. Based on past data, the probability of being acquitted is @@@20\%@@@, in which case the total costs are @@@\$0@@@; if convicted, the total costs (penalty, fees, etc.) are exponentially distributed with expectation @@@\$3@@@.

  1. Write the distribution function of the total costs. Is this random variable continuous? Discrete? Neither?
  2. Find the expectation of this random variable.
Problem 7.

I’m tossing a fair die twice, independently. I win the amount of the maximal value (in USD). For example, if the die lands @@@2@@@ and @@@3@@@ (or @@@3@@@ and @@@2@@@), then I win @@@\$3@@@.

  1. What is the expectation and variance of my winning?
  2. Assuming that one of the tosses was @@@1@@@, what is the new expectation of my winning?
Problem 8.

The density @@@f_X@@@ of an RV @@@X@@@ is given by

$$f_X (x) = \begin{cases} c\sin x & 0 \le x \le {\pi}\\ 0 & \mbox{otherwise}\end{cases}$$
  1. Compute @@@c@@@.
  2. Compute the expectation and variance of @@@X@@@.
  3. Compute the expectation and variance of @@@\cos (X)@@@.
Problem 9.

Let @@@X\sim \mbox{Exp}(\lambda)@@@ for @@@\lambda>0@@@.

  1. Find the density of @@@e^{cX}@@@ for @@@c \in \R@@@.
  2. Compute the variance of @@@e^{cX}@@@ for all @@@c@@@ where it is finite.
Problem 10.

Let @@@a,b \in [0,1]@@@. Show the following:

  1. If @@@b< a^2@@@, then there does not exist a RV @@@X@@@ such that @@@E[X]=a@@@ and @@@E[X^2]=b@@@.
  2. If @@@b\ge a^2@@@, there exists a RV @@@X@@@ such that @@@E[X]=a@@@ and @@@E[X^2]=b@@@.
Problem 11.

Suppose that @@@X@@@ has a distribution function of the form

$$ F(x) = \begin{cases} 0 & x< 0 \\ \frac {1}{16} & 0\le x < 1 \\ \frac{x^2}{8} & 1 \le x < 2\\ 1 & x\ge 2\end{cases} $$
  1. Determine what type of RV @@@X@@@ is (discrete, continuous or mixed). If not continuous, find all atoms of @@@X@@@ and calculate their probabilities.
  2. Compute the expectation of @@@X@@@.
  3. Find the CDF of @@@X^2@@@.
Problem 12.

Suppose that @@@X\sim \mbox{U}[0,1]@@@. Find the density of each of the following RVs:

  1. @@@ X^2@@@
  2. @@@\sqrt{X}@@@
  3. @@@1/X@@@
  4. @@@\cos(2X)@@@. Repeat when @@@X\sim \mbox{Exp}(1)@@@.
Problem 13.
  1. A family decided to have children until they have at least one of each sex. Assume each child is equally likely to be a boy or a girl, independently of all other children. Find the probability mass function and expectation of the number of children the family will have.
  2. Repeat the first part assuming that the probability of a baby boy is @@@p\in (0,1)@@@.
Problem 14.

The probability of a disaster each day is @@@p@@@, independently of the past. Find the expected number of days until the first time we will experience two consecutive days of disasters. Generalize to @@@n@@@ consecutive days.

Problem 15.

This problem was motivated by a video I found on a YouTube channel on Georgia Lottery. A lottery ticket costs $10. Here are the possible outcomes:

  1. With probability @@@10^{-7}@@@ you win $100,000 in cash.
  2. With probability @@@10^{-5}@@@ you win $1000 in cash.
  3. With probability @@@10^{-3}@@@ you’re entered into the next drawing (identical to the first). You cannot sell or transfer this right. Let @@@X@@@ denote the prize value. What is the expectation of @@@X@@@?
Problem 16.

This was motivated by a Ted-Ed video (but it is different! we have only one sequence of tosses). Two brothers are repeatedly tossing one fair coin. Orville wins if the pattern HH appears before the pattern HT. Otherwise Wilbur wins.

  1. What is the distribution of number of tosses until the game is decided?
  2. What is the probability of Orville winning the game?
  3. How would your answer to the second part change if the pattern HT was replaced by the pattern TH? (get ready for a surprise) Even more surprising?
Problem 17.

For an event @@@A@@@ write @@@{\bf 1}_A@@@ for the random variable equal to @@@1@@@ on @@@A@@@ and to @@@0@@@ on @@@A^c@@@. This random variable is known as the indicator of @@@A@@@.

  1. Show that @@@P(A)=E [ {\bf 1}_A]@@@.
  2. Show that @@@{\bf 1}_{A\cap B}={\bf 1}_A {\bf 1}_B@@@, and that @@@{\bf 1}_{A^c}=1-{\bf 1}_A@@@.
  3. By expanding @@@1-E [ (1-{\bf 1}_A)(1-{\bf 1}_B)(1-{\bf 1}_C)]@@@, recover the inclusion-exclusion formula for three sets. Explain what you’re doing.
Problem 18.

(Requires complex analysis) Let @@@X@@@ be @@@\mbox{Pois}(\lambda)@@@. Find the probability that @@@X@@@ is a multiple of @@@3@@@.

Problem 19.

Let @@@X@@@ be an RV with finite second moment.

  1. Let @@@f(x) =E[ (X-x)^2]@@@. Show that @@@f@@@ attains a minimum at @@@x=E[X]@@@, and conclude that @@@E[(X-x)^2] \ge \sigma^2_X@@@ with equality if and only if @@@x= E[X]@@@.
  2. Suppose now that in addition @@@X@@@ takes values in the interval @@@[a,b]@@@. Use the result above to show that @@@\sigma^2_X \le \frac{(b-a)^2}{4}@@@.
Problem 20.

The probabilistic method. This is a name for an approach using probability to solve combinatorial problems. Here’s a fine example. @@@12@@@ students and @@@5@@@ faculty are sitting at a round table. Show that no matter how they are seated, there will be @@@7@@@ adjacent seats with at least @@@3@@@ faculty members among them.

Hint. Fix any seating. Now randomly pick a seat, and let @@@X@@@ be the (random) number of faculty members sitting in the seven adjacent seats starting from the one picked and going clockwise. Compute the expectation of @@@X@@@ and show that it is larger than @@@2@@@. Use this to show that the probability of @@@\{X\ge 3\}@@@ is positive. Explain why this proves the claim.

Problem 21.

At the time I wrote this problem, Mega Millions was in the news, because of a @@@1.6@@@ billion dollar jackpot. The rules of the game are the following: you pick a combination of @@@5@@@ numbers from @@@1@@@ to @@@75@@@, and an additional “mega” number between @@@1@@@ and @@@25@@@. The price of a ticket is @@@\$2@@@. Use the table on Lotto Hub to find what is the expected payout as a function of the jackpot, and the value of the jackpot that makes the game fair (that is: expected payout is the same as the price of ticket).

Problem 22.

(the source is Prof. David Steinsaltz’s blog)

An article in the Guardian from July 2020 has the following quote: ‘‘The success rate of vaccines at this stage of development is 10%, Shattock says, and there are already probably 10 vaccines in clinical trials, so that means we will definitely have one.’’

Below we offer two interpretations for the information given. Calculate the probability that there will be at least one successful vaccine for each of the interpretations.

  1. there are exactly 10 vaccines in clinical trials and the probability of success of each one is 10%, independently of the others.
  2. the number of vaccines in clinical trials is a Poisson random variable with expectation @@@10@@@, each with probability 10% of success, independently of the others.
  3. Bonus. Find another acceptable interpretation and calculate the probability.
Problem 23: Negative Binomial distribution.

Consider a sequence of independent Bernoulli trials with probability of success @@@p\in (0,1]@@@ for each. For @@@r\in \{1,2,\dots\}@@@, let @@@X@@@ denote the number of successes before the @@@r@@@-th failure. These distributions are called negative binomial with parameters @@@p@@@ and @@@r@@@.

  1. Show that the PMF of @@@X@@@ is given by
$$ p_X(k) = \begin{cases} \binom{k+r-1}{k}p^k(1-p)^{r} & k=0,1,\dots,\\ 0 & \mbox{otherwise}\end{cases}$$
  2. Let @@@Y\sim \mbox{Geom}(1-p)@@@. Show that if @@@r=1@@@ then @@@Y-1@@@ and @@@X@@@ have the same distribution. Explain why.
  3. Compute the expectation of @@@X@@@.
Problem 24.

Let @@@X\sim \mbox{Exp}(1)@@@ and let @@@p\in (0,1]@@@. Find a function @@@f@@@ such that @@@f(X)\sim \mbox{Geom}(p)@@@.

Problem 25.

Let @@@X\sim \mbox{Exp}(1)@@@. For any @@@a,b\in (0,1)@@@, find two sets (unions of intervals) @@@I_a,I_b@@@ such that @@@P(X\in I_a) =a,~P(X\in I_b) =b@@@ and the events @@@\{X\in I_a\}@@@ and @@@\{X\in I_b\}@@@ are independent.

Problem 26.

Each time I hire an employee the revenue the employee will bring to my business changes by @@@X\%@@@ each quarter, where @@@X@@@ is a random variable with expectation zero. Use an expectation argument to determine which strategy is better.

  1. Hire one employee and keep for two quarters.
  2. Hire a new employee each quarter.
Problem 27.
  1. Show that if @@@X@@@ is an RV taking nonnegative integer values, then
$$E[X]= \sum_{n=1}^\infty P(X\ge n).$$
  2. Show that if @@@X@@@ is any RV, then @@@E[|X|]=\int_0^\infty P(|X|>t) dt@@@.
  3. Show that if @@@X@@@ is a nonnegative RV then
$$E[X^n]=n \int_0^\infty P(X>t)t^{n-1}dt.$$
  4. Use the result of the last part to calculate @@@E[X^2]@@@ when @@@X\sim \mbox{Geom}(p)@@@ for @@@p\in (0,1]@@@ and when @@@X\sim \mbox{Exp}(\lambda)@@@. Try not to compute any integrals, but rather use the calculated expectation for the respective RVs.
Problem 28.

It is known that the RV @@@X@@@ has a density on @@@[0,1]@@@ which is of the form @@@a+bx@@@.

  1. It is also known that its expectation is equal to @@@1/2@@@. Find @@@a@@@ and @@@b@@@.
  2. Repeat assuming now the expectation is @@@p@@@ for @@@p\in (0,1)@@@.
Problem 29.

Suppose that the reported annual salary among people seeking unemployment benefits in some US state is an Exponential RV with expectation @@@\$20K@@@. The unemployment benefit is @@@80\%@@@ of the reported annual salary, with a cap of @@@\$20K@@@ per year. What is the expected unemployment benefit?

Problem 30.

Bus route 913 makes the 30 miles from Storrs to Hartford in a random time which is uniformly distributed between @@@40@@@ minutes and @@@80@@@ minutes. A cyclist does the same route at a time which is uniformly distributed between @@@55@@@ minutes and @@@65@@@ minutes. We assume both times are continuous RVs.

  1. What is the expected travel time of the bus and of the cyclist?
  2. Whose expected speed is larger?
Problem 31.

A RV @@@Z@@@ is called standard Normal (see Definition ??) if it has density

$$ f_Z(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}.$$
  1. Compute @@@E[|Z|]@@@.
  2. Compute @@@E[Z^2]@@@.