Intro

When someone tells you that the probability of something happening is so and so, what do they actually mean? (Let’s assume that person is “properly trained” in the mathematical theory of probability. Otherwise, my answer below has a lower probability of being correct.)

Here’s what they mean:

  • They are observing some experiment (it could be rolling a die, a soccer match, spread of an epidemic, change in crude oil prices a week from today) whose outcome is (usually) not known in advance, but is in some set of possible outcomes (e.g. the number appearing on the top when rolling a die).
  • “something” refers to a collection of one or more possible outcomes (“even”, “@@@70\%@@@ of the population of NYC will be infected by tomorrow”, “the price of a barrel will not exceed @@@\$75@@@”), which they can determine whether occurred or not after their observation is complete. Such sets are known as “events”.
  • They have a mathematical model that assigns a numerical weight to each event, the weight being a number between @@@0@@@ and @@@1@@@ (@@@0\%@@@ to @@@100\%@@@), that can be interpreted as a quantitative measurement of the “likelihood” or “chance” of the event, AKA the “probability” of the event. You can think of the model as a mathematical scale, weighing each possible event. When an event weighs zero it is essentially not observable; when it weighs one it is guaranteed to be observed; and in between lies a range of uncertainty… The probability can also be viewed as a (careful form of) prediction. That’s it. It is important to understand that the probability assigned is dictated by the model used. Very often we all agree on the model (fair die or fair coin), but this is not always the case. Did you ever notice that Weather.com, NOAA and Weather Underground very often give different “chances of rain” for the same day? If not, try right now.

In this chapter we will explore the mathematical framework for what we just described, particularly the structure of the mathematical models assigning probabilities, AKA probability spaces.

A probability space consists of three components we will now try to briefly explain.

  1. The sample space. This is a set including all possible outcomes for the experiment.
    The experiment should be thought of as a mechanism resulting in an outcome. This mechanism is often not fully understood (how financial markets work) or is too complex (we know all the physics involved in rolling a die, yet we still cannot predict what will happen when we roll it), and can be thought of as the “randomness”. Whatever the mechanism is, at the end it spits out an outcome (the value of your portfolio at the end of the year, the number on the top of the die). This procedure of obtaining an outcome is known as sampling, and the collection of all possible outcomes is the sample space. To visualize the sample space and the sampling, think of a claw machine, with the sample space being all prizes plus one more element, a “null” prize. The sampling is the procedure of playing the game, and the outcome of each sampling is the prize. [Figure: a claw crane game machine containing unicorn plushes in Trouville, France, Sept 2011]
  2. The sigma-algebra (or @@@\sigma@@@-algebra) (just a name, don’t dwell on it yet). This represents the information the observer has on the outcome at the conclusion of the observation. In some cases, the observer may not have complete information on the outcome. An event is a statement (“coin landed Heads”) we will be able to determine as true or false at the conclusion of our observation regardless of the outcome, and the sigma-algebra is the collection of all events. To understand the concept, let’s think of the following case. Two dice are rolled. The sample space is therefore @@@\Omega= \{1,\dots,6\}^2 = \{(i,j):i,j \in \{1,\dots,6\}\}@@@. We, as the observers, may have
    • complete information and will be able to identify the outcome, so we will be able to determine whether any statement about the outcome is true or false. For example, we will be able to tell whether the first roll is @@@6@@@. Here every statement is an event.
    • partial information on the outcome. For example, we are only told what the sum was, so - with the exception of a number of outcomes (how many?) - we cannot determine if the statement “first roll is a @@@6@@@” is true or false. Therefore such a statement is not part of the information we have. Another example: “the two dice landed the same number”, and there are many others. We gave two examples of statements which are not events. What are the events then? Any statement on the sum is an event, because the sum is exactly the information we are given. Even the statement “the sum is @@@\pi@@@” is an event (we don’t even have to wait for the outcome to determine its truth value though…).
  3. The probability measure. A probability measure is a function assigning a numerical value between @@@0@@@ and @@@1@@@ to every event, representing the “weight” of this event. The weight assigned to each event is also known as the probability of the event. This function must satisfy some rules, but very loosely it generalizes the idea of proportion (what proportion of the sample space the event occupies), and in many cases - but not always - coincides with proportions, just as in real life some objects “weigh more” or are “more typical” than others. Note that with the exception of trivialities, there are many different probability measures for the same sigma-algebra, and therefore saying that the probability of event @@@A@@@ is @@@p@@@ is - with a few exceptions - model specific. We choose what probability measure to use. To understand this better, think again of the dice rolling example, and assume we have complete information.
    • The most intuitive setup is where we weigh each event proportionally to the number of corresponding outcomes. To make this clearer, recall first that we have a total of @@@36@@@ elements in the sample space. The event “sum is @@@6@@@” corresponds to the five outcomes @@@\{(1,5),(2,4),(3,3),(4,2),(5,1)\}@@@, and will be assigned probability @@@5/36@@@. Similarly, the event “the two dice landed the same number” corresponds to six outcomes and will be assigned probability @@@1/6@@@.
    • Suppose that there’s some magnet under the table causing both dice to land @@@6@@@ every time. A choice of probability measure that will be faithful to this setup is to assign every statement in which at least one of the dice is not a @@@6@@@ a weight or probability @@@0@@@, and assign all remaining statements probability @@@1@@@. For example, the statement “sum is @@@6@@@” will be assigned probability @@@0@@@ and the statement “the two landed the same number” will be assigned probability @@@1@@@ (this always happens!).
      We just described two distinct probability measures for the same sample space, with the same information available to us (there are many, many more). The choice of which to use was based on the sampling, trying to be faithful to the mechanism. We will discuss how to choose the right probability measure when we talk about the Law of Large Numbers. Until then, we will always be provided with a probability measure, so don’t get stressed about that, yet.
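The two measures just described can be sketched in a few lines of Python; the helper names `p_fair` and `p_magnet` are ours, purely for illustration:

```python
from itertools import product
from fractions import Fraction

# Sample space for two dice: all 36 ordered pairs (i, j).
omega = list(product(range(1, 7), repeat=2))

# Measure 1: the "fair dice" model -- each event is weighed
# proportionally to the number of outcomes it contains.
def p_fair(event):
    return Fraction(len([w for w in omega if event(w)]), len(omega))

# Measure 2: the "magnet" model -- all weight sits on the outcome (6, 6).
def p_magnet(event):
    return 1 if event((6, 6)) else 0

sum_is_6 = lambda w: w[0] + w[1] == 6
doubles = lambda w: w[0] == w[1]

print(p_fair(sum_is_6), p_fair(doubles))      # 5/36 1/6
print(p_magnet(sum_is_6), p_magnet(doubles))  # 0 1
```

Same events, same sample space, two very different probabilities: the choice of measure is part of the model.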

Before the real misery begins, let’s enjoy a fun video. Try to think of the concepts we just discussed as you watch it. https://www.youtube.com/watch?v=Kgudt4PXs28

Sample Spaces

Remember we are observers of an experiment? In probability theory, a sample space is the set of all outcomes in the experiment. Why “sample”? This is because we can turn the experiment around and consider it as some mechanism (usually not fully understood to us - can you tell exactly what happens when you toss a coin?) for “sampling” an element from that set of all possible outcomes, the sample space. We usually denote a sample space by the Greek letter Omega, @@@\Omega@@@, and we usually use @@@\omega@@@ to denote an element in the sample space. Let’s introduce a number of examples.

  • I’m tossing a coin. The possible outcomes are Heads and Tails, which we will write as @@@H@@@ and @@@T@@@ respectively. Then @@@\Omega= \{H,T\}@@@.
  • I’m tossing three coins (or tossing a coin three times in a row). Then each outcome is a sequence of length @@@3@@@ consisting of @@@H@@@ and/or @@@T@@@, representing the outcome of the first, second and third toss respectively. In this case @@@\Omega = \{H,T\}^3 = \{HHH,HHT,HTH,THH,HTT,THT,TTH,TTT\}@@@, a product set. In many cases our experiment will involve repeating some smaller experiment several (or maybe even infinitely many) times, and this is where product sets come handy.
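Product sample spaces like @@@\{H,T\}^3@@@ are easy to enumerate explicitly; here is a quick Python illustration:

```python
from itertools import product

# The sample space for three coin tosses is the product set {H, T}^3.
omega = [''.join(seq) for seq in product('HT', repeat=3)]

print(omega)  # ['HHH', 'HHT', 'HTH', 'HTT', 'THH', 'THT', 'TTH', 'TTT']
assert len(omega) == 2 ** 3  # 8 outcomes in total
```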
  • I’m recording the price of a stock of Google at the end of the next trading day. We will assume the prices form a continuum. The sample space is all nonnegative numbers, @@@\Omega = [0,\infty)@@@.

Mathematically, a sample space is nothing but a nonempty set.

The following example describes what is perhaps the most important sample space in all of probability:

Example 1.

The coin is tossed repeatedly indefinitely. Then the sample space @@@\Omega@@@ is the infinite product of the set @@@{\cal S}= \{H,T\}@@@, denoted by @@@\Omega= {\cal S}^{\N}@@@, that is, all infinite sequences of H’s and T’s. An @@@\omega@@@ in this sample space is an infinite sequence @@@\omega = (\omega_1,\omega_2,\dots)@@@, where @@@\omega_i \in {\cal S}=\{H,T\}@@@.

Here’s the beginning of an element @@@\omega@@@ in this sample space: @@@HHHTHTTTTTTHHHHH\dots@@@. Here @@@\omega_1 = \omega_2=\omega_3 = H, \omega_4 =T,\dots @@@, representing a Heads followed by two more Heads, then a Tails, a Heads, no less than @@@6@@@ more Tails, etc…

Events and sigma-algebras

Warm up

As already mentioned, depending on the details, we, the observers of the sampling, may have full or partial information on the outcome. Here we will describe what we mean by “information”.

What is knowledge? Ability to answer questions. One can summarize all our information about the outcome as all yes/no questions we will be able to answer for any possible outcome.

Consider the following scenario. Three coins are tossed. The sample space is @@@\Omega= \{H,T\}^3@@@. During the tossing we were blindfolded, and the blindfold was removed at the conclusion of the tossing. The information available to us is therefore the number of Heads (or Tails), but not the order. We can answer all of the following questions about the outcome:

  1. Are all tosses @@@T@@@?
  2. Is there exactly one @@@H@@@?
  3. Are there exactly two @@@H@@@?
  4. Are there exactly three @@@H@@@?

Furthermore, the answers to all these questions (any three are enough, right?) determine everything we know about the outcome: the number of Heads. There are more questions we can answer: “is the number of @@@H@@@ even?”, “are there more @@@T@@@ than @@@H@@@?”, etc… On the other hand, the question “was the first toss @@@H@@@?” is one we cannot always answer. Of course, if we see @@@3@@@ @@@H@@@ or @@@3@@@ @@@T@@@, we can answer it, but we cannot answer it for any of the @@@6@@@ remaining outcomes. This is an example of a question we cannot always answer, and it will be removed from our bank of questions.

The point we’re trying to make here is that “information” is synonymous with the yes/no questions we can answer. A long list of questions does not make for a good mathematical object. However, everything one can learn from a question is captured in the answers to it. Particularly for a yes/no question on the outcome, an element in the sample space, all that information is captured in the subset of outcomes corresponding to a “yes” answer (all the remaining outcomes automatically correspond to a “no” answer). Therefore one can identify the question with the set of outcomes corresponding to a “yes” answer. The question “is the number of @@@H@@@ odd?”, which is one we can answer, corresponds to the subset @@@\{HTT,THT,TTH,HHH\}@@@. The question “is the number of @@@T@@@ equal to @@@3.14@@@?” corresponds to the subset @@@\emptyset@@@, the empty set, etc.

Therefore, each question we can always answer will be replaced by a subset of @@@\Omega@@@, representing all outcomes corresponding to a “yes” answer. Each such subset is called an event, and the collection of all events is called a sigma-algebra or @@@\sigma@@@-algebra. An event can also be identified with a statement about the outcome: the statement whose veracity the corresponding question aims to determine. The event @@@\{HTT,THT,TTH,HHH\}@@@ corresponds to the statement “the number of @@@H@@@ is odd” (or “number of @@@H@@@ is @@@1@@@ or @@@3@@@” or “number of @@@T@@@ is even”) and the event @@@\{HHH\}@@@ corresponds to the statement “all tosses are @@@H@@@” (or “no @@@T@@@”). The event @@@\emptyset@@@ corresponds to any statement which is always false, like “number of @@@H@@@ is @@@-1@@@” or “stocking up on toilet paper is an act of rationality”.
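This identification of answerable questions with subsets can be checked mechanically. In the blindfolded three-coin scenario, knowing only the number of @@@H@@@ partitions the sample space into blocks, and a subset is an event exactly when it never splits a block. A small Python sketch (the helper name `is_event` is ours):

```python
from itertools import product

omega = [''.join(s) for s in product('HT', repeat=3)]

# Knowing only the number of H partitions the sample space into blocks.
blocks = [{w for w in omega if w.count('H') == k} for k in range(4)]

def is_event(subset):
    # A question is answerable iff its "yes" set never splits a block:
    # each block is either inside the subset or disjoint from it.
    return all(b <= subset or b.isdisjoint(subset) for b in blocks)

odd_heads = {w for w in omega if w.count('H') % 2 == 1}
first_is_h = {w for w in omega if w[0] == 'H'}

print(is_event(odd_heads))   # True: we can always answer this question
print(is_event(first_is_h))  # False: e.g. HTT and THT look the same to us
```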

Definition and first properties

In the last section we saw that the seemingly ambiguous notion of “information” can be made quite precise as a collection of subsets of the sample space, each representing all outcomes corresponding to a “yes” answer to a question we, the observers of the experiment, can always answer. We observe that any such collection satisfies the following properties:

  1. @@@\Omega@@@ is always an event, corresponding to the question “is the outcome in @@@\Omega@@@?”
  2. Suppose @@@A@@@ is an event. If we negate the corresponding question, we get a new question we can also answer, and the set of outcomes corresponding to a “yes” answer to the new question is exactly @@@A^c@@@. Therefore @@@A^c@@@ is an event.
  3. Suppose @@@A@@@ and @@@B@@@ are events. We can therefore answer “is the outcome in @@@A@@@?” and “is the outcome in @@@B@@@?”. Therefore we can answer “is the outcome in @@@A@@@ or in @@@B@@@?”, and as the set of outcomes corresponding to “yes” for the latter question is the union @@@A\cup B@@@, we conclude that @@@A\cup B@@@ is an event. By iterating, the union of any finite number of events is an event.

A collection of subsets of @@@\Omega@@@ satisfying the three properties above is called an algebra of sets on @@@\Omega@@@. Reverse-engineering, we will use these properties as the basis for our mathematical definition of “information”. I said “basis” because there is going to be one change, associated with infinite unions, as - similarly to real life - some structure can only be viewed from a bird’s eye view, from space, from infinity… and beyond:

Definition 1.

Let @@@\Omega@@@ be a sample space. A sigma-algebra (or @@@\sigma@@@-algebra) on @@@\Omega@@@ is a set @@@{\cal F}@@@ whose elements are subsets of @@@\Omega@@@, and which satisfies all of the following:

  1. @@@\Omega \in {\cal F}@@@ (not empty)
  2. If @@@A\in {\cal F}@@@ then @@@A^c \in{\cal F}@@@ (closed under complements)
  3. If @@@A_1,A_2,\dots \in{\cal F}@@@ then @@@\cup_{i}A_i\in {\cal F}@@@ (closed under countable unions)

Any sigma-algebra is an algebra. Indeed, if @@@{\cal F}@@@ is a sigma-algebra, then @@@\Omega\in {\cal F}@@@ and therefore its complement, @@@\emptyset \in {\cal F}@@@. Now suppose @@@A_1,A_2\in {\cal F}@@@. Then letting @@@A_3=A_4=\dots =\emptyset@@@, we have @@@A_1 \cup A_2 = \cup_{i=1}^\infty A_i \in {\cal F}@@@.

With this definition, we have a new tool. The information about the outcome of the experiment will be always expressed as a corresponding sigma-algebra. Now for concrete sigma-algebras. We begin with two extreme cases.

  1. No information at all. The only questions we can answer are those which have only one answer: either always “yes” or always “no”. Therefore the sigma-algebra consists of exactly two sets: @@@\Omega@@@ and @@@\emptyset@@@. We can write this as @@@{\cal F} = \{\emptyset,\Omega\}@@@. For trivial reasons, this is known as the trivial sigma-algebra.
  2. We have all information about the outcome. We can answer any question: if @@@A@@@ is any subset of @@@\Omega@@@, we can answer the question “is the outcome in @@@A@@@?”. Therefore every subset of @@@\Omega@@@ is in the sigma-algebra. This sigma-algebra, the set of all subsets of @@@\Omega@@@ is called the power set of @@@\Omega@@@, and will be denoted by @@@{\cal P}(\Omega)@@@. If @@@\Omega@@@ is finite then the number of elements in its power set is always a power of @@@2@@@. Can you think why?
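A quick way to convince yourself of the count @@@2^{|\Omega|}@@@ is to enumerate the power sets of a few small sets, e.g. in Python:

```python
from itertools import combinations

def power_set(omega):
    """All subsets of omega, from the empty set up to omega itself."""
    return [set(c) for r in range(len(omega) + 1)
            for c in combinations(omega, r)]

# The power set of a finite set always has 2^n elements.
for omega in [set(), {'H'}, {'H', 'T'}, {1, 2, 3}]:
    assert len(power_set(omega)) == 2 ** len(omega)
```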

I want to pause here for a few seconds and make sure we’re all on the same page, with the following example.

Example 5.

Suppose we toss a coin and can see the outcome. What is the sample space and sigma-algebra? Clearly, @@@\Omega = \{H,T\}@@@. Since we have all information, @@@{\cal F}= {\cal P}(\Omega) = \{\emptyset, \{H\},\{T\},\{H,T\}\}@@@.

Exercise 2.

Show that the definition implies that a sigma-algebra is closed under finite unions (enough to prove for two). That is, if @@@{\cal F}@@@ is a sigma-algebra and @@@A,B\in {\cal F}@@@, then @@@A\cup B\in{\cal F}@@@.

When @@@\Omega@@@ has more than one element, there are other sigma-algebras than the trivial and the power set. Here’s an example.

Example 6.

Two coins are tossed in a row. The sample space is @@@\Omega =\{H,T\}^2@@@. In each of the following cases determine the sigma-algebra.

  1. You were blindfolded during the tossing, with the blindfold removed after the tossing is complete.
    • Clearly @@@\emptyset@@@ and @@@\Omega@@@ are events.
    • So is any subset corresponding to a fixed number of @@@H@@@: @@@\{TT\},\{TH,HT\},\{HH\}@@@.
    • Add unions of any of the above: @@@\{TT,TH,HT\},\{TT,HH\},\{TH,HT,HH\}@@@. Now look and see that the collection thus obtained consists of @@@8@@@ sets and is closed under complements and unions. Therefore
$${\cal F} = \{\emptyset, \{TT\},\{TH,HT\},\{HH\}, \{TT,TH,HT\},\{TT,HH\},\{TH,HT,HH\}, \Omega\}.$$
  2. You were only told whether there is a @@@H@@@ or not.
    • Clearly, @@@\emptyset@@@ and @@@\Omega@@@ are events.
    • So are the subsets corresponding to zero @@@H@@@ or at least one @@@H@@@: @@@\{TT\},\{HT,TH,HH\}@@@. The four sets listed above are closed under complements and unions. Therefore
$${\cal F} = \{\emptyset, \{TT\},\{HT,TH,HH\},\Omega\}.$$
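For a sigma-algebra coming from a partition, as in the first case of this example, the events are exactly the unions of partition blocks, which a few lines of Python can enumerate:

```python
from itertools import combinations

# Blocks of the partition "number of H" for two tosses.
blocks = [{'TT'}, {'TH', 'HT'}, {'HH'}]

# Every event is a union of blocks, so the sigma-algebra has 2^3 = 8 sets.
sigma = []
for r in range(len(blocks) + 1):
    for choice in combinations(blocks, r):
        sigma.append(frozenset().union(*choice))

assert len(sigma) == 8
# Sanity check: closed under complements.
omega = frozenset({'TT', 'TH', 'HT', 'HH'})
assert all(omega - a in sigma for a in sigma)
```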
Exercise 3.

I’m tossing a coin three times. I’m telling you the outcome of the first toss and the number of remaining tosses with the same outcome. Give:

  1. Two examples of non-trivial (not empty and not entire sample space) sets which from your perspective are events, in words (e.g. “first toss is Heads” - don’t use this one!)
  2. Two examples of sets which from your perspective are not events, in words.
Exercise 4.

You are rolling a @@@6@@@-sided die. You’re telling me what is the remainder of the outcome when divided by @@@3@@@. List all elements in the sigma-algebra.

We required sigma-algebras to be closed under countable unions. What about intersections? If we look back at the argument that led to closure under finite unions, we can guess that algebras and sigma-algebras are also closed under countable intersections. To see why this follows from the definition we just presented, recall two important identities about sets, widely known as De Morgan’s laws:

\begin{equation} \label{eq:demorgan} \boxed { (\bigcap A_i)^c= \bigcup A_i^c. } \end{equation}

On rewriting this (how?) and using the fact that @@@(A^c)^c=A@@@ for any set @@@A@@@, we obtain

\begin{equation} \label{eq:demorgan2} \boxed { (\bigcup A_i)^c = \bigcap A_i^c. } \end{equation}

We comment that \eqref{eq:demorgan} and \eqref{eq:demorgan2} hold regardless of whether we are considering finitely many indices @@@i@@@ or not.
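Both identities are easy to sanity-check on concrete sets, e.g. with Python's built-in set operations (the particular random subsets below are an arbitrary choice of ours):

```python
import random

# De Morgan's laws checked on random subsets of a small Omega.
random.seed(0)
omega = set(range(20))
sets = [set(random.sample(sorted(omega), 8)) for _ in range(5)]

union = set().union(*sets)
inter = set.intersection(*sets)

# (union of A_i)^c = intersection of the A_i^c
assert omega - union == set.intersection(*[omega - a for a in sets])
# (intersection of A_i)^c = union of the A_i^c
assert omega - inter == set().union(*[omega - a for a in sets])
```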

Why now? Because De Morgan’s laws give the following property:

Proposition 1.

If @@@{\cal F}@@@ is a sigma-algebra on @@@\Omega@@@ and @@@A_1,A_2,\dots \in {\cal F}@@@, then @@@\cap_i A_i \in {\cal F}@@@.

Exercise 5.

Prove Proposition 1.

In words, sigma-algebras are closed under countable intersections. Of course, they are also closed under finite intersections, as @@@\Omega@@@ is an element of any sigma-algebra. Simply put: if you have any (finite or countable) number of events, then whatever is obtained by taking unions, complements and intersections of any of them is an event. Of course you can iterate that. The only way to “split” an event is by intersecting it with another event. The only way to “enlarge” an event is by taking its union with another event.

Let’s take a quick pause and make sure we understand the relation between the mathematical operations of union, intersection and complement and words. This is something many find confusing. Remember this:

  • “Event A and event B”, or any other way to say it in words (e.g. “Both events A and B”), is the intersection of @@@A@@@ and @@@B@@@, @@@A\cap B@@@, with the immediate generalization for more than just two. In short: “and” means “all” means intersection.
  • “Event A or event B”, or any other way to say it, most commonly “at least one of the events A and B”, corresponds to the union of @@@A@@@ and @@@B@@@, @@@A\cup B@@@. In short: “or” means “at least one” means union.

Here is a more verbal example.

Example 7.

I’m playing @@@10@@@ soccer games. For @@@i=1,\dots,10@@@, let @@@A_i@@@ be the event “I won the @@@i@@@-th time”.

  1. The statement “I lost game @@@i@@@” is the same as “I did not win game @@@i@@@”, the complement of @@@A_i@@@, @@@A_i^c@@@.
  2. The statement “I won at least one game” is the same as “I won game 1 or game 2 or game 3 … or game 10”. This is the union @@@\cup_{i=1}^{10} A_i@@@. Union of events is the same as “at least one of the events”.
  3. The statement “I lost all games” is the complement of the statement in the last bullet item, the union @@@\cup_{i=1}^{10} A_i@@@. It also means “I lost game 1 and game 2 and … and game 10”. That is, it is the intersection @@@\cap_{i=1}^{10} A_i^c@@@. In other words (or without words, actually), @@@(\cup_{i=1}^{10} A_i)^c= (\cap_{i=1}^{10} A_i^c)@@@. This is \eqref{eq:demorgan2} in action.
  4. The statement “I won all games” is, of course, the intersection @@@\cap_{i=1}^{10} A_i@@@. Its complement is “I lost at least one game”, which is the same as “I lost game 1 or game 2 or … or game 10”. This is the union @@@\cup_{i=1}^{10} A_i^c@@@. Thus, @@@(\cap_{i=1}^{10} A_i)^c = \cup_{i=1}^{10} A_i^c@@@, and this is \eqref{eq:demorgan} in action.
  5. The statement “I lost the first 5 games and won at least one in the last 5” is the intersection of the following two:
    • “I lost game 1 and game 2 …. and … game 5 “, @@@\cap_{i=1}^5 A_i^c@@@; and
    • “I won game 6 or game 7 or … or … game 10”, @@@\cup_{i=6}^{10} A_i@@@. Therefore we can write it as @@@\left(\cap_{i=1}^5 A_i^c\right)\cap \left(\cup_{i=6}^{10} A_i\right)@@@.
  6. The complement of the statement in the last item will be obtained by repeatedly applying DeMorgan’s laws.
  7. Set @@@A=\left(\cap_{i=1}^5 A_i^c\right)@@@ and @@@B=\left(\cup_{i=6}^{10} A_i\right)@@@. Then the statement in the last item is @@@A\cap B@@@.
  8. Apply \eqref{eq:demorgan} to obtain @@@(A\cap B)^c = A^c \cup B^c@@@.
  9. Apply \eqref{eq:demorgan} to @@@A@@@ to get @@@A^c = \cup_{i=1}^5 A_i@@@ (recall that @@@(A^c)^c=A@@@, which we have used here).
  10. Apply \eqref{eq:demorgan2} to @@@B@@@ to get @@@B^c=\cap_{i=6}^{10} A_i^c@@@.
  11. Conclude @@@(A\cap B)^c = A^c \cup B^c = \left ( \cup_{i=1}^5 A_i \right) \cup (\cap_{i=6}^{10} A_i^c)@@@.
  12. In words: “I won at least one of the first five games or lost all of the last five games”.
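The chain of manipulations above can be verified by brute force over all @@@2^{10}@@@ outcomes; here is a Python sketch (the encoding of outcomes as tuples of @@@W@@@/@@@L@@@ is our choice):

```python
from itertools import product

# Outcomes: win/loss records of 10 games, encoded as tuples of 'W'/'L'.
omega = set(product('WL', repeat=10))
A = [{w for w in omega if w[i] == 'W'} for i in range(10)]  # A[i]: won game i+1

# "Lost the first 5 games and won at least one of the last 5":
event = (set.intersection(*[omega - A[i] for i in range(5)])
         & set.union(*[A[i] for i in range(5, 10)]))

# Its complement, computed via De Morgan's laws as in the items above:
via_demorgan = (set.union(*[A[i] for i in range(5)])
                | set.intersection(*[omega - A[i] for i in range(5, 10)]))

assert omega - event == via_demorgan
```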

Back to manipulating events:

Example 8.

If @@@A@@@ and @@@B@@@ are events, the set “elements in exactly one of @@@A@@@ or @@@B@@@” (also known as “@@@A@@@ exclusive or @@@B@@@”) is an event. Why? Because it is the union of the two sets @@@C_1@@@ and @@@C_2@@@, where

  1. @@@C_1@@@ is all elements in @@@A@@@ but not in @@@B@@@. That is all elements both in @@@A@@@ and in @@@B^c@@@, which is simply @@@A \cap B^c@@@, the intersection of two events, hence an event.
  2. @@@C_2@@@ is all elements in @@@B@@@ but not in @@@A@@@. This is … @@@B \cap A^c@@@, an event because of the same reason as before.

Since @@@C_1@@@ and @@@C_2@@@ are events, so is their union, @@@C_1 \cup C_2@@@, all elements exclusively in @@@A@@@ or in @@@B@@@.
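In Python, where sets are built in, this construction coincides with the built-in symmetric difference operator (the particular sets @@@A@@@ and @@@B@@@ below are arbitrary choices of ours):

```python
omega = set(range(10))
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

c1 = A & (omega - B)  # in A but not in B
c2 = B & (omega - A)  # in B but not in A

# The union c1 | c2 is exactly Python's symmetric difference A ^ B.
assert c1 | c2 == A ^ B
print(c1 | c2)  # {1, 2, 5, 6}
```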

Exercise 6.

Suppose that @@@A,B@@@ and @@@C@@@ are events. Show by using unions, intersections and complements that the set “all elements not in @@@C@@@ but in @@@A@@@ or in @@@B@@@” is an event.

Exercise 7.

Suppose that @@@A_1,\dots,A_n@@@ are distinct events. Show the set “all elements which are in exactly two of the events” is an event by expressing it through unions, intersections and complements.

Let’s look at an example where we perform infinitely many operations.

Example 9.

Suppose we are tossing a coin repeatedly. The sample space is therefore @@@\Omega=\{H,T\}^{\N}@@@, space of all infinite sequences formed from @@@H@@@ and @@@T@@@. Assume we are able to observe each toss, which means we have an infinitely long memory (and patience!). We will show that “@@@H@@@ will appear infinitely many times” is an event.

Let @@@A_n@@@ be “the @@@n@@@-th toss is @@@H@@@”. This is an event, because we can answer the corresponding question. Now @@@B_N= \cup_{n=N+1}^\infty A_n@@@ is an event, because it is a countable union of events. In words, it is the event “there will be at least one @@@H@@@ after the @@@N@@@-th toss”. Now, “infinitely many @@@H@@@” can be restated as “for every @@@N@@@, there will be at least one @@@H@@@ after the @@@N@@@-th toss”. The latter is simply the intersection @@@\cap_{N=1}^\infty B_N@@@. As any countable intersection of events is an event, this is an event.

Two things to note about our last example:

  1. It shows how complex events can be obtained by repeatedly applying unions, intersections and complements. Want a challenge? Try to show that “the limit of the proportion of @@@H@@@ among the first @@@N@@@ tosses tends to @@@1/2@@@ as @@@N\to\infty@@@” is also an event. The rule of thumb is this: every statement one can express in words about your observation corresponds to an event.
  2. It is an example of a sigma-algebra with infinitely many elements. Such sigma-algebras are pretty complex, and are usually not defined explicitly. Though it may be tempting to think it is actually the power set, it is not. This seems like it is contradicting our rule of thumb, but it doesn’t, due to subsets provided by the axiom of choice. Ask your teacher!

Generation of sigma-algebras

In this section we will discuss some properties of families of sigma-algebras. Suppose you have some information on the outcome and I have some information, represented by the sigma-algebras @@@{\cal F}_1@@@ and @@@{\cal F}_2@@@. Consider the following questions:

  1. What is the sigma-algebra representing only the information we both know? In terms of questions, this is all questions both of us can answer. In terms of subsets, this corresponds to all subsets both in @@@{\cal F}_1@@@ and @@@{\cal F}_2@@@, that is, the collection @@@{\cal F}={\cal F}_1\cap {\cal F}_2@@@. It is not surprising that this latter object is itself a sigma-algebra. Thus, one expects intersections of sigma-algebras to be sigma-algebras.
  2. What about our combined information, that is, all questions we can answer collectively? Does this correspond to the union of @@@{\cal F}_1@@@ and @@@{\cal F}_2@@@? No, unless one of @@@{\cal F}_1,{\cal F}_2@@@ contains the other (otherwise, why bother?). Why? Suppose @@@A_1 \in {\cal F}_1@@@ but @@@A_1 \not\in {\cal F}_2@@@, and @@@A_2 \in {\cal F}_2@@@ but @@@A_2\not \in {\cal F}_1@@@. Now “is the outcome in @@@A_1@@@ or in @@@A_2@@@?” is a question we can answer with joint efforts: I can tell whether it is in @@@A_1@@@ and you can tell whether it is in @@@A_2@@@, so after consulting we can determine whether the outcome is in @@@A_1\cup A_2@@@. As a result, the event @@@A_1\cup A_2@@@ must be in the sigma-algebra corresponding to our combined information. However @@@{\cal F}_1\cup {\cal F}_2@@@ consists entirely of subsets in @@@{\cal F}_1@@@ or in @@@{\cal F}_2@@@, and in general @@@A_1 \cup A_2@@@ is in neither. For a concrete example, take @@@\Omega=\{1,2,3,4\}@@@, @@@{\cal F}_1 = \{\emptyset,\{1,2\},\{3,4\},\Omega\}@@@ and @@@{\cal F}_2 = \{\emptyset,\{1,3\},\{2,4\},\Omega\}@@@. Then @@@\{1,2\}\cup \{1,3\} = \{1,2,3\}@@@ belongs to neither sigma-algebra, so @@@{\cal F}_1\cup{\cal F}_2@@@ is not even closed under unions. Bottom line: when it comes to information, the “sum” is bigger than its parts.

In the rest of this section we discuss topics around these very two questions. We begin with a rather general answer to the first.

Proposition 2.

The intersection of any collection of sigma-algebras on @@@\Omega@@@ is a sigma-algebra on @@@\Omega@@@.

An application of the proposition gives:

Corollary 3.

Let @@@{\cal R}@@@ be any collection of subsets of @@@\Omega@@@. Then there exists a unique sigma-algebra on @@@\Omega@@@, called the sigma-algebra generated by @@@{\cal R}@@@ and denoted by @@@\sigma({\cal R})@@@, satisfying both:

  1. All elements of @@@{\cal R}@@@ are in @@@\sigma({\cal R})@@@.
  2. Any sigma-algebra satisfying 1. contains @@@\sigma({\cal R})@@@.
Exercise 8.

Prove the Corollary.

In words, @@@\sigma({\cal R})@@@ represents the minimal amount of information needed in order to be able to determine whether the outcome is in any given element @@@R@@@ of @@@{\cal R}@@@. It can be thought of as the smallest sigma-algebra containing all elements of @@@{\cal R}@@@. Seems complicated?

Example 11.

Let’s revisit our two coins example from the last section. We had two cases.

  1. The information given to us was the number of @@@H@@@. This is the sigma-algebra generated by the sets “@@@0@@@ @@@H@@@” and “@@@1@@@ @@@H@@@”, that is @@@\sigma( {\cal R})@@@, where @@@{\cal R}=\{\{TT\},\{HT,TH\}\}@@@. As you can see, we omitted “@@@2@@@ @@@H@@@”, because knowing whether the number of @@@H@@@ is @@@0@@@ or @@@1@@@ determines whether it is @@@2@@@ too. Can you think of another choice of two sets generating this sigma-algebra?
  2. The information given to us is whether @@@H@@@ appeared. This is the sigma-algebra generated by the set “@@@0@@@ @@@H@@@”, that is @@@\sigma({\cal R})@@@ where @@@{\cal R} = \{\{TT\}\}@@@.
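For a finite sample space, @@@\sigma({\cal R})@@@ can be computed by brute force: start from @@@{\cal R}@@@ together with @@@\emptyset@@@ and @@@\Omega@@@, and keep adding complements and pairwise unions until nothing new appears. A sketch of this closure procedure (the function name is ours):

```python
def generated_sigma_algebra(omega, R):
    """Smallest collection containing R and omega, closed under
    complements and (pairwise, hence all finite) unions.
    For a finite omega this closure IS sigma(R)."""
    omega = frozenset(omega)
    sigma = {frozenset(), omega} | {frozenset(r) for r in R}
    changed = True
    while changed:
        changed = False
        for a in list(sigma):
            for b in [omega - a] + [a | c for c in list(sigma)]:
                if b not in sigma:
                    sigma.add(b)
                    changed = True
    return sigma

omega = {'HH', 'HT', 'TH', 'TT'}
# Case 2 of Example 11: generated by the single set "0 H".
assert generated_sigma_algebra(omega, [{'TT'}]) == {
    frozenset(), frozenset({'TT'}), frozenset({'HT', 'TH', 'HH'}),
    frozenset(omega)}
```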

The last example gave us two very simple examples of sigma-algebras generated by some sets. Let’s look at a much more general setup, where the notion is actually needed because we cannot list all elements explicitly. Take @@@\Omega= \R@@@, the real line, and let @@@{\cal R}@@@ be the collection of all intervals. Then any sigma-algebra containing @@@{\cal R}@@@ must contain all countable unions of intervals, intersections of these objects, unions of complements of the resulting objects, etc… This is a pretty large family, and we do not have a good direct description of all of its elements. We call this sigma-algebra the Borel sigma-algebra on @@@\R@@@. It is the smallest sigma-algebra containing all intervals.

Exercise 9.

Show that all of the following sigma-algebras are the same.

  1. The Borel sigma-algebra.
  2. The sigma-algebra generated by intervals of the form @@@(a,b)@@@, where @@@-\infty < a<b<\infty@@@.
  3. The sigma-algebra generated by intervals of the form @@@(a,\infty)@@@, where @@@-\infty < a < \infty@@@.
  4. The sigma-algebra generated by intervals of the form @@@(-\infty,b]@@@, where @@@-\infty < b<\infty@@@.
  5. The sigma-algebra generated by intervals of the form @@@(-\infty,b]@@@ where @@@b@@@ is a rational number.

You may now ask why we even bother defining the Borel sigma-algebra, rather than simply working with the power set directly. The reason goes back to why we defined sigma-algebras in the first place: these are the objects we will eventually assign weights to. The weighing function, the probability measure, must satisfy some reasonable assumptions, and satisfying them on a too-large sigma-algebra (such as the power set of @@@\R@@@) may be impossible.

We also have the Borel sigma-algebra in higher dimensions:

Definition 2.

Let @@@\Omega= \R^n@@@. The Borel sigma-algebra on @@@\R^n@@@ is the sigma algebra generated by all open rectangles (or equivalently all open sets), @@@(a_1,b_1)\times \dots \times (a_n,b_n)@@@. It is denoted by @@@{\cal B}(\R^n)@@@. In the sequel we will write @@@{\cal B}@@@ as shorthand for @@@{\cal B}(\R)@@@.

What I want you to take from here is the following. Let’s take @@@n=1@@@. The Borel sigma-algebra is a sigma-algebra containing all intervals (open or not) and anything that can be obtained from intervals through repeated countable unions, complements of unions, etc. This is NOT the power set, but it is a class rich enough to cover all subsets of the real line you are familiar with. Same in higher dimensions. The Borel sigma-algebra contains all nice (and many not so nice) sets, because each can be approximated with rectangles.

In first reading you can skip from here until the end of the section.

Example 12.

More generally, suppose that @@@\Omega=\Omega_1\times \dots \times \Omega_n @@@, that is @@@\Omega = \{(\omega_1,\omega_2,\dots,\omega_n):\omega_j \in \Omega_j,j=1,\dots,n\}@@@, with @@@\Omega_1,\dots,\Omega_n@@@ all finite. Let @@@{\cal F}_i@@@ be the sigma-algebra corresponding to observing only the @@@i@@@-th component @@@\omega_i@@@. That is, @@@{\cal F}_i@@@ is exactly all subsets of @@@\Omega@@@ determined by the @@@i@@@-th component. Then @@@A\in {\cal F}_i@@@ if and only if there exists @@@I \subset \Omega_i@@@ such that @@@ A=\{\omega=(\omega_1,\dots,\omega_i,\dots,\omega_n): \omega_i \in I\}@@@.

To illustrate the versatility of the idea of sigma-algebra, consider the following.

Example 13.

Let @@@A_1,A_2,\dots@@@ be events. Let @@@B_i,~i=1,2,\dots@@@ be the subset of @@@\Omega@@@ consisting of all elements in at least @@@i@@@ of the events @@@A_1,A_2,\dots@@@. Then each @@@B_i@@@ is an event. Why? @@@B_i = \cup_{I\subset\{1,2,\dots\},|I|=i} \cap_{j\in I} A_j@@@, because an element belongs to the RHS if and only if it is in at least one of the possible intersections of @@@i@@@ events from @@@A_1,A_2,\dots@@@. Since there are countably many finite subsets of @@@\{1,2,\dots\}@@@, @@@B_i@@@ is a countable union of finite intersections of events, and it follows that @@@B_i@@@ is itself an event.
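To see the identity in action, here is a small sanity check in Python, computing @@@B_i@@@ both directly and through the union-of-intersections formula. The toy sample space and events below are hypothetical illustrations, not from the text:

```python
from itertools import combinations

# Toy setup: Omega = {0,...,9} and three events A_1, A_2, A_3.
omega = set(range(10))
A = [{0, 1, 2, 3}, {2, 3, 4, 5}, {3, 5, 7}]

def B_direct(i):
    # elements of Omega belonging to at least i of the events
    return {w for w in omega if sum(w in a for a in A) >= i}

def B_formula(i):
    # union over all index sets I of size i of the intersections cap_{j in I} A_j
    out = set()
    for I in combinations(range(len(A)), i):
        out |= set.intersection(*(A[j] for j in I))
    return out

for i in range(1, len(A) + 1):
    assert B_direct(i) == B_formula(i)
print(sorted(B_direct(2)))  # elements in at least two events -> [2, 3, 5]
```
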

Exercise 10.

Continuing Example 13.

  1. Let @@@C_i@@@ be the subset of @@@\Omega@@@ consisting of all elements in exactly @@@i@@@ of the events @@@A_1,A_2,\dots@@@. Show that @@@C_i@@@ is an event.
  2. Also show that the subset of @@@\Omega@@@ consisting of all elements in infinitely many @@@A_i@@@’s is an event.

Why am I bugging you with all this theoretical nonsense? Suppose we can toss a coin indefinitely, and we can record the outcome of each toss. The sample space in this case is the one from Example 12, and the sigma-algebra is all subsets of this space determined by observing the outcome of each toss (we won’t go into details, but surprisingly enough it is not all subsets of the sample space. Strange things happen at infinity). With this, for any @@@n@@@, the set “@@@n@@@-th toss is a Heads” is an event. Therefore by Example 13, “at least @@@i@@@ Heads” is an event for any @@@i@@@, and then by Proposition 1 the intersection of all of these newly created events is itself an event. What is this intersection? “Infinitely many Heads” (at least @@@i@@@ for any @@@i@@@). Similarly, the first part of Exercise 10 tells us that “exactly @@@i@@@ Heads” is an event as well. Ok, but why do we care? Because all the beautiful structure (which is also highly useful for applications!) is only fully revealed when we go to infinity (think of it as watching us from space), and very important statements, like “asymptotically the proportion of Heads tends to @@@1/2@@@ with probability @@@1@@@” (the law of large numbers), require being able to assign probabilities to sets involving infinitely many tosses.

Essentially, whatever we can describe in words is an event, but this won’t qualify as a mathematical definition, while sigma-algebras are well-defined mathematical objects, and are pretty easy to describe.

Probability Measures

Warmup discussion

Quick recap. We are the observers of a sampling, an experiment in which an element is sampled from our sample space @@@\Omega@@@. The element sampled is called the outcome (of the experiment). We have a sigma-algebra @@@{\cal F}@@@, a collection of subsets of @@@\Omega@@@, each representing a statement about the outcome that we, the observers, are able to determine to be true or false at the conclusion of our observation. These subsets, the elements of @@@{\cal F}@@@, are called events. The pair @@@(\Omega,{\cal F})@@@ is called a measurable space.

This entire structure was just preparation for what we describe in this section, the notion of a probability measure. A probability measure is a mathematical weighing tool. What does it weigh? Events. At an intuitive level it is good to think of the weights as a measure of the size of an event in the sample space, or what portion of the sample space it occupies. The weights assigned are always between @@@0@@@ and @@@1@@@, which is the same as from @@@0\%@@@ to @@@100\%@@@. The weight assigned to an event is referred to as the probability of the event.

Before getting into the formal definition, let’s begin with a concept we’re all familiar with, that of relative frequency or proportion. Suppose @@@\Omega@@@ is finite. The relative frequency of any subset @@@A@@@ is given by @@@|A|/|\Omega|@@@. The relative frequency is a function from the power set of @@@\Omega@@@ to the interval @@@[0,1]@@@ satisfying the following two properties:

  1. The relative frequency of @@@\Omega@@@ is @@@1@@@ (normalization)
  2. If @@@A@@@ and @@@B@@@ are disjoint (do not share any elements, @@@A\cap B = \emptyset@@@), then the relative frequency of the union @@@A \cup B@@@ is the sum of the relative frequencies of @@@A@@@ and @@@B@@@ (property known as additivity).

Note that by iterating the additivity property, the relative frequency of the union of any (finite) number of disjoint subsets is the sum of the respective relative frequencies.

The two properties will continue to hold if we were to assign each element @@@\omega@@@ some value @@@p_\omega\ge 0@@@, with @@@\sum_{\omega\in \Omega} p_\omega =1@@@, and then define the weight of an event @@@A@@@ as the sum of the weights of its elements, @@@\sum_{\omega\in A} p_\omega@@@. Note that relative frequency corresponds to setting @@@p_\omega = \frac{1}{|\Omega|}@@@ for all @@@\omega \in \Omega@@@. Why do we even care about such cases? They come up naturally, even from relative frequencies…

Example 15.

Remember our blindfolded coin tossing? Our information is the number of Heads. As we’ve seen in the last example, the relative frequency of the event “@@@0@@@ Heads” (=“three Tails”) is @@@1/8@@@, of “@@@1@@@ Heads” is @@@3/8@@@, of “@@@2@@@ Heads” is also @@@3/8@@@, and “@@@3@@@ Heads” has relative frequency @@@1/8@@@. Then in this experiment the different values for the number of Heads do not all carry the same weight.

What we actually observe can be used to generate a new probability space as follows. Let @@@\bar\Omega@@@ be the set of outcomes we can actually observe, that is, the number of Heads: @@@\bar \Omega =\{0,1,2,3\}@@@. Equip this new sample space with the power set (all subsets are events), and define a probability measure @@@Q@@@ through Proposition 5, according to the following table:

$$\begin{array}{c|c|c|c|c|} \hline\hline \omega & 0 & 1 & 2 & 3 \\ \hline\hline p_\omega & 1/8 & 3/8 & 3/8 & 1/8 \\ \hline\hline \end{array}$$

If we only care about the number of Heads, we can and should rethink the experiment, this time with sample space @@@\{0,1,2,3\}@@@ representing the four possible outcomes. To be faithful to the original experiment and the “true origin” of each of the numerical outcomes in our new sample space, we should assign a weight of @@@1/8@@@ to @@@0@@@, a weight of @@@3/8@@@ to @@@1@@@, a weight of @@@3/8@@@ to @@@2@@@, and a weight of @@@1/8@@@ to @@@3@@@, noting that all weights add up to @@@1@@@. If we were to choose relative frequencies on the new sample space, we would get @@@1/4@@@ for each, but this would not be in line with our original experiment.
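The table can be reproduced by brute-force enumeration; here is a quick Python sketch (the enumeration approach is mine, not the text’s):

```python
from itertools import product
from fractions import Fraction

# Enumerate the 8 equally likely outcomes of three fair coin tosses and
# tabulate the induced weights p_omega on the new sample space {0,1,2,3}.
outcomes = list(product("HT", repeat=3))
weights = {k: Fraction(0) for k in range(4)}
for seq in outcomes:
    weights[seq.count("H")] += Fraction(1, len(outcomes))

assert weights == {0: Fraction(1, 8), 1: Fraction(3, 8),
                   2: Fraction(3, 8), 3: Fraction(1, 8)}
assert sum(weights.values()) == 1  # the weights form a probability measure
```
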

Definition and first examples

A probability measure is a mathematical object satisfying the two properties we listed in the last section, but a little more:

Definition 3.

Let @@@(\Omega,{\cal F})@@@ be a measurable space. A probability (measure) @@@P@@@ on @@@{\cal F}@@@ is a function from @@@{\cal F}@@@ to the set of nonnegative real numbers, satisfying

  1. @@@P(\Omega)=1@@@ (normalization).
  2. If @@@A_1,A_2,\dots@@@ are pairwise disjoint events (that is @@@A_i\cap A_j=\emptyset@@@ for @@@i<j@@@), then @@@P(\cup_{i=1}^\infty A_i)=\sum_{i=1}^\infty P(A_i)@@@ (countably additive).

The triple consisting of a sample space @@@\Omega@@@, a sigma-algebra @@@{\cal F}@@@ and a probability measure @@@P@@@ on @@@{\cal F}@@@ is called a probability space, and we denote it by @@@(\Omega,{\cal F},P)@@@. The probability space describes, in this order, what the outcomes are (what we sample), what the events are, and what weights are assigned to each event.

The three conditions @@@P(A)\ge 0@@@, 1., and 2. are known as the probability axioms. The countable additivity axiom generalizes the additivity property for relative frequencies beyond a finite number of sets.

Recall that no matter what @@@{\cal F}@@@ is, @@@\Omega@@@ and @@@\emptyset@@@ are events. The probability of @@@\Omega@@@ is set by the normalization axiom. What about the probability of the empty set? Let’s see. Set @@@\alpha = P(\emptyset)@@@. Set @@@A_1=\Omega@@@, @@@A_2=A_3=\dots =\emptyset@@@. Then the normalization and countable additivity axioms give

$$ 1= 1 + \sum_{i=2}^\infty \alpha.$$

Therefore @@@\alpha=0@@@.

Suppose @@@A_1,A_2,\dots,A_N@@@ are pairwise disjoint events. Set @@@A_{N+1}=A_{N+2}=\dots = \emptyset@@@. Then

$$ P(\cup_{i=1}^N A_i) = P(A_1) + P(A_2) + \dots + P(A_N) + \sum_{i=N+1}^\infty 0.$$

In particular, letting @@@A\in{\cal F}@@@, @@@A_1=A,A_2 =A^c@@@, we conclude

$$1=P(\Omega) = P(A)+ P(A^c)\ge P(A)\ge 0.$$

The first and second inequalities are both due to the fact that probabilities are nonnegative.

We proved the following:

Proposition 4.

Suppose @@@P@@@ is a probability measure. Then

  1. @@@P(\emptyset)=0@@@.
  2. If @@@A_1,A_2,\dots,A_N@@@ are pairwise disjoint events, then
$$ P(\cup_{i=1}^N A_i) = \sum_{i=1}^N P(A_i).$$

(finite additivity)

  3. For any event @@@A@@@, @@@0\le P(A)\le 1@@@.

Time for actual examples. We’ve already seen two; both are special cases of the following result.

Proposition 5.

Suppose @@@\Omega@@@ is finite or countable, and for every @@@\omega\in \Omega@@@, let @@@p_\omega \in [0,1]@@@ be such that @@@\sum_{\omega \in\Omega} p_\omega=1@@@. For any @@@A\subset \Omega@@@, set @@@P(A) = \sum_{\omega \in A} p_\omega@@@. Then @@@P@@@ is a probability measure on the power set (=all subsets) of @@@\Omega@@@.

Definition 4.
  1. Suppose @@@\Omega@@@ is finite. The uniform probability measure on @@@\Omega@@@ is the probability measure on the power set (all subsets) of @@@\Omega@@@, given by @@@P(A) = \frac{|A|}{|\Omega|}@@@, or, equivalently, the probability measure in Proposition 5, given by @@@p_\omega= \frac{1}{|\Omega|}@@@ for all @@@\omega@@@.
  2. Let @@@(\Omega,{\cal F})@@@ be a measurable space (sample space and sigma-algebra on it), and suppose that @@@A_0@@@ is a nonempty event (element in @@@{\cal F}@@@) with no proper subsets in @@@{\cal F}@@@, except for @@@\emptyset@@@. The @@@\delta@@@-measure (delta-measure) supported on @@@A_0@@@ is the probability measure @@@P@@@ defined by
$$ P(A) = \begin{cases} 1 & A_0\subseteq A \\ 0 & \mbox{otherwise.}\end{cases}$$
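A minimal computational sketch of Proposition 5 and Definition 4, with illustrative names (`measure_from_weights`, `uniform`, `delta2`) that are not from the text:

```python
from fractions import Fraction

# Sketch of Proposition 5: weights p_omega on a finite sample space induce a
# probability measure on the power set via P(A) = sum of p_omega over A.
def measure_from_weights(weights):
    assert sum(weights.values()) == 1  # normalization of the weights
    def P(A):
        return sum(weights[w] for w in A)
    return P

omega = {0, 1, 2, 3}
# uniform measure: p_omega = 1/|Omega|
uniform = measure_from_weights({w: Fraction(1, len(omega)) for w in omega})
# delta measure at the outcome 2: all weight on a single element
delta2 = measure_from_weights({w: Fraction(int(w == 2)) for w in omega})

assert uniform({0, 1}) == Fraction(1, 2)          # |A|/|Omega|
assert delta2({1, 2, 3}) == 1 and delta2({0}) == 0
# finite additivity on disjoint events
assert uniform({0}) + uniform({1, 2}) == uniform({0, 1, 2})
```
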

In the sequel, whenever @@@\Omega@@@ is finite, we tacitly assume that @@@P@@@ is the uniform probability measure; also, when we say “randomly” without specifying a probability measure, we mean that we use the uniform probability measure. It’s a convention (“I randomly pick pants and a shirt” is to be understood as the probability of each combination being the same).

The definition of the @@@\delta@@@-measure may seem a little awkward at first sight. The simplest and most common instances of it are when @@@A_0@@@ has exactly one element, say @@@x_0@@@. In this case, it is common to refer to the @@@\delta@@@-measure as @@@\delta_{x_0}@@@.

Exercise 11.

Why did we require @@@\Omega@@@ to be finite in the definition of the uniform measure?

Don’t be fooled into thinking that @@@\Omega@@@ must always be finite or countable. After all, there has to be a reason why Calculus is a prerequisite… Here is a result discussing some very important probability measures on @@@\R@@@.

Theorem 6.

Let @@@\Omega=\R@@@. Let @@@f@@@ be a nonnegative Riemann integrable function on @@@\R@@@ satisfying @@@\int f (x) dx =1@@@. Then there exists a unique probability measure on the Borel sigma-algebra, that is the sigma-algebra generated by all intervals, such that for every interval @@@I@@@ we have \begin{equation} \label{eq:prob_density} P(I) = \int_I f(x) dx, \end{equation} where the integral is from the left endpoint of the interval to its right endpoint, regardless of whether either endpoint is an element of the interval or not.

In this case the function @@@f@@@ is called the density of @@@P@@@.

Note that if @@@P@@@ is a probability measure with density as in the theorem, then for any real @@@a\le b@@@, @@@P([a,b]) = P((a,b]) = P([a,b)) = P((a,b)) = \int_a^b f(x) dx@@@. In particular, the second equality gives the following:

Corollary 7.

Let @@@P@@@ be a probability measure on @@@\R@@@ with density. Then:

  1. For every @@@a\in \R@@@, @@@P(\{a\})=0@@@; and more generally,
  2. For every finite or countable set @@@A@@@, @@@P(A)=0@@@.
Proof.
  1. Fix @@@a@@@ and let @@@b>a@@@. Since @@@[a,b] = \{a\} \cup (a,b]@@@, we have @@@P([a,b])= P(\{a\}) + P((a,b])@@@. Since @@@P([a,b])=P((a,b])@@@, @@@P(\{a\})=0@@@.
  2. Let @@@A@@@ be a finite or countable set. Then @@@A@@@ can be written as a finite or countable disjoint union of sets of the form @@@\{a_i\}@@@, and since each has probability @@@0@@@, the probability of the union is the sum of those zeros, which is @@@0@@@.
Example 17.

Let @@@f(x) = \frac{e^{-|x|}}{2}@@@. Let @@@P@@@ be the probability measure on @@@\R@@@ with density @@@f@@@. We leave the task of verifying that @@@\int f (x) dx=1@@@ to the reader.

  1. Find @@@P([0,\infty))@@@. We’re looking for the probability of the interval @@@I=[0,\infty)@@@, so the answer is @@@\int_0^\infty f(x) dx =\frac 12 \int_0^\infty e^{-x} dx = \frac 12@@@.
  2. Find @@@P([-1,2])@@@ and @@@P((-1,2))@@@. In both cases, we’re looking at intervals with the same endpoints, so the values will be the same and equal to
$$\begin{align*} \int_{-1}^2 f(x) dx & = \int_{-1}^0 \frac{e^{-|x|}}{2}dx+\int_0^2 \frac{e^{-x}}{2} dx \\ & = \int_0^1 \frac{e^{-x}}{2} dx + \int_0^2 \frac{e^{-x}}{2} dx\\& = \frac12 \left (1-e^{-1} + 1-e^{-2}\right) \\& = 1- \frac{e^{-1}+e^{-2}}{2}.\end{align*}$$
  3. Find @@@P(\{x:|x-1|>2\})@@@. Here we are looking at all points whose distance from @@@1@@@ is larger than @@@2@@@. This is the union of two disjoint intervals: @@@(-\infty,-1)@@@ and @@@(3,\infty)@@@. Therefore
$$\begin{align*} P(\{x:|x-1|>2\}) &= P( (-\infty,-1)) + P((3,\infty))\\ & = \int_{-\infty}^{-1} \frac{e^{-|x|}}{2} dx + \int_3^\infty \frac{e^{-x}}{2} dx\\ & = \int_1^\infty \frac{e^{-x}}{2} dx + \int_3^\infty \frac{e^{-x}}{2} dx, \end{align*}$$

Calculating these two integrals, we get our answer: @@@\frac{e^{-1}+e^{-3}}{2}@@@.
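These computations can be double-checked numerically; here is a sketch using a simple midpoint Riemann sum (my own approach for verification, the text computes the integrals exactly):

```python
import math

# Numerical sanity check of Example 17 for the density f(x) = exp(-|x|)/2.
def f(x):
    return math.exp(-abs(x)) / 2

def integral(a, b, steps=200_000):
    # midpoint Riemann sum for the integral of f over [a, b]
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

# P([-1, 2]) should equal 1 - (e^{-1} + e^{-2})/2
exact = 1 - (math.exp(-1) + math.exp(-2)) / 2
assert abs(integral(-1, 2) - exact) < 1e-6

# P({x : |x - 1| > 2}) = (e^{-1} + e^{-3})/2; truncate the infinite tails at +-50
exact_tail = (math.exp(-1) + math.exp(-3)) / 2
assert abs(integral(-50, -1) + integral(3, 50) - exact_tail) < 1e-6
print(round(exact, 4), round(exact_tail, 4))  # prints 0.7484 0.2088
```
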

More properties of probability measures

In what follows, we omit the reference to @@@\Omega@@@ and @@@{\cal F}@@@.

Proposition 8.

Let @@@P@@@ be a probability measure. Then

  1. (probability of complement) @@@P(A^c)=1-P(A)@@@.
  2. (inclusion/exclusion for two events) @@@P(A\cup B)=P(A)+P(B) - P(A \cap B)@@@
  3. (monotonicity of probability) If @@@A\subseteq B@@@, then @@@P(A)\le P(B)@@@.
Proof.
  1. @@@A@@@ and @@@A^c@@@ are disjoint, and their union is @@@\Omega@@@. Finish.
  2. Use these decompositions into disjoint sets:
$$A\cup B = A \cup (B-A)\mbox{ and }B=(A \cap B)\cup (B-A).$$

(recall that @@@B-A@@@ is shorthand for @@@B \cap A^c@@@ or all elements in @@@B@@@ but not in @@@A@@@). So

$$P(A \cup B) = P(A) + P(B-A),\mbox{ and } P(B) = P(A \cap B) + P(B-A).$$

Now finish with the algebra.

  3. @@@B=A \cup (B-A)@@@, so @@@P(B) = P(A) + P(B-A) \ge P(A)@@@.

Note the connection between item 2 and the addition rule from combinatorics.

Here is a very simple application of all three parts of the Proposition.

Example 18.

The probability that both students Alpha and Beta will fail this course is @@@6\%@@@. What is the probability that at least one of them will pass the course? Well, we’re asking for the probability of the complement of the first event, and the answer is @@@94\%@@@.

Now let’s show that for at least one of the two students, the probability of passing is at least @@@47\%@@@. Let @@@A@@@ be the event “Alpha passes” and let @@@B@@@ be the event “Beta passes”. Then @@@A\cup B@@@ is the event that at least one of them passes, so we know @@@P(A\cup B)=0.94@@@. By the second part of the proposition,

$$ 0.94 = P(A\cup B) = P(A) + P(B) - P(A\cap B) \le P(A) + P(B).$$

As the sum @@@P(A) +P(B) \ge 0.94@@@, at least one of the two values must be larger than or equal to half of @@@0.94@@@, that is, @@@0.47@@@.

Now what can you say about the probability that Beta will fail the course? This event contains the event “both Alpha and Beta fail the course”, so by the third part of the proposition its probability is larger than or equal to the probability that both fail, that is, at least @@@6\%@@@.

Let’s combine combinatorics and the very little probability we know so far. This example also uses the first part of the proposition, the probability of the complement. It shows that sometimes switching to the complement makes life easier.

Example 19.

Birthday problem. What is the probability that in a class of @@@35@@@ students at least two will be born on the same day?

Let’s answer. What’s the sample space? The product of @@@k=35@@@ copies of @@@\{1,\dots,365\}@@@ (all sequences of length @@@35@@@ of numbers in @@@\{1,\dots,365\}@@@). Then by the product rule, @@@|\Omega| = 365^{35}@@@. That’s a darn big number! As we did not assume anything else, the probability is uniform. Let @@@A@@@ be the event that at least two share a birthday. Then @@@P(A)= |A|/|\Omega|@@@. But finding @@@|A|@@@ directly is a bookkeeping nightmare. Let’s make the problem easier by looking at the complement of @@@A@@@, @@@A^c@@@, the event “all birthdays are distinct”. How many such sequences are there? The same as the number of permutations of @@@k=35@@@ elements from @@@n=365@@@ elements. So,

$$ P(A^c)= \frac{365!}{(365-35)!}\cdot \frac{1}{365^{35}}.$$

or

$$ P(A) = 1-\frac{365!}{330!}\cdot \frac{1}{365^{35}}\approx 81.44\%.$$

Meaning, @@@81.44\%@@@ of all possible ways to form a class of @@@35@@@ (= a list of @@@35@@@ numbers from @@@1@@@ to @@@365@@@) contain at least one pair with the same birthday. Want to try other figures? There are plenty of Birthday Problem calculators online.
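Instead of an online calculator, the computation can be reproduced in a few lines of Python (`math.perm` computes the falling factorial @@@n!/(n-k)!@@@):

```python
import math

# Exact birthday-problem probability for a class of k students (Example 19).
def birthday(k, days=365):
    # P(all distinct) = days!/(days-k)! / days**k
    return 1 - math.perm(days, k) / days**k

p = birthday(35)
print(f"{p:.2%}")  # about 81.44%
assert abs(p - 0.8144) < 1e-3
```
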

The birthday problem is a true classic: https://www.youtube.com/watch?v=KtT_cgMzHx8&vl=en

The next example uses the second part of the proposition.

Example 20.

At least @@@60\%@@@ of the students at UCONN graduated from a CT high school. @@@70\%@@@ of the students live on campus. What is the minimal proportion of students who both graduated from a CT high school and live on campus? Here @@@P@@@ is uniform on the UCONN students. Now just use the second part of the proposition with @@@P(A)=0.6@@@ and @@@P(B)=0.7@@@. Then

$$1 \ge P(A \cup B)=P(A)+ P(B) - P(A\cap B)= 1.3 - P(A\cap B).$$

Therefore

$$ P(A \cap B)\ge 30\% $$

Computing the probability of a union of events may be gruesome. There is a general formula, the inclusion/exclusion formula we will discuss in more detail in the next section, a snapshot of which we saw in the second part of Proposition 8, and it uses the probabilities of intersections of the events, something we may not be able to calculate. Here is a pretty crude but very useful bound known as the union bound.

Proposition 9.

(Union bound) Let @@@A_1,A_2,\dots@@@ be events (finite or countable). Then @@@P( \cup A_i) \le \sum P(A_i)@@@.

This result somewhat complements the additivity property in Definition 3 of probability measures: the probability of a union is bounded above by the sum of the probabilities, and equality holds if the events are disjoint.

Proof.

We will prove the statement only for a finite union. We will present the general proof in Example 24.

From the second part of Proposition 8 we know that for any two events @@@A@@@ and @@@B@@@, @@@P(A\cup B)\le P(A)+ P(B)@@@. Let’s continue by induction on @@@n@@@, the number of indices in our union. The base case, corresponding to @@@n=2@@@, was established in the last sentence. Assume then that @@@P(\cup_{i=1}^n A_i) \le \sum_{i=1}^n P(A_i)@@@. Then by applying the result for @@@n=2@@@ we have

$$ P( \cup_{i=1}^{n+1} A_i) = P( (\cup_{i=1}^n A_i)\cup A_{n+1}) \le P(\cup_{i=1}^n A_i) + P(A_{n+1})\le \sum_{i=1}^n P(A_i) + P(A_{n+1}),$$

where the last inequality follows from the induction hypothesis. This completes the proof.

Here is an illustration of the usefulness of the bound.

Example 21.

The probability a student in my class is reading this text is @@@2\%@@@. I have @@@35@@@ students in the class. We will show that the probability that none of them read this text is at least @@@30\%@@@.

Let @@@A@@@ be the event that at least one student read this text. Now @@@A@@@ is the union of @@@A_1,\dots,A_{35}@@@, where @@@A_i@@@ is the event “student @@@i@@@ read this text”. By the union bound

$$ P(A) = P( \cup_{i=1}^{35} A_i) \le \sum_{i=1}^{35} P(A_i) = 35\times 0.02 = 0.7.$$

So @@@P(A)\le 0.7@@@. Note that the event we are interested in is the complement of @@@A@@@, and therefore we have

$$ P(A^c) = 1-P(A) \ge 1- 0.7 =0.30.$$
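For comparison, here is a sketch contrasting the union bound with the exact value one would get under an additional independence assumption (independence is not assumed in the example itself, which is exactly why the union bound is useful):

```python
# Union bound of Example 21 vs. the exact value under independence.
p_each, n = 0.02, 35

union_bound = n * p_each                     # P(A) <= 0.7, with no assumptions
exact_if_independent = 1 - (1 - p_each)**n   # valid only if the A_i are independent

print(round(union_bound, 2), round(exact_if_independent, 4))
assert exact_if_independent <= union_bound   # the bound indeed holds here
```
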

The Inclusion/Exclusion Formula

Proposition 8-2 generalizes the addition rule from numbers of elements in sets to probabilities of events. It states that if @@@A@@@ and @@@B@@@ are events, then the probability of their union @@@A\cup B@@@ can be expressed in terms of the probabilities of the events themselves and their intersection. In this section we generalize this formula. The generalization is known as the inclusion/exclusion formula. It may seem complicated at first sight, but it is nothing but the result of applying the rule for two events repeatedly. The principle is simple; the resulting formula less so. Sometimes understanding is better than memorizing… Let’s state it:

Proposition 10.

Let @@@A_1,\dots,A_n@@@ be events. Then

$$\begin{align*}P(\cup_{i=1}^n A_i)&=P(A_1)+\dots +P(A_n) \\ & - \sum_{i<j} P(A_i \cap A_j)\\ & + \sum_{i<j<k}P(A_i \cap A_j \cap A_k)\\ &\;\;\vdots\\ & +(-1)^{n+1} P(\cap_{i=1}^n A_i) \end{align*}$$
Proof.

Induction on @@@n@@@, with base case @@@n=2@@@ given by Proposition 8.

Here’s a compact way to write the inclusion/exclusion formula:

\begin{equation} \label{eq:incl_exclusion} \boxed { P(\cup_{i=1}^n A_i)= \sum_{k=1}^n (-1)^{k+1}\sum_{I \subset\{1,\dots,n\},|I|=k} P(\cap_{i\in I}A_i) . } \end{equation}
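Here is a sanity check of the boxed formula in Python, comparing both sides under the uniform measure on a small sample space (the events below are hypothetical, chosen only for illustration):

```python
from itertools import combinations
from fractions import Fraction

# Uniform measure on a toy sample space of 12 points, with three events.
omega = set(range(12))
events = [{0, 1, 2, 3, 4}, {3, 4, 5, 6}, {0, 4, 6, 8, 10}]

def P(A):
    return Fraction(len(A), len(omega))

# Left-hand side: probability of the union, computed directly.
direct = P(set.union(*events))

# Right-hand side: alternating sum over all nonempty index sets I.
incl_excl = sum(
    Fraction(-1) ** (k + 1)
    * sum(P(set.intersection(*(events[i] for i in I)))
          for I in combinations(range(len(events)), k))
    for k in range(1, len(events) + 1)
)

assert direct == incl_excl
print(direct)  # 3/4
```
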

How likely is everything to go wrong? Here is a theoretical answer, a true classic, known as the Old Hats Problem, or the Matching Problem, or Absent minded secretary problem, and probably other names as well.

Example 22.

I have @@@n@@@ cards, labeled @@@1,\dots,n@@@. I’m randomly picking the cards, one at a time. What is the probability that the first card taken is not @@@1@@@, the second is not @@@2@@@, …, the @@@n@@@‘th card is not @@@n@@@? Sounds boring, so let’s give it some more spice. A group of @@@n@@@ friends is skinny-dipping in the ocean. Each has one towel on the beach. A shark comes and everybody runs out of the water (all are safe), each grabbing a random towel. What is the probability that none of them will grab their own towel?

What do you think happens when @@@n@@@ is really large?

The probability measure is uniform on @@@\Omega@@@, the set of all permutations of @@@\{1,\dots,n\}@@@ (the lists of all orderings of cards picked), and the event @@@B@@@ we are interested in is an intersection. Inclusion/Exclusion is for unions. By deMorgan’s rules, the complement of @@@B@@@ is a union. Let @@@A_j@@@, @@@j=1,\dots,n@@@, denote the event “@@@j@@@‘th card selected is @@@j@@@”. Then the complement of @@@B@@@ is the union of the @@@A_j@@@’s. What is @@@A_j@@@? All permutations of @@@\{1,\dots,n\}@@@ whose @@@j@@@‘th element is @@@j@@@. The remaining @@@n-1@@@ elements can be permuted in any way in the remaining @@@n-1@@@ places, so there are exactly @@@(n-1)!@@@ such permutations. What about the intersection @@@A_i\cap A_j@@@? All permutations whose @@@i@@@‘th element is @@@i@@@ and whose @@@j@@@‘th element is @@@j@@@; the remaining @@@n-2@@@ elements can be permuted in any way in the remaining @@@n-2@@@ places, so there are @@@(n-2)!@@@ such permutations. We can continue in the same way for intersections of any @@@3,\dots,n@@@ of the events. Since @@@|\Omega|@@@, the number of all permutations of @@@\{1,\dots, n\}@@@, is equal to @@@n!@@@, it follows from the Inclusion/Exclusion formula that

$$ 1- P(B) = P(\cup_{i=1}^n A_i) = \sum_{k=1}^n (-1)^{k+1} \sum_{I \subset\{1,\dots,n\},|I|=k} \frac{(n-k)!}{n!} .$$

Now the number of subsets of size @@@k@@@ of @@@\{1,\dots, n\}@@@ is @@@\binom{n}{k}@@@ (number of ways to choose @@@k@@@ elements from @@@n@@@), and therefore the RHS is

$$ \sum_{k=1}^n (-1)^{k+1} \binom{n}{k} \frac{(n-k)!}{n!} = \sum_{k=1}^n (-1)^{k+1} \frac{1}{k!}.$$

Summing up:

$$ P(B) = 1-\sum_{k=1}^n \frac{(-1)^{k+1}}{k!}= \sum_{k=0}^n \frac{(-1)^k}{k!}.$$

This is the sum of the first @@@n+1@@@ terms in the Taylor expansion of @@@e^{-1}@@@. So although it is a pretty long formula for finite @@@n@@@, it gets very close to the infinite sum, @@@\frac{1}{e}\approx 0.368=36.8\%@@@, very fast. In fact, if @@@n=7@@@, the difference is already less than @@@3\times 10^{-5}@@@, but that is not surprising. What is surprising at first (at least for me) is that the probability does not tend to zero as @@@n@@@ increases: for any choice of @@@n@@@, in about @@@37\%@@@ of the possible cases none of the cards will have the same number as the position at which it was drawn. In about @@@37\%@@@ of the cases everything will go wrong.
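The speed of convergence is easy to check numerically; here is a short sketch of the partial sums:

```python
import math

# Partial sums of the matching-problem probability P(B) = sum_{k=0}^n (-1)^k/k!
# and their rapid convergence to 1/e (Example 22).
def p_no_match(n):
    return sum((-1) ** k / math.factorial(k) for k in range(n + 1))

for n in (3, 5, 7, 10):
    print(n, round(p_no_match(n), 6))

assert abs(p_no_match(7) - 1 / math.e) < 3e-5  # already very close for n = 7
```
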

Probability measures on @@@\R^n@@@ with densities

We have talked mostly about probability measures on finite sample spaces and touched on some probability measures on @@@\R@@@ in Theorem 6. In this section we will expand the discussion a little more, also going to the @@@n@@@-dimensional space @@@\R^n@@@.

Let’s first consider @@@\R@@@, the real line, as our sample space. In order to define a probability measure on it, we first need a sigma algebra. We’re already familiar with two: the trivial sigma-algebra and the power set. Yet the former is too small (it has only two elements) and the latter is too big, for reasons beyond our level.

The most natural objects on the real line are intervals, and therefore it makes perfect sense to consider only those sigma algebras that contain all intervals. The smallest sigma algebra that contains all intervals is the Borel sigma algebra. Of course, it contains much more than just intervals, for example, it contains countable unions of intervals, complements of such objects (which themselves are not necessarily countable unions of intervals), intersections of those, etc…

The higher-dimensional analog of intervals is measurable rectangles, that is, products of intervals. If @@@I_1,I_2,\dots,I_n@@@ are intervals in @@@\R@@@, then the corresponding measurable rectangle in @@@\R^n@@@ is the set @@@ I_1\times \cdots \times I_n = \{ {\bf x} =(x_1,\dots,x_n)| x_1 \in I_1, \dots,x_n \in I_n\}@@@. The Borel sigma algebra on @@@\R^n@@@ is the smallest sigma algebra containing all measurable rectangles.

For compactness, we may sometimes refer to @@@\R@@@ as @@@\R^1@@@.

We will completely avoid the technical discussion but here is the important principle to remember:

  • Any subset of @@@\R^n@@@ you can name, describe in words or write a formula for is a Borel set. Everything you saw in multivariable calculus, and much beyond. This is a huge set of objects, but it does not contain all subsets of @@@\R^n@@@.
  • In light of the above, in what follows, whenever discussing probability measures on @@@\R^n@@@, we will always assume that the underlying sigma algebra is the Borel sigma algebra.

We begin with a result describing some of the most commonly used probability measures on @@@\R^n@@@, and which generalizes Theorem 6.

Theorem 11.

Let @@@f@@@ be a nonnegative Riemann integrable function on @@@\R^n@@@ satisfying @@@ \int_{\R^n} f({\bf x}) d{\bf x} =1@@@. Then there exists a unique probability measure @@@P@@@ on @@@\R^n@@@ satisfying \begin{equation} \label{eq:cts_prob} P({\bf I} ) = \int_{\bf I} f({\bf x}) d {\bf x}, \end{equation} for every measurable rectangle (interval if @@@n=1@@@) @@@{\bf I}@@@. Moreover, it is sufficient to require \eqref{eq:cts_prob} for any of the classes generating the Borel sigma algebra. The function @@@f@@@ is called the density of the probability measure @@@P@@@.

Before we discuss the result, we will need an important definition.

Definition 5.

Let @@@P@@@ be a probability measure on @@@\R^n@@@ with density @@@f@@@. If there exists a Borel set @@@C@@@ and a constant @@@c>0@@@, such that

$$ f(x) = \begin{cases} c & x \in C \\ 0 & \mbox{otherwise}\end{cases}$$

then @@@P@@@ is called the uniform probability measure on @@@C@@@. Furthermore, in this case @@@c@@@ is the reciprocal of the volume of @@@C@@@ in @@@\R^n@@@.

Time for an example.

Example 23.

Consider the uniform probability measure on the unit square @@@C=\{(x,y)\in \R^2: 0\le x,y\le 1\}@@@. Since we require @@@\int_{\R^2} f ({\bf x}) d{\bf x}=1@@@, we have @@@f=1@@@ on @@@C@@@ and @@@f=0@@@ otherwise. In particular, @@@P@@@ of any set contained in @@@C@@@ is simply its area.

Think of @@@C@@@ as the top of my car (I know, it is not square, etc.). Eventually, a bird will pass over my car (right out of car wash), and will drop its dropping on my car (no preferred spot or area). Think of the bird’s dropping as a tiny speck (or identify it with its center of mass). The uniform probability describes the probability the poop will be in any part of my car:

  • If I have a sunroof occupying @@@1/8@@@ of the top of my car, the probability of the sunroof is @@@1/8@@@.
  • If I have a very narrow decorative line passing through the top of the car, then the probability it will be hit is (in this model) is zero, because the area of a line is zero.
  • If I already have spots all over my car with total area of @@@1/8@@@, the probability it will hit the spots is… again @@@1/8@@@.
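A Monte Carlo sketch of the same idea (the rectangle playing the sunroof’s role below is a hypothetical one of area @@@1/8@@@):

```python
import random

# Monte Carlo illustration of the uniform measure on the unit square
# (Example 23): the probability of a region equals its area.
random.seed(0)
N = 200_000
hits = 0
for _ in range(N):
    x, y = random.random(), random.random()     # a uniform point on the square
    if 0.25 <= x <= 0.75 and y <= 0.25:         # a 0.5-by-0.25 "sunroof", area 1/8
        hits += 1

estimate = hits / N
print(round(estimate, 3))  # close to 0.125
assert abs(estimate - 0.125) < 0.01
```
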

We note that not all probability measures on @@@\R^n@@@ have densities. One example is the @@@\delta@@@-measure (with @@@A_0=\{{\bf x}_0\}@@@).

Exercise 12.

Show that if @@@P=\delta_{\bf x_0}@@@ for some @@@{\bf x_0}\in \R^n@@@ then @@@P@@@ does not have a density.

Continuity of probability

This notion is of fundamental importance though we will rarely use it in our discussion. It concerns infinite sequences of events. We need some preparation.

Whenever dealing with an infinite sequence of events, it is convenient to re-express them as a sequence of pairwise disjoint events.

Proposition 12.

Suppose that @@@A_1,A_2,\dots@@@ are events. Let @@@B_1 = A_1@@@ and continue inductively, letting

$$B_{i+1} = A_{i+1} - \cup_{j\le i} A_j,$$

that is @@@B_{i+1}@@@ is all elements in @@@A_{i+1}@@@ which are not in any of the preceding events @@@A_1,\dots,A_i@@@. Then

  1. @@@B_1,B_2,\dots@@@ are pairwise disjoint.
  2. For every @@@i@@@, @@@\cup_{j\le i} B_j = \cup_{j\le i} A_j@@@.

The first application is a proof of the union bound in the countable case. We will present it as an example.

Example 24.

Recall the union bound from Proposition 9. Here we prove the general case. Let @@@(A_i:i\in\N)@@@ be events. We will show that @@@P(\cup_{i=1}^\infty A_i)\le \sum_{i=1}^\infty P(A_i)@@@.

Let @@@B_1,B_2,\dots@@@ be as in Proposition 12. Then @@@\cup_{i=1}^\infty B_i= \cup_{i=1}^\infty A_i@@@, but since the @@@B_i@@@’s are pairwise disjoint, we have

$$ P(\cup_{i=1}^\infty A_i) =\sum_{i=1}^\infty P(B_i)\le \sum_{i=1}^\infty P(A_i),$$

where the last inequality follows from the fact that @@@B_i \subset A_i@@@ and the monotonicity property, Proposition 8-3.
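The union bound is easy to sanity-check on a finite space. Here is a small sketch (the two-dice sample space and the three events are illustrative assumptions, not from the text); exact arithmetic via `Fraction` avoids floating-point noise:

```python
from fractions import Fraction

# Uniform probability on a toy sample space: two fair-die rolls.
omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]
P = lambda event: Fraction(len(event), len(omega))

A1 = {w for w in omega if w[0] == w[1]}        # doubles
A2 = {w for w in omega if w[0] + w[1] == 7}    # sum equals seven
A3 = {w for w in omega if w[0] == 6}           # first roll is a six

events = [A1, A2, A3]
lhs = P(set().union(*events))        # P(A1 ∪ A2 ∪ A3)
rhs = sum(P(A) for A in events)      # P(A1) + P(A2) + P(A3)
print(lhs, "<=", rhs, ":", lhs <= rhs)
```

The inequality is strict here because the events overlap (e.g. @@@(6,6)@@@ lies in both @@@A_1@@@ and @@@A_3@@@), so the sum double-counts it.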

Corollary 13.

Suppose that @@@A_1,A_2,\dots@@@ are events. Then

$$ P(\cup_{i=1}^\infty A_i) \le \sum_{i=1}^\infty P(A_i).$$

Suppose that @@@A_1\subset A_2 \subset \dots@@@. Clearly, @@@\cup_{i=1}^\infty A_i@@@ is contained in any set containing all of the @@@A_i@@@. That is, the union is the smallest set containing all of the @@@A_i@@@’s. Much in the same spirit that the limit of a non-decreasing sequence of numbers is the smallest number larger than or equal to all of them (if such a number exists), we declare @@@\cup_{i=1}^\infty A_i@@@ to be the limit of the non-decreasing sequence of sets @@@A_1,A_2,\dots@@@. The following shows that if we first take the limit of the events and then compute its probability, or first compute the probabilities of the individual events and then take their limit, we get the same number. This is a statement identical to taking limits of sequences under continuous functions, and is therefore dubbed continuity of the probability measure with respect to non-decreasing sequences of events:

Proposition 14.

(continuity from below) Suppose that @@@A_1\subseteq A_2 \subseteq \dots@@@ are events. Then @@@ P(A_n)\underset{n\to\infty} {\nearrow} P(\cup_{n=1}^\infty A_n) @@@.

Proof.

The fact that @@@P(A_n) \le P(A_{n+1})@@@ follows from monotonicity of probability measures, Proposition 8-3. It remains to prove convergence. Let @@@A@@@ be the union, and let @@@B_1,B_2,\dots@@@ be as in Proposition 12. Since the @@@B_n@@@’s are pairwise disjoint and their union is @@@A@@@, countable additivity gives @@@P(A) = \sum_{n=1}^\infty P(B_n)@@@, and the sum on the righthand side converges. Therefore, by definition of an infinite series,

$$ \sum_{n=1}^\infty P(B_n) =\lim_{N\to\infty} \sum_{n=1}^N P(B_n) =\lim_{N\to\infty} P(A_N),$$

where the last equality holds because the @@@B_n@@@’s are disjoint and @@@\cup_{n\le N} B_n = \cup_{n\le N} A_n = A_N@@@, the @@@A_n@@@’s being non-decreasing.
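Continuity from below can be seen concretely in a toy discrete model (an illustrative assumption, not from the text): put @@@P(\{k\})=2^{-k}@@@ on the positive integers and take @@@A_n=\{1,\dots,n\}@@@, so that @@@P(A_n)=1-2^{-n}@@@ increases to @@@P(\cup_n A_n)=1@@@:

```python
# Toy model: P({k}) = 2**-k on the positive integers, with the
# non-decreasing events A_n = {1, ..., n}.
def P_of_An(n):
    # Finite additivity: P(A_n) is the sum of the point masses 2**-k, k = 1..n.
    return sum(2.0 ** -k for k in range(1, n + 1))

# P(A_n) = 1 - 2**-n climbs toward P(union of all A_n) = 1,
# matching continuity from below.
for n in (1, 5, 10, 20):
    print(n, P_of_An(n))
```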

What about decreasing limits? This is a simple consequence of the proposition.

Corollary 15.

(continuity from above) Let @@@B_1\supset B_2 \supset \dots@@@ be events. Then @@@P(B_j) \underset{j\to \infty} {\searrow}P( \cap_{j=1}^\infty B_j)@@@.

Proof.

Let @@@A_j = B_j^c@@@. Then @@@A_1 \subset A_2 \subset \dots@@@, and by continuity from below we have @@@P(A_j) \nearrow P( \cup_{j=1}^\infty A_j)@@@. The LHS is @@@1-P(B_j)@@@, and by De Morgan’s laws @@@\cup_{j=1}^\infty A_j = (\cap_{j=1}^\infty B_j)^c@@@, therefore the RHS is equal to @@@1-P(\cap_{j=1}^\infty B_j)@@@. The result follows.

Example 25.

I’m tossing a fair coin repeatedly, independently. What is the probability that all finite patterns will eventually appear? That is, given a pattern, say @@@HHTHTTT@@@, it will appear somewhere in the sequence, and so will all patterns of all finite lengths.

Since there is a finite number of patterns of every length, there is a countably infinite number of patterns. This means we can (arbitrarily) label them as pattern @@@1@@@, pattern @@@2@@@, etc. If you think a little, it is not hard to find a concrete labeling scheme.

Follow closely, because we have a number of important arguments here. Let @@@A_i@@@ be the event that pattern @@@i@@@ appears. We will show below that @@@P(A_i)=1@@@ for every @@@i@@@. Let @@@B_j = \cap_{i\le j} A_i@@@. Then by De Morgan’s laws, @@@B_j^c = \cup_{i\le j} A_i^c@@@, and therefore by the union bound

$$P(B_j^c) \le \sum_{i\le j} P(A_i^c) = \sum_{i\le j} (1- P(A_i)) = \sum_{i\le j} (1-1)=0.$$

That is, @@@P(B_j)=1@@@. Clearly @@@B_{j+1} \subseteq B_j@@@, and @@@\cap_{j} B_j@@@ is the event “all patterns appear”. Therefore continuity from above gives that the probability of all patterns appearing is @@@1@@@, or, in plain words, “almost” every infinite sequence will contain all patterns. Note the “almost”: there are uncountably many sequences (like all Heads, @@@HHHH....@@@, or any periodic sequence like @@@HTHTHTHT...@@@, and many more) which do not contain all patterns. What we are saying is that in our model, the combined weight of all these sequences is zero, and they are therefore very “rare”.

It remains to show that @@@P(A_i)=1@@@. Suppose that the @@@i@@@-th pattern has length @@@k@@@. For every @@@j@@@, let @@@C_j@@@ be the event that the @@@k@@@ tosses from the @@@((j-1)k+1)@@@-th through the @@@jk@@@-th do not follow the @@@i@@@-th pattern (so they can be any of the remaining @@@2^k-1@@@ patterns). The probability of @@@C_j@@@ is thus @@@2^{-k} (2^k-1)=1-2^{-k}@@@. Therefore by independence, the probability of @@@\cap_{j=1}^n C_j@@@ is @@@(1-2^{-k})^n@@@. Since these intersections obviously decrease as @@@n\to\infty@@@ to the intersection @@@\cap_{j=1}^\infty C_j@@@, it follows from the corollary that @@@P(\cap_{j=1}^\infty C_j)=0@@@. Or, the complement @@@\cup_{j=1}^\infty C_j^c@@@ has probability @@@1@@@. But this complement is the event “the @@@i@@@-th pattern appears beginning from the @@@((j-1)k+1)@@@-th toss, for some positive integer @@@j@@@”. This is clearly contained in the event @@@A_i@@@ (“the @@@i@@@-th pattern appears somewhere”, with no additional requirement on where it begins). Therefore @@@P(A_i)=1@@@.
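Both estimates in this argument — the block bound @@@(1-2^{-k})^n@@@ going to zero, and the conclusion that a fixed pattern almost surely appears — can be checked numerically. A minimal sketch (the pattern is the length-@@@7@@@ example from the text; the sample sizes and seed are illustrative assumptions):

```python
import random

def pattern_missed_bound(k, n_blocks):
    """P(a length-k pattern is missed in n disjoint k-blocks) = (1 - 2**-k)**n."""
    return (1 - 2.0 ** -k) ** n_blocks

def pattern_appears(pattern, n_tosses, rng):
    """Simulate n fair-coin tosses; report whether the pattern shows up anywhere."""
    seq = "".join(rng.choice("HT") for _ in range(n_tosses))
    return pattern in seq

rng = random.Random(0)
pattern = "HHTHTTT"  # k = 7

# The bound on missing the pattern in 10,000 disjoint blocks is astronomically small.
print(pattern_missed_bound(len(pattern), 10_000))

# Empirically, the pattern turns up in almost every run of 5,000 tosses.
hits = sum(pattern_appears(pattern, 5_000, rng) for _ in range(200))
print(hits / 200)  # empirical frequency, close to 1
```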

Exercise 13.

Let’s continue Example 25. Show that the probability that every finite pattern appears infinitely many times (aka infinitely often) is @@@1@@@.

We tasted some manipulation of infinite events, and we saw how in this infinite setting we may naturally encounter events of probability zero which are far from being empty, finite, or even countably infinite… We also saw how to identify complex statements on the entire sequence (“every pattern appears”) as events through the allowed operations in a sigma-algebra (though we did not provide a complete argument; there’s a missing detail that I’d like to ask you to try to find and complete. It’s in the paragraph above).

Problems

Problem 1.

I have a sample space consisting of all eight sequences of length 3 made from the symbols H and T. I am told that there exists a constant @@@c>0@@@ such that the probability of each sequence is equal to @@@c@@@ times the square of the number of H’s in the sequence. What is the probability of at least two H’s?

Problem 2.

I’m rolling a fair die twice.

  1. What is the probability that both rolls give the same number?
  2. Use the first part to find the probability that the number on the first roll is larger than the number on the second roll.
Problem 3.

The Chinese zodiac is a 12-year cycle. What is the probability that in a family of four, at least two family members were born in the same year of the cycle?

Problem 4.

Back to the birthday problem. Suppose we have a group of @@@n@@@ people. Write down the formula for the probability that there will be exactly one pair of people with the same birthday (assume a year has @@@365@@@ days).

Problem 5.

My teacher selected @@@20@@@ words for the quiz out of a vocab of @@@40@@@ words. I decided I’ll only study @@@30@@@ words.

  1. What is the probability I will study exactly @@@16@@@ of the words that will appear in the quiz? Give your answer in formula form; don’t try to get a numerical answer.
  2. What is the probability I will study at least @@@16@@@ of the words that will appear in the quiz? Give your answer in formula form.
Problem 6.

In the math department’s annual Ping-Pong tournament there are @@@20@@@ contestants: @@@10@@@ grad students and @@@10@@@ faculty. In the first round we randomly group them in pairs. What is the probability that in this round each pair will have one graduate student and one faculty member?

Problem 7.

I’m selecting two spaces in a square @@@3\times 3@@@ grid.

  1. What is the probability that they are in the same row?
  2. What is the probability that they are in the same row or in the same column?
Problem 8.

The probability of @@@A\cup B@@@ is @@@0.6@@@, and the probability of @@@A\cup B^c@@@ is @@@0.7@@@. What is the probability of @@@A@@@?

Problem 9.

I have a round table with radius of @@@1@@@ foot. I dropped a tiny pin on the table. For any subset @@@A@@@ of the table, the probability that the pin falls in @@@A@@@ is given by the area of @@@A@@@ (in square feet) divided by @@@\pi@@@.

  1. What is the probability that the pin is at most @@@1/2@@@ a foot from the center of the table?
  2. What is the probability that the ball of radius one inch around the pin is fully contained in the table?
Problem 10.

I’m shuffling a deck with @@@6@@@ cards, numbered @@@1,2,\dots,6@@@, and then revealing the cards one by one. What is the probability that there will be an increasing sequence of at least @@@3@@@ consecutive cards?

For example, if the ordering of the cards is @@@6,5,1,3,4,2@@@, then I have one increasing sequence of length @@@3@@@, starting from the third card. If the ordering is @@@4,5,2,3,1,6@@@, there is no increasing sequence of three cards.

Problem 11.

A lesson about politics?

Three political candidates.

  • One, an extremely solid candidate with a fixed base of support, will get between @@@21\%@@@ and @@@24\%@@@ of the votes.
  • The second, also relatively solid, gets either @@@20\%@@@ or @@@30\%@@@ of the votes, depending on the day (according to whether it’s a rainy day or not), each with probability @@@1/2@@@.
  • The third, a firebrand, will get either below @@@5\%@@@ or between @@@32\%@@@ and @@@38\%@@@ (according to whether they say something controversial or make promises on the economy), with probability @@@1/2@@@ each, independently of the second.
  1. Suppose that exactly two of the candidates compete in the election. For each possible pair, what is the probability for each candidate to win?
  2. Now suppose all three compete. What is the probability for each candidate to win?
Problem 12.

Suppose that our experiment involves repeatedly tossing a coin. The sample space is all infinite sequences formed by Heads and Tails. Suppose that for every @@@n@@@, the set “the @@@n@@@-th toss is Heads” is an event. Show that all of the following are also events:

  1. “Heads will appear infinitely many times”
  2. “The proportion of Heads among the first @@@n@@@ tosses will exceed @@@2/3@@@ only finitely many times”.
  3. “For every @@@n@@@, there will be a run (consecutive sequence) of Heads of length larger than @@@n@@@”