Back of the Envelope

Observations on the Theory and Empirics of Mathematical Finance


Black-Scholes PDE – I: 1st (Original) Derivation


The original CAPM-based derivation of the Black-Scholes PDE

Ingredients required:

  • Ito’s Lemma: Given the stochastic process for the stock price dS = \mu S dt + \sigma S dX, Ito’s lemma gives the stochastic process for a derivative F(t, S) as:

\displaystyle dF = \Big( \frac{\partial F}{\partial t} + \mu S \frac{\partial F}{\partial S} + \frac{1}{2}\sigma^2 S^2\frac{\partial^2 F}{\partial S^2} \Big) dt + \sigma S \frac{\partial F}{\partial S} dX(t)

  • CAPM: The expected return on a stock is the sum of the reward for waiting (the risk-free rate r) and the reward for bearing risk over and above the risk-free rate (\beta_S (E[r_M] - r)), i.e.:

E[r_S] = r + \beta_S (E[r_M] - r)

Given CAPM, the instantaneous return r_S dt on the underlying follows:

\begin{aligned} \displaystyle E[r_S dt] = E[\frac{dS}{S}] &= r dt + \beta_S (E[r_M] - r) dt \\ \Rightarrow E[dS] &= rS dt + \beta_S (E[r_M] - r) S dt \end{aligned}

And, similarly, the instantaneous return r_F dt on the derivative follows:

\begin{aligned} \displaystyle E[r_F dt] = E[\frac{dF}{F}] &= r dt + \beta_F (E[r_M] - r) dt \\ \Rightarrow E[dF] &= rF dt + \beta_F (E[r_M] - r) F dt \end{aligned}

Re-writing Ito’s Lemma in terms of dS and dividing by F gives:

\displaystyle \frac{dF}{F} = \frac{1}{F} \Big( \frac{\partial F}{\partial t} + \frac{1}{2}\sigma^2 S^2\frac{\partial^2 F}{\partial S^2} \Big) dt + \frac{1}{F} \frac{\partial F}{\partial S} dS

Dividing and multiplying by S in the last term, and writing \frac{dS}{S} and \frac{dF}{F} respectively as r_S dt and r_F dt implies:

\begin{aligned} r_F dt &=\frac{1}{F}\Big( \frac{\partial F}{\partial t} + \frac{1}{2}\sigma^2 S^2\frac{\partial^2 F}{\partial S^2} \Big) dt + \frac{\partial F}{\partial S} \frac{S}{F} r_S dt\end{aligned}

Cancelling dt on both sides and noting that the only random term on the RHS is r_S, together with the fact that for random variables x, y and z and constants a and b, y = a + bx implies \mbox{Cov}(y, z) = b \, \mbox{Cov}(x, z), allows us to write:

\displaystyle \mbox{Cov}(r_F, r_M) = \frac{\partial F}{\partial S} \frac{S}{F} \mbox{Cov}(r_S, r_M)

Finally, dividing both sides by variance of market returns \sigma_M^2 gives the following relationship between the option beta and the stock beta:

\displaystyle \beta_F = \frac{\partial F}{\partial S} \frac{S}{F} \beta_S
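This chain of steps can be checked numerically. Below is a minimal sketch (with made-up sample data and constants, not market data) verifying the covariance identity used above, y = a + bx \Rightarrow \mbox{Cov}(y, z) = b\,\mbox{Cov}(x, z), which is what drives the scaling in the beta relation:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)              # stand-in for the random return r_S
z = 0.5 * x + rng.normal(size=100_000)    # stand-in for r_M, correlated with x
a, b = 0.02, 1.5                          # arbitrary constants
y = a + b * x                             # linear transform, like r_F above

cov_yz = np.cov(y, z)[0, 1]
cov_xz = np.cov(x, z)[0, 1]
# Cov(y, z) equals b * Cov(x, z) (exactly, up to floating-point rounding)
assert abs(cov_yz - b * cov_xz) < 1e-8
```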

Coming back to Ito’s Lemma, we can take expectation on both sides of the expression for dF to write:

\displaystyle E[dF] = \Big( \frac{\partial F}{\partial t} + \frac{1}{2}\sigma^2 S^2\frac{\partial^2 F}{\partial S^2} \Big) dt + \frac{\partial F}{\partial S} E[dS]

Using CAPM expressions for E[dF] and E[dS] in the above gives:

\displaystyle rF dt + (E[r_M] - r) \beta_F F dt = \Big( \frac{\partial F}{\partial t} + \frac{1}{2}\sigma^2 S^2\frac{\partial^2 F}{\partial S^2} \Big) dt + \frac{\partial F}{\partial S} \big(rS dt + (E[r_M] - r) \beta_S S dt \big)

The last step is substituting the expression for \beta_F in terms of \beta_S and cancelling terms to show that:

\displaystyle \frac{\partial F}{\partial t} + rS\frac{\partial F}{\partial S} + \frac{1}{2}\sigma^2 S^2\frac{\partial^2 F}{\partial S^2}= rF

which is the Black-Scholes PDE.
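As a sanity check, the closed-form Black-Scholes call price should satisfy this PDE. A small sketch (standard formula, standard library only, illustrative parameters) verifying it by central finite differences:

```python
import math

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(t, S, K=100.0, T=1.0, r=0.05, sigma=0.2):
    """Black-Scholes price of a European call at time t and stock price S."""
    tau = T - t
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * Phi(d1) - K * math.exp(-r * tau) * Phi(d2)

t, S, r, sigma = 0.0, 105.0, 0.05, 0.2
h = 1e-3  # finite-difference step

F_t  = (bs_call(t + h, S) - bs_call(t - h, S)) / (2 * h)   # dF/dt
F_S  = (bs_call(t, S + h) - bs_call(t, S - h)) / (2 * h)   # dF/dS
F_SS = (bs_call(t, S + h) - 2 * bs_call(t, S) + bs_call(t, S - h)) / h**2

lhs = F_t + r * S * F_S + 0.5 * sigma**2 * S**2 * F_SS
rhs = r * bs_call(t, S)
assert abs(lhs - rhs) < 1e-2   # equal up to finite-difference error
```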


Written by Vineet

July 12, 2014 at 2:24 pm

[PDS] Probability in Finance – Key Ideas: IV


Having defined random variables using the measure-theoretic language, to complete the basic set-up, we can now define familiar things like ‘expectation/expected value’ and ‘variance’ of a random variable. Expected values are understood as weighted averages, or simply sums or integrals, and in the world of probability, it turns out we need a specific kind of integral, called the Lebesgue integral.

Measurable Functions can be Integrated

As with Riemann integrals, the intuitive way to understand Lebesgue integrals is to think of them as ‘area under a curve’. Where the Lebesgue integral differs from its Riemann counterpart is that it calculates the area by partitioning along the range of the function. Recall that the Riemann integral works by taking limits of the ‘lower sum’ and the ‘upper sum’, where the lower and upper sums are calculated as sums of areas of rectangles formed by considering intervals along the domain (the x-axis). The following picture, borrowed from Shreve, shows the difference:

[Figure: Riemann vs. Lebesgue sums. Source: Steven Shreve, Stochastic Calculus for Finance, Vol II, Chapter 1]

Extending the intuition from the Riemann integral then allows us to write area under the curve taking intervals along the range (y-axis) as:

\mbox{Lower Lebesgue Sum}= \displaystyle\sum_{n = 1}^{N} c_n m(f^{-1}(I_n))

where c_n \in I_n, i.e. c_n = f(x) for some x \in f^{-1}(I_n). Note that the above Lebesgue sum is defined iff one can talk about m(f^{-1}(I_n)) meaningfully – that is, iff one can ‘measure’ (m) the inverse image f^{-1}(I_n).

More formally, then, one can write the area under the curve in the Lebesgue sense iff the inverse image of f is measurable. It is in this sense that Lebesgue integrals are defined.
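To make the construction concrete, here is a rough numerical sketch (illustrative only, not a general implementation) of the lower Lebesgue sum for f(x) = x^2 on [0, 1]: partition the range into intervals I_n and approximate m(f^{-1}(I_n)) on a fine grid.

```python
import numpy as np

f = lambda x: x**2                     # increasing on [0, 1], range [0, 1]
xs = np.linspace(0.0, 1.0, 100_001)    # fine grid standing in for Lebesgue measure
dx = xs[1] - xs[0]
N = 200                                # number of range intervals I_n
edges = np.linspace(0.0, 1.0, N + 1)   # partition of the range (y-axis)

fx = f(xs)
lower_sum = 0.0
for c_lo, c_hi in zip(edges[:-1], edges[1:]):
    # m(f^{-1}(I_n)): measure of x-values whose image lands in [c_lo, c_hi)
    measure = np.count_nonzero((fx >= c_lo) & (fx < c_hi)) * dx
    lower_sum += c_lo * measure        # c_n taken as the left endpoint of I_n

# As N grows this approaches the Riemann answer: integral of x^2 on [0,1] = 1/3
assert abs(lower_sum - 1/3) < 0.01
```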

The need for the Lebesgue integral arises when finding things like ‘expectation’ and ‘variance’. Finding the expectation, or expected value, involves summing over the values a random variable takes, weighted by their probabilities. Now recall that probability is defined for events in the sample space, but random variables are functions defined on the sample space. So finding this sum amounts to integrating a function, i.e. the values taken by the random variable (y-axis) over the probabilities (measure) defined on events in the sample space, i.e. the \sigma-field (x-axis).

So measurability is a natural requirement when talking about random variables. We can find probabilities (measures) of only those values of the random variable which can happen, i.e. values whose preimages belong to the \sigma-field generated by the sample space. Also, note that argued this way it is clear (why?) that there is no obvious way to partition the x-axis à la Riemann (probabilities are attached to events in the sample space, not to points of an axis), and the only way one can integrate random variables is by starting on the y-axis (the values taken by the random variable).

A formal definition of Lebesgue integral is more than what we need at this stage, so with the intuition in place we can now move to defining expected value.

Expected Value as Lebesgue Integrals

Expected Value: Given a random variable X on the probability space (\Omega, \mathbb{F}, \mathbb{P}) the expected value is defined as:

E[X] = \displaystyle\int_{\Omega} X d\mathbb{P}

and it can be shown that it is equivalent to our familiar notion:

E[X] =\displaystyle\int_{\mathbb{R}} x \, d\mathbb{P}_X(x)

and if X is continuous this changes to the familiar formula:

E[X] =\displaystyle\int_{\mathbb{R}} x f(x) dx

where \mathbb{P}_X is the probability distribution and f(x) is the probability density function associated with the random variable X.

At this stage a natural question is how do we compute Lebesgue integrals in practice. Well, as it turns out, for most ‘nice’ and ‘well-behaved’ functions the value of a Lebesgue integral is the same as that obtained by computing the integral the Riemann way (relieved?). So for most practical purposes nothing needs to change as far as our intuitive notion of expected value is concerned.
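A quick illustration of both routes: the Lebesgue-style sum over the values of a discrete random variable (the odd/even die-toss payoff), and the familiar Riemann integral \int x f(x) dx for a continuous one (a normal density; the parameters are chosen purely for illustration).

```python
import math

# Discrete case: die-toss payoff X(odd) = 1, X(even) = -1, fair die.
# Lebesgue-style: sum over *values* of X weighted by the measure of their preimage.
E_discrete = 1 * (3/6) + (-1) * (3/6)
assert E_discrete == 0.0

# Continuous case: E[X] for X ~ N(2, 1) via the Riemann integral of x f(x)
mu, sigma = 2.0, 1.0
def density(x):
    return math.exp(-0.5 * ((x - mu) / sigma)**2) / (sigma * math.sqrt(2 * math.pi))

n, lo, hi = 200_000, mu - 10, mu + 10     # midpoint rule on [mu-10, mu+10]
dx = (hi - lo) / n
E_continuous = sum((lo + (i + 0.5) * dx) * density(lo + (i + 0.5) * dx)
                   for i in range(n)) * dx
assert abs(E_continuous - mu) < 1e-6      # both routes give the mean, 2
```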

Written by Vineet

March 7, 2013 at 1:52 pm

[PDS] Probability in Finance – Key Ideas: III


In elementary probability, one of the most common set-ups used is the coin-toss game with the outcomes \{H\} and \{T\}. While it remains one of the most useful thought experiments for thinking systematically about chance, with abstract outcomes like “\{H\}” and “\{T\}” there is not much one can do.

For example, if one were to toss the coin many times it would be good to get a sense of the “expected outcome” and “variations” in outcome from the coin-toss game. But with an abstract sample space such as \{H, T\}, it is not possible to do so.

From elementary probability, however, we also know how to get around that. The way is to assign the abstract outcomes \{H\} and \{T\} some numbers. Say, whenever \{H\} comes, assign the number +1 to it, and whenever \{T\} comes assign the number -1 to it. This way, because the abstract outcomes have been converted to numbers, one can now do math with it and find things like “expectations” and “variance” of outcomes. And these are useful things to have, as they help to summarize more complex experiments/models.

Mathematically, one can think of assigning numbers to abstract outcomes as “carrying out a function” – that of mapping abstract outcomes to “real” numbers. It turns out there is a name for this kind of “function”. Mathematicians call it a random variable. (Yes, “variable” is perhaps not the best word for “carrying out a function”, but that’s how it is for historical reasons, and we have to live with it.)

In the world of Lebesgue measure that we have been considering, it turns out random variables in probability are just an example of what are called measurable functions.

Random Variables

While assigning numbers to outcomes of experiments is useful, knowing just the random variables associated with an experiment is not, in general, the same as knowing the \sigma-field: a random variable may be too ‘coarse’ to recover the underlying experiment. Consider the following examples.

Example 1 (Coin-toss): The associated sample space and \sigma-field are respectively \Omega_1 = \{H, T\} and \mathbb{F}_1 = \{\Omega_1, \varnothing, \{H\}, \{T\}\}. Let the random variable X_1 assign numbers to outcomes of a single coin-toss game such that X_1(H) = 1 and X_1(T) = -1. Knowing the value of the random variable X_1 in this case is enough to tell us everything about the underlying game.

Example 2 (Die-toss): The associated sample space is \Omega_2 = \{1, 2, 3, 4, 5, 6\} and the associated \sigma-field is \mathbb{F}_2 = \{\Omega_2, \varnothing, \{1\}, \{2\}, \{1, 2\}, \cdots, \{1, 3, 5\}, \{2, 4, 6\}, \cdots \}. Let the random variable X_2 assign numbers to outcomes of the die toss such that if the outcome is odd-numbered, the random variable assigns the value 1 to it, and -1 otherwise. That is, the random variable X_2 is such that X_2(\{1, 3, 5\}) = 1 and X_2(\{2, 4, 6\}) = -1. Clearly, knowing the value of the random variable X_2 in this case is simply not enough to tell us about the underlying experiment, because there is no way to distinguish between, for example, the outcomes \{1\} and \{3\}. The random variable is just too “coarse”.

Not only that, the random variables X_1 and X_2 are indistinguishable from each other: if only the values of the random variables are reported, there is no way to know whether the underlying experiment is a coin-toss game or a die-toss game.

In both examples, however, one thing is clear – the values of the random variables must correspond to some elements in the \sigma-field. This is the idea behind “measurability” – that random variable values must correspond to “something that can happen” (English-speak for members of the \sigma-field).

Now we are ready to introduce the idea of random variables and measurability more formally.

Random Variables as Lebesgue-Measurable Functions

Measurable Functions: Definition

Given a measurable set A, a function f:A \rightarrow \mathbb{R} is said to be measurable if for any interval I \subseteq \mathbb{R}:

f^{-1}(I) = \{x \in A: f(x) \in I\} \in \mathbb{M}

That is, a function is measurable if its inverse image belongs to the collection of Lebesgue-measurable subsets of \mathbb{R}. Put simply, a measurable (“nice”) function is one whose inverse images are measurable (“nice”) sets.

In probability, instead of \mathbb{M}, as mentioned earlier, typically we encounter Borel-measurable sets. So if f^{-1}(I) \in \mathbb{B} for every interval I, we call f Borel-measurable, or simply a Borel function.

Random Variables: Definition

Given a probability space \big(\Omega, \mathbb{F}, \mathbb{P} \big), a random variable is a measurable function X:\Omega \rightarrow \mathbb{R} such that for any interval I \subseteq \mathbb{R}:

X^{-1}(I) = \{\omega \in \Omega: X(\omega) \in I\} \in \mathbb{F}

What this says is that random variables are obtained from sets that belong to the \sigma-field \mathbb{F} – or alternatively, values of a random variable are obtained by assigning numbers to “all possible things that may happen in a game” (English-speak for subsets/elements of the \sigma-field \mathbb{F}). Of course, this is simply the act of assigning numbers to abstract outcomes as in the examples above; the definition formalizes this notion.

When there is no confusion about the underlying \sigma-field \mathbb{F}, Lebesgue-measurable functions are often simply referred to as measurable.

\sigma-field Generated by Random Variables

The fact that random variables can often be “coarse” (like assigning only odd/even numbers to outcomes of the die-toss) gives rise to the notion of \sigma-field associated with a random variable.

The \sigma-field associated with a random variable is the collection of subsets that can be identified by the random variable. So for the random variable X_2 described above, the \sigma-field generated by X_2 would be \sigma(X_2) = \{\Omega, \varnothing, \{1, 3, 5\}, \{2, 4, 6\}\} – that is, the random variable can only identify outcomes up to whether they are odd or even.
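On a finite sample space this is mechanical: collect the preimages of all sets of values of the random variable. A sketch for the die-toss X_2 (the helper names here are mine, not standard):

```python
from itertools import combinations

omega = {1, 2, 3, 4, 5, 6}
X2 = {w: 1 if w % 2 == 1 else -1 for w in omega}   # odd -> 1, even -> -1

values = set(X2.values())                          # range of X2: {1, -1}
def preimage(value_set):
    return frozenset(w for w in omega if X2[w] in value_set)

# sigma(X2): preimages of all subsets of the value set
sigma_X2 = set()
for r in range(len(values) + 1):
    for vs in combinations(values, r):
        sigma_X2.add(preimage(set(vs)))

expected = {frozenset(), frozenset({1, 3, 5}), frozenset({2, 4, 6}),
            frozenset(omega)}
assert sigma_X2 == expected   # {Omega, empty set, odds, evens}
```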

We can make this idea more formal by technically defining the notion of \sigma-field generated by a random variable.

\sigma-field Generated by a Random Variable: Definition

Given a probability space (\Omega, \mathbb{F}, \mathbb{P}) and a random variable X:\Omega \rightarrow \mathbb{R}, the family of preimages of Borel sets

\sigma(X) = X^{-1}(\mathbb{B}) = \{X^{-1}(A): A \in \mathbb{B}\} \subseteq \mathbb{F}

is a \sigma-field, where \mathbb{B} is the Borel field.

In English-speak, the \sigma-field generated by a random variable X is the smallest \sigma-field contained in \mathbb{F} that describes the random variable.

The last piece of formalization we need now is to describe systematically the probabilities associated with different values of the random variable.

Probability Distribution

Probability distribution of a random variable is the probability of elements in the \sigma-field generated by the random variable. (Remember that probabilities are assigned to events and not directly to random variables. So, probability of random variable taking some value or lying in a certain interval must correspond to some events in the \sigma-field.)

Consider the random variable X_2 in our example above. The \sigma-field generated by it is \sigma(X_2) = \{\Omega, \varnothing, \{1, 3, 5\}, \{2, 4, 6\}\}, and the probabilities associated with its elements are \mathbb{P}(\{1, 3, 5\}) = \mathbb{P}(X_2^{-1}(1)), i.e. the probability that the random variable takes the value 1, \mathbb{P}(\{2, 4, 6\}) = \mathbb{P}(X_2^{-1}(-1)), \mathbb{P}(\Omega) = \mathbb{P}(X_2^{-1}(1) \cup X_2^{-1}(-1)) and \mathbb{P}(\varnothing) = \mathbb{P}(X_2^{-1}(1) \cap X_2^{-1}(-1)).

It turns out one can summarize the distribution of these probabilities associated with different values taken by the random variable simply as:

\boxed{\mathbb{P}_X = \mathbb{P}(X^{-1}(B))}

where B is any member of the Borel field \mathbb{B}.

For the random variable X_2, we can then use this concise definition to again write the distribution of probabilities associated with the values taken by the random variable as:

  • If the Borel set B contains both 1 and -1: \mathbb{P}(X_2^{-1}(B)) = \mathbb{P}(\Omega)
  • If the Borel set B contains neither 1 nor -1: \mathbb{P}(X_2^{-1}(B)) = \mathbb{P}(\varnothing)
  • If the Borel set B contains 1 but not -1: \mathbb{P}(X_2^{-1}(B)) = \mathbb{P}(\{1, 3, 5\})
  • If the Borel set B contains -1 but not 1: \mathbb{P}(X_2^{-1}(B)) = \mathbb{P}(\{2, 4, 6\})

which is what we argued intuitively.
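These four cases can be verified mechanically for a fair die, where each outcome has probability 1/6. A small sketch computing \mathbb{P}(X_2^{-1}(B)) for a few illustrative Borel sets B:

```python
from fractions import Fraction

omega = [1, 2, 3, 4, 5, 6]
X2 = {w: 1 if w % 2 == 1 else -1 for w in omega}   # odd -> 1, even -> -1
P = {w: Fraction(1, 6) for w in omega}             # fair die

def law(B):
    """P_X(B) = P(X^{-1}(B)) for a set of real values B."""
    return sum(P[w] for w in omega if X2[w] in B)

assert law({1, -1}) == 1            # B contains both values -> P(Omega)
assert law({7}) == 0                # B contains neither -> P(empty set)
assert law({1}) == Fraction(1, 2)   # B contains only 1 -> P({1, 3, 5})
assert law({-1}) == Fraction(1, 2)  # B contains only -1 -> P({2, 4, 6})
```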

[PS: Definitions above taken from Capinski and Kopp]

Written by Vineet

February 17, 2013 at 1:51 am

[PDS] Feynman-Kac Representation of BSM PDE


In practice, most people price financial derivatives by Monte Carlo simulation. However, when Black, Scholes and Merton (BSM) gave their famous (or notorious) option pricing formula, they arrived at it by solving a Partial Differential Equation (PDE). So, from that point of view alone, it is not immediately obvious that the price obtained via Monte Carlo simulation should be the same as that obtained by solving the PDE.

One can, of course, come up with the same result by approaching the option pricing problem from a probabilistic point of view – what is known as the ‘risk-neutral’ method – according to which the option price is the discounted expected payoff (which ultimately justifies pricing options by Monte Carlo simulation, relying in turn on the Law of Large Numbers).

So while there are these two theoretically sound approaches to option pricing, it would be good to know if there is an underlying mathematical connection between them. It turns out there is, and in fact the result establishing this connection precedes much of the development of option pricing theory.

This result was given by the famous physicist Richard Feynman and probabilist Mark Kac, who showed that solutions to parabolic (read ‘nice’) PDEs are intimately related to conditional expectations. In what follows we lay out that connection for the specific case of the BSM PDE.

Feynman-Kac Representation of BSM PDE

Given the drift rate \mu and the volatility \sigma, the Geometric Brownian Motion (GBM) for the stock price process S(t) is given by:

dS(t) = \mu S dt + \sigma S dX(t)

where dX(t) represents the increment of a standard Brownian Motion X(t). The above SDE for the stock price process can be said to be under the ‘real-world’ probability measure.

Then, given a financial derivative, say, a Call Option, C(t, S(t)), Ito’s lemma gives us the Stochastic Differential Equation (SDE) for C(t, S(t)) as:

\displaystyle dC = \Big( \frac{\partial C}{\partial t}+ \mu S\frac{\partial C}{\partial S} + \frac{1}{2} \sigma^2 S^2 \frac{\partial^2 C}{\partial S^2}\Big)dt + \sigma S \frac{\partial C}{\partial S} dX(t)

Setting up a hedging portfolio with one unit in C(t, S(t)) and -\Delta units of the stock S with \Delta = \frac{\partial C}{\partial S} gives us Black-Scholes-Merton PDE with the drift \mu replaced by the risk-free rate r, i.e.:

\displaystyle \frac{\partial C}{\partial t}+ r S \frac{\partial C}{\partial S} + \frac{1}{2} \sigma^2 S^2 \frac{\partial^2 C}{\partial S^2} = rC

After delta hedging has ‘removed’ the risk of the portfolio of one unit in C and -\Delta units of the stock S, the ‘right’ stock price process to consider is the one under the ‘risk-neutral’ measure:

dS(t) = r S dt + \sigma S dX(t)

(The Girsanov theorem implies that the diffusion term in the GBM for stock prices does not change when we move from the ‘real world’ to the ‘risk-neutral world’.)

The Feynman-Kac representation tells us that PDEs of the BSM kind have an equivalent probabilistic representation. That is, Feynman-Kac assures us that one can solve for the price of the derivative C(t, S(t)) either by discretizing the BSM PDE using Finite Difference methods, or by exploiting the probabilistic interpretation and using Monte Carlo methods.

With this as the backdrop we are now set to write the Feynman-Kac representation for the specific case of BSM PDE.

We start by considering the following functions:

\begin{aligned} Z_1(\tau) &= e^{-r(\tau - t)} \\ Z_2(\tau)&= C(\tau, S(\tau))\end{aligned}

and their differentials:

\begin{aligned} dZ_1(\tau) &= -re^{-r(\tau - t)} d \tau\\ dZ_2(\tau)&= dC(\tau, S(\tau))\end{aligned}

Recall that since in the BSM PDE the risk has been ‘hedged away’, our stock price process S(\tau) is in the risk-neutral world, and that is why the drift term is r (instead of \mu) in the SDE for C.

Next we consider the differential of the product Z_1Z_2, i.e. d(Z_1Z_2):

\begin{aligned} d(Z_1 Z_2) &= Z_2 dZ_1 + Z_1 dZ_2 \\& = -rCe^{-r(\tau - t)} d \tau + e^{-r(\tau - t)} dC\\ &=-rCe^{-r(\tau - t)} d \tau + e^{-r(\tau - t)} \Bigg[ \Big( \frac{\partial C}{\partial \tau}+ rS\frac{\partial C}{\partial S} + \frac{1}{2} \sigma^2 S^2 \frac{\partial^2 C}{\partial S^2}\Big)d\tau + \sigma S \frac{\partial C}{\partial S} dX(\tau) \Bigg]\end{aligned}

where the last step in the above equation follows directly from Ito’s lemma applied to C (with the risk-neutral drift r). Now the BSM PDE tells us that:

\displaystyle \frac{\partial C}{\partial t}+ r S \frac{\partial C}{\partial S} + \frac{1}{2} \sigma^2 S^2 \frac{\partial^2 C}{\partial S^2} = rC

With this we can simplify d(Z_1Z_2) as:

\displaystyle \begin{aligned} d(Z_1Z_2) &= -rCe^{-r(\tau - t)} d\tau + e^{-r(\tau - t)}\Bigg[\underbrace{\Big( \frac{\partial C}{\partial \tau}+rS\frac{\partial C}{\partial S}+\frac{1}{2} \sigma^2 S^2 \frac{\partial^2 C}{\partial S^2}\Big)}_{rC} d\tau + \sigma S \frac{\partial C}{\partial S} dX(\tau) \Bigg] \\&= -rCe^{-r(\tau - t)} d\tau+e^{-r(\tau - t)} \Bigg[rC d\tau + \sigma S \frac{\partial C}{\partial S} dX(\tau) \Bigg] \\&= -rCe^{-r(\tau - t)} d\tau + rCe^{-r(\tau - t)} d\tau + e^{-r(\tau - t)}\sigma S \frac{\partial C}{\partial S} dX(\tau) \\&=e^{-r(\tau - t)}\sigma S \frac{\partial C}{\partial S} dX(\tau)\end{aligned}

That is, the change in Z_1Z_2 follows a driftless SDE. Integrating both sides from t to T gives:

\displaystyle \int_{t}^{T}{d(Z_1Z_2)} =\int_{t}^{T}{e^{-r(\tau - t)}\sigma S \frac{\partial C}{\partial S} dX(\tau)}

Then taking expectations on both sides conditional on the filtration F_t at time t, and using the fact that stochastic integrals are martingales (so the RHS of the equation below vanishes), gives:

\displaystyle \begin{aligned} E\big[\int_{t}^{T}{d(Z_1Z_2)}\big\rvert F_t \big] &= E\big[\int_{t}^{T}{e^{-r(\tau - t)}\sigma S \frac{\partial C}{\partial S} dX(\tau)}\big\rvert F_t \big] \\ E \big[Z_1(T)Z_2(T, S(T)) - Z_1(t)Z_2(t, S(t)) \big\rvert F_t\big] &= 0 \\ \mbox{or } Z_1(t)Z_2(t, S(t)) &=E\big[Z_1(T)Z_2(T, S(T)) \big\rvert F_t\big] \end{aligned}

Substituting back the original expressions for Z_1 and Z_2 at times t and T, i.e. Z_1(t) = e^{-r(t - t)} = 1 and Z_2(t) = C(t, S(t)), gives:

\begin{aligned} e^{-r(t - t)}C(t, S(t)) &= E\big[e^{-r(T - t)}C(T, S(T))\big\rvert F_t\big] \\ \Rightarrow C(t, S(t)) &= E\big[e^{-r(T - t)}C(T, S(T))\big\rvert F_t\big] \end{aligned}

and we are done!

That is, BSM PDE implies that the price of the derivative C(t, S(t)) at time t is equivalent to the discounted value of the expected payoff at expiration (time T). This is the famous Feynman-Kac representation.

And this is why pricing derivatives via Finite Difference methods (by discretizing the PDE) is mathematically equivalent to pricing them using Monte Carlo methods (taking expectations).
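This equivalence is easy to see numerically: price a European call by Monte Carlo under the risk-neutral GBM, and compare with the Black-Scholes closed form (the solution of the PDE). A sketch with illustrative parameters; the two agree up to sampling error:

```python
import math, random

S0, K, T, r, sigma = 100.0, 100.0, 1.0, 0.05, 0.2

# Closed-form Black-Scholes call price (the solution of the BSM PDE)
def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
d1 = (math.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
d2 = d1 - sigma * math.sqrt(T)
bs_price = S0 * Phi(d1) - K * math.exp(-r * T) * Phi(d2)

# Monte Carlo: discounted expected payoff under the risk-neutral measure,
# using the exact solution of the risk-neutral GBM for S(T)
random.seed(42)
n = 200_000
total_payoff = 0.0
for _ in range(n):
    Z = random.gauss(0.0, 1.0)
    ST = S0 * math.exp((r - 0.5 * sigma**2) * T + sigma * math.sqrt(T) * Z)
    total_payoff += max(ST - K, 0.0)
mc_price = math.exp(-r * T) * total_payoff / n

assert abs(mc_price - bs_price) < 0.2   # equal up to Monte Carlo error
```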

Written by Vineet

February 15, 2013 at 9:42 pm

[PDS] Probability in Finance: Key Ideas – II


\sigma-field on \Omega

Just like the collection of Lebesgue-measurable sets \mathbb{M} represented ‘nice’ subsets of \mathbb{R}, for a general sample space \Omega (as in probability ‘experiments’) one can also think of a collection of ‘nice’ subsets of \Omega in a similar vein.

The analogue of \mathbb{M} on \mathbb{R} is \mathbb{F} on \Omega. \mathbb{F} has the same properties as \mathbb{M}, i.e. it is closed under ‘complements’ and ‘countable unions’:

1. \Omega \in \mathbb{F}

2. A \in \mathbb{F} \Rightarrow A^c \in \mathbb{F}

3. A_i \in \mathbb{F} \forall i \ge 1 \displaystyle \Rightarrow \bigcup_{i=1}^{\infty}A_i \in \mathbb{F}

and defines what is called a \sigma-field on \Omega.

A \sigma-field need not comprise all subsets of \Omega – any collection of subsets of \Omega will do, as long as it satisfies all the above properties.

For example in a ‘die toss’ experiment with sample space \Omega = \{1, 2, 3, 4, 5, 6\}, \mathbb{F_A} = \{\Omega, \varnothing, \{1, 2\}, \{3, 4, 5, 6\}\} is an example of a \sigma-field generated by subsets \mathbb{A} = \{\{1, 2\}, \{3, 4, 5, 6\}\}.

Needless to say, one can come up with other \sigma-fields on \Omega which are ‘larger’ than \mathbb{F_A}. For example, consider the collection \{\{1, 2\}, \{3, 4\}, \{5, 6\}\} and the \sigma-field generated by that subset. Clearly it would be a larger \sigma-field than \mathbb{F_A} because it would not only contain the elements contained in \mathbb{F_A} but also some more.

There is a nice result pertaining to \sigma-fields which says that for any given collection \mathbb{A} of subsets of \Omega, there exists a smallest \sigma-field that contains \mathbb{A}. For example, \mathbb{F_A} above is the smallest \sigma-field containing \mathbb{A} =\{\{1, 2\}, \{3, 4, 5, 6\}\}.

The smallest \sigma-field containing \mathbb{A} is then referred to as the \sigma-field generated by \mathbb{A}.
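On a finite \Omega the \sigma-field generated by a collection can be computed by brute force: keep closing the collection under complements and unions until nothing new appears. A sketch for the die-toss examples above (the function name is mine):

```python
def generate_sigma_field(omega, collection):
    """Smallest sigma-field on a finite omega containing the given subsets."""
    omega = frozenset(omega)
    F = {omega, frozenset()} | {frozenset(A) for A in collection}
    changed = True
    while changed:
        changed = False
        for A in list(F):
            c = omega - A                   # close under complements
            if c not in F:
                F.add(c); changed = True
        for A in list(F):
            for B in list(F):
                u = A | B                   # close under (finite) unions
                if u not in F:
                    F.add(u); changed = True
    return F

F_A = generate_sigma_field({1, 2, 3, 4, 5, 6}, [{1, 2}, {3, 4, 5, 6}])
assert F_A == {frozenset(), frozenset({1, 2, 3, 4, 5, 6}),
               frozenset({1, 2}), frozenset({3, 4, 5, 6})}

# The collection {{1,2},{3,4},{5,6}} generates a strictly larger sigma-field
F_B = generate_sigma_field({1, 2, 3, 4, 5, 6}, [{1, 2}, {3, 4}, {5, 6}])
assert F_A < F_B and len(F_B) == 8
```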

Borel Field and Borel Sets

Although \mathbb{M} is a collection of ‘nice’ subsets of \mathbb{R}, it turns out it is often still too large for our purpose (of measuring probabilities). What is often required is not \mathbb{M} itself, but some \sigma-field like \mathbb{M}, perhaps smaller, that contains all intervals (closed, open, semi-open/semi-closed – all kinds).

As pointed out earlier, \mathbb{M} contains all intervals and also all null sets. Applying the result mentioned above to the collection \mathbb{A} of all intervals, we know that there must exist a smallest \sigma-field containing all intervals. Since all intervals are part of \mathbb{M}, this \sigma-field is ‘included’ in \mathbb{M}.

Indeed, such a \sigma-field exists: the smallest \sigma-field containing all intervals is called the Borel field \mathbb{B}. The elements B of \mathbb{B} are called Borel sets.

For most purposes in probability, the Borel \sigma-field \mathbb{B}, it turns out, is good enough. So, we may get by defining measures on this ‘smaller’ \sigma-field instead of \mathbb{M}.

Restricting Lebesgue Measure

We have so far defined measures of the kind m:\mathbb{M} \rightarrow [0, \infty), but we know from our intuitive understanding of probability that a probability measure must take values in [0, 1]. So the last piece of machinery we need is something that allows us to ‘restrict’ the Lebesgue measure m to any Lebesgue-measurable subset B \subseteq \mathbb{R}.

Given the measure space \big(\mathbb{R}, \mathbb{M}, m\big), the following construction ‘restricts’ Lebesgue measure to a Lebesgue-measurable subset B \subseteq \mathbb{R}:

\mathbb{M}_B = \{A \cap B: A \in \mathbb{M} \}

such that \forall C \in \mathbb{M}_B

m_B(C) = m(C)

The triple \big(B, \mathbb{M}_B, m_B\big) is then a (complete) measure space.

Probability Space

The fact that the above ‘restriction’ of Lebesgue measure results in a measure space now allows us to define a probability measure over arbitrary spaces without worrying about whether we can ‘restrict’ that measure to [0, 1].

Probability Space: Definition

A probability space is a triple \big(\Omega, \mathbb{F}, \mathbb{P} \big), where \Omega is an arbitrary set (‘sample space’), \mathbb{F} is a \sigma-field of subsets of \Omega (i.e. elements of \mathbb{F} are all possible ‘events’), and \mathbb{P} is a measure on \mathbb{F}:

\mathbb{P}(\Omega) = 1

called probability measure or simply probability.

(Definitions in this post, as in the previous one, are taken from Capinski and Kopp.)

With an abstract measure space, one can always normalize the measure to lie in [0, 1] depending on the nature of the experiment.

Consider the case when \Omega is a Lebesgue-measurable subset of \mathbb{R}, where the measures/lengths may indeed be larger than 1. The fact that one can ‘restrict’ measures to Lebesgue-measurable subsets of \mathbb{R} affords us a way out. This is done by writing the probability of any subset B \in \mathbb{M}_{\Omega} as:

\mathbb{P}(B) = \displaystyle \frac{1}{m(\Omega)}m(B)

where \mathbb{M}_{\Omega} = \{A \cap \Omega: A \in \mathbb{M} \}

The measure \mathbb{P} as defined above is a restriction of m:\mathbb{M} \rightarrow \left[0, \infty\right) to \mathbb{P}:\mathbb{M}_{\Omega} \rightarrow [0, 1] and we are guaranteed that \big(\Omega, \mathbb{M}_{\Omega}, \mathbb{P} \big) is a measure/probability space.
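A concrete instance of this construction, with \Omega = [0, 2] and the Lebesgue measure read as interval length (a sketch that handles finite unions of disjoint intervals only):

```python
def length(intervals):
    """Lebesgue measure of a finite union of disjoint intervals [(a, b), ...]."""
    return sum(b - a for a, b in intervals)

omega = [(0.0, 2.0)]                 # sample space: a number drawn from [0, 2]
m_omega = length(omega)              # m(Omega) = 2 > 1, so normalization needed

def prob(B):
    """P(B) = m(B) / m(Omega): restriction of Lebesgue measure, scaled to [0, 1]."""
    return length(B) / m_omega

assert prob([(0.0, 2.0)]) == 1.0     # P(Omega) = 1
assert prob([(0.0, 1.0)]) == 0.5
assert prob([(0.0, 0.5), (1.5, 2.0)]) == 0.5
assert 0.0 <= prob([(0.3, 0.7)]) <= 1.0
```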

Note that, as defined above, the probability measure \mathbb{P} need not have any physical meaning attached to it. But the construction above ensures that it can handle all kinds of events that we may encounter when dealing with arbitrary (and often infinite) sample spaces.

In the following we take a look at the idea of Lebesgue-measurable functions which will take us to the important notion of random variables.

Written by Vineet

February 14, 2013 at 2:24 am

Posted in Teaching: PDS


[PDS] Probability in Finance: Key Ideas – I


Need for a Mathematical Theory of Probability

One of the reasons we need a mathematical (read measure-theoretic) foundation of probability is that when dealing with infinite sample spaces (choosing a number at random in [0, 1], for example), there is no immediately obvious way of assigning probabilities to ‘not very likely events’. Let me elaborate.

While it is not difficult to understand that when selecting a number at random from [0, 1] the probability that it lies in [0, 0.5] would be 0.5, it is not immediately obvious what the probability would be that the number selected is one of \{0.001, 0.002, ... , 0.999\} or, say, a rational number (an element of \mathbb{Q}).

Both the sets \{0.001, 0.002, ... , 0.999\} and \mathbb{Q} are countable and, while seemingly big, are yet ‘too small’ compared to all the points in [0, 1]. Also, the set of rational numbers \mathbb{Q} is not even an interval in the way, say, [0, 0.5] is.

The fact that there are subsets of the real line \mathbb{R} which are not intervals, or not ‘nice’ (e.g. the Cantor set), means that the notion of length as the distance between two points is not enough. We need a ‘better scale’, if you will, that allows us to measure the length of sets like \mathbb{Q} and identify the ‘smallness’ of countable sets.

This ‘better scale’ that we are looking for is known as the Lebesgue Measure. (Of course, there are many other advantages of using the idea of measure than to just assign probabilities, but that needn’t concern us for now.)

But before we lay out the properties of this scale, we need some new machinery.

Null Sets

Just like the development of numbers begins with defining the number 0, development of a theory of measure begins with defining sets which are negligible, or alternatively, null sets.

Null sets are those which have a measure 0 according to our new scale. Knowing sets that have a measure 0 then allows us to identify sets that have a ‘finite length’.

Null Sets: Definition

A set A \subseteq \mathbb{R} is null if it can be covered by a sequence of intervals \{I_n: n \ge 1\} of arbitrarily small total length, i.e. A is null if, for every \epsilon > 0, there is a sequence of intervals \{I_n\} such that:

A \subseteq \displaystyle\bigcup_{n=1}^{\infty} I_n

\displaystyle\sum_{n=1}^{\infty} l(I_n) < \epsilon

The definition says that a set is null if the total length of the intervals which cover it can be made ‘very small’. This implies that any countable set A = \{a_1, a_2, \cdots\} is null, as:

A = \{a_1, a_2, \cdots\} \subseteq [a_1, a_1] \cup [a_2, a_2] \cup \cdots = \displaystyle\bigcup_{n=1}^{\infty} [a_n, a_n]

\displaystyle\sum_{n=1}^{\infty} l([a_n, a_n]) = 0 < \epsilon

Since the length of the degenerate closed interval [a_n, a_n] is l([a_n, a_n]) = 0, the length of the set A must also be zero (it is a countable sum of zero-length closed intervals). This suggests that, given the way we have defined null sets, all countable sets, including the set of rational numbers \mathbb{Q}, are null sets – that is, they have measure 0.
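The covering argument also works with non-degenerate intervals: cover the n-th point with an interval of length \epsilon/2^n, so the total length is at most \epsilon. A sketch using exact rational arithmetic, with a finite list of rationals standing in for the (countable) set \mathbb{Q} \cap [0, 1]:

```python
from fractions import Fraction

eps = Fraction(1, 10**6)
# Enumerate some rationals in [0, 1] (a finite stand-in for the countable set Q)
rationals = sorted({Fraction(p, q) for q in range(1, 50) for p in range(q + 1)})

covers, total_length = [], Fraction(0)
for n, a in enumerate(rationals, start=1):
    half = eps / 2**(n + 1)          # n-th interval has length eps / 2^n
    covers.append((a - half, a + half))
    total_length += eps / 2**n

# Every enumerated point is covered, and the total cover length stays below eps
assert all(lo <= a <= hi for a, (lo, hi) in zip(rationals, covers))
assert total_length < eps
```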

Outer Measure

The notion of covering/approximating a set by a sequence of intervals turns out to be an important and very useful step in constructing a theory of measures. Continuing with the notion of covers, first we define what is called the Outer Measure.

Outer Measure: Definition

 The Outer Measure of a set A \subseteq \mathbb{R} is denoted by m^*(A) and is given by:

m^*(A) = \inf \{\displaystyle\sum_{n=1}^{\infty} l(I_n): A \subseteq \displaystyle\bigcup_{n=1}^{\infty} I_n\}

Intuitively this definition says the length (Outer Measure) of a set is the smallest total length of all intervals that cover the set.

The reason for developing a ‘new scale’ (measure) was so that we could measure all kinds of arbitrary subsets of \mathbb{R}. But having done so, the least we would expect is that for intervals the measure gives the same answer as their ‘length’. Indeed, all the expected/intuitive properties of length are preserved by the Outer Measure. Below we list its important properties (taken directly from Capinski and Kopp, including the notation):

Outer Measure: Properties

1. A \subseteq \mathbb{R} is null iff m^*(A) =0

2. A, B \subseteq \mathbb{R} and A \subset B \Rightarrow m^*(A) \le m^*(B)

3. Outer Measure of an interval equals its length, i.e. m^*([a, b]) = b - a

4. Outer Measure is countably sub-additive, i.e. for a sequence of sets \{A_n: n \ge 1\} (not necessarily disjoint)

\displaystyle m^*\Big( \bigcup_{n=1}^{\infty} A_n \Big) \le \sum_{n=1}^{\infty} m^*(A_n)

While this is great, the fourth property of Outer Measure gives us some cause for concern. The problem is that sub-additivity does not guarantee that the measure of a union of disjoint sets adds up to the sum of the measures of the individual sets. And what do we mean by that?

Consider two disjoint intervals A = [a, b] with measure b - a and B=\left(b, c \right] with measure c - b. We should expect the set A \cup B = [a, c] to have length c - a=(b-a)+(c-b). While in this example the Outer Measure respects our intuition of length, this additivity property does not hold for the Outer Measure over all subsets of \mathbb{R}. And we do not like that!

Ok, admittedly, this is only the case for really nasty sets, but for now (and in probability) we do not want to work with such nasty sets. So we are looking for those subsets of \mathbb{R} for which the Outer Measure is additive for disjoint sets, i.e. we only want to work with those subsets of \mathbb{R} for which if A_i \cap A_j = \varnothing \; \forall i \ne j then:

\displaystyle m^*\Big( \bigcup_{n=1}^{\infty} A_n \Big)=\sum_{n=1}^{\infty} m^*(A_n)

The collection of all subsets of \mathbb{R} for which this additivity property holds (along with all the other properties of Outer Measure) is called the collection of Lebesgue-Measurable Sets and is denoted by \mathbb{M}.
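For finite unions of intervals the additivity intuition can be checked directly. The function union_length below is a hypothetical helper, not part of the theory: it computes the length of a finite union of intervals by sorting and merging overlapping pieces, which is exactly what the measure of such a union should be.

```python
def union_length(intervals):
    """Length of a finite union of intervals (lo, hi), computed by sorting
    and merging overlapping/adjacent runs (single endpoints carry no length)."""
    total, cur_lo, cur_hi = 0.0, None, None
    for lo, hi in sorted(intervals):
        if cur_hi is None or lo > cur_hi:       # disjoint from the current run
            if cur_hi is not None:
                total += cur_hi - cur_lo
            cur_lo, cur_hi = lo, hi
        else:                                    # overlaps/touches: extend the run
            cur_hi = max(cur_hi, hi)
    if cur_hi is not None:
        total += cur_hi - cur_lo
    return total

# Disjoint adjacent pieces [a,b] and (b,c] add up exactly: here [0,1] and (1,2.5]
print(union_length([(0.0, 1.0), (1.0, 2.5)]))   # 2.5 = (1-0) + (2.5-1)
# Overlapping pieces only sub-add: m(A ∪ B) <= m(A) + m(B)
print(union_length([(0.0, 2.0), (1.0, 3.0)]))   # 3.0 <= 2 + 2
```

Of course, the ‘really nasty’ non-measurable sets that force the restriction to \mathbb{M} are not finite unions of intervals, so no such finite computation can exhibit them; the sketch only illustrates the additivity we want to preserve.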

Lebesgue Measure

The collection of subsets \mathbb{M} is important enough to warrant a different notion of measure for which the additivity property holds. The Outer Measure restricted to sets in \mathbb{M}, where it is additive, is called the Lebesgue Measure on \mathbb{R}. We use the notation m for the Lebesgue measure and write, for disjoint sets A_n \in \mathbb{M}:

\displaystyle m\Big( \bigcup_{n=1}^{\infty} A_n \Big)=\sum_{n=1}^{\infty} m(A_n)

The technical definition of the Lebesgue Measure is more than what we need at this stage and can be found, for example, in Capinski and Kopp. For us, thinking of the Lebesgue measure as an Outer Measure with the additivity property is enough. Needless to say, all the other properties of the Outer Measure carry through to the Lebesgue Measure.

Lebesgue Measurable Sets

The collection \mathbb{M} is clearly a ‘nice’ collection of subsets of \mathbb{R}. Other than the additivity property of the Lebesgue measure for sets in \mathbb{M}, it has some other notable ‘nice’ properties:

1. \mathbb{R} \in \mathbb{M}

2. A \in \mathbb{M} \Rightarrow A^c \in \mathbb{M}

3. A_i \in \mathbb{M} \forall i \ge 1 \displaystyle \Rightarrow \cup_{i=1}^{\infty}A_i \in \mathbb{M}

The above properties just say that when one does simple operations (taking ‘complements’ and ‘unions’) on sets belonging to \mathbb{M} we remain in \mathbb{M}, i.e. doing simple operations on sets in \mathbb{M} does not take us ‘out of’ \mathbb{M} – we remain in the ‘nice’ world of \mathbb{M}.

(Those familiar with the notion of \sigma-fields would recognize that \mathbb{M} forms a \sigma-field on \mathbb{R}. )

Measure Space

We began with the set \mathbb{R}. We considered ‘nice’ subsets of \mathbb{R} which were Lebesgue measurable (in the sense above) and called the collection of all such subsets the Lebesgue-measurable sets \mathbb{M}. So we have three things now: the underlying set \mathbb{R}, the collection of Lebesgue-measurable subsets \mathbb{M}, and the Lebesgue measure m. For the sake of brevity we often call this ‘triple’:

\big(\mathbb{R}, \mathbb{M}, m\big)

the measure space.

At this stage it is also useful to explicitly identify the Lebesgue measure m as a ‘scale’ that assigns each set in \mathbb{M} a ‘length’ (measure). Given our understanding of a function, this is what the Lebesgue measure is: m is a function that assigns to each set in \mathbb{M} a number between 0 (for null sets) and \infty (for sets of infinite extent like \mathbb{R} itself – note that an uncountable set like [0, 1] can still have finite measure), i.e.:

m:\mathbb{M} \rightarrow \left[0, \infty\right]

Next we extend the idea of measure space to abstract spaces replacing \mathbb{R} by \Omega, \mathbb{M} by \mathbb{F} (\sigma-field on \Omega) and the Lebesgue measure m by a measure \mathbb{P} called the probability measure. The resulting space (\Omega,\mathbb{F}, \mathbb{P}) is then called a probability space.
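A minimal finite example makes the abstract triple concrete. The toy space below (two fair coin tosses, with the power set as the \sigma-field and point masses summing to 1 as \mathbb{P}) is our own illustration, not from the text, but it mirrors (\mathbb{R}, \mathbb{M}, m) exactly: a set of outcomes, a collection of measurable events closed under complements and unions, and an additive measure on them.

```python
from itertools import chain, combinations

# A toy probability space (Omega, F, P): Omega finite, F the power set,
# P built from point masses. Here P(Omega) = 1, unlike m(R) = infinity.
omega = frozenset({"HH", "HT", "TH", "TT"})      # two fair coin tosses
weights = {w: 0.25 for w in omega}               # equally likely outcomes

def P(event):
    """Probability measure: additive over disjoint events by construction."""
    return sum(weights[w] for w in event)

# F = all subsets of omega (the largest sigma-field on a finite set)
F = [frozenset(s) for s in chain.from_iterable(
    combinations(sorted(omega), r) for r in range(len(omega) + 1))]

A = frozenset({"HH", "HT"})   # first toss heads
B = frozenset({"TT"})         # both tails; disjoint from A
print(P(A | B), P(A) + P(B))  # additivity for disjoint events: both 0.75
```

Closure under complements and countable (here: finite) unions holds trivially because F is the power set; on an uncountable \Omega one must work with a smaller \sigma-field, just as \mathbb{M} is smaller than the power set of \mathbb{R}.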

[PS: Much of the discussion in this post summarizes the treatment of measure-theoretic ideas as in Capinski and Kopp]

Written by Vineet

February 13, 2013 at 11:30 pm

Posted in Teaching: PDS


[PDS] Chain Rule in Ito Calculus


Chain Rule in Ito Calculus

Given two stochastic processes w_1(t) and w_2(t) driven by different Brownian Motions X_1(t) and X_2(t) as

\begin{aligned} w_1(t) &= \int{g_1(\tau) dX_1(\tau)}\\ w_2(t) &= \int{g_2(\tau) dX_2(\tau)}\end{aligned}

or alternatively, writing them in ‘short-hand’ (their SDE form) as:

\begin{aligned} dw_1(t) &= g_1(t) dX_1(t) \\ dw_2(t) &= g_2(t) dX_2(t)\end{aligned}

Then Ito’s lemma in 2-D tells us that function f(w_1(t), w_2(t)) = w_1(t)w_2(t) will satisfy:

\displaystyle \begin{aligned} d f(w_1(t),w_2(t)) &= \frac{\partial f}{\partial w_1}dw_1(t)+\frac{\partial f}{\partial w_2}dw_2(t)+\frac{1}{2}\frac{\partial^2 f}{\partial w_1^2} dw_1^2(t) \\& \hspace{6pc} +\frac{1}{2}\frac{\partial^2 f}{\partial w_2^2}dw_2^2(t)+ \frac{\partial^2 f}{\partial w_1\partial w_2}dw_1(t)dw_2(t) \end{aligned}

Given f(w_1(t), w_2(t)) = w_1(t)w_2(t), the following will hold:

\begin{aligned} \frac{\partial^2 f}{\partial w_1^2} &=0 \\ \frac{\partial^2 f}{\partial w_2^2}&= 0\\ \frac{\partial^2 f}{\partial w_1\partial w_2}&= 1\end{aligned}

With this we can now simplify the expression for df(w_1(t), w_2(t)) as:

\boxed{d f(w_1(t),w_2(t)) = w_2(t)dw_1(t)+ w_1(t)dw_2(t)+ dw_1(t)dw_2(t)}

This describes the Chain Rule in Ito calculus.

We can, of course, further simplify the above and write:

\begin{aligned} df(w_1(t),w_2(t)) &= w_2(t)dw_1(t)+ w_1(t)dw_2(t)+ dw_1(t)dw_2(t) \\&= w_2(t)g_1(t)dX_1(t) + w_1(t)g_2(t)dX_2(t) + g_1(t)g_2(t) dX_1(t)dX_2(t) \\&= w_2(t)g_1(t)dX_1(t) + w_1(t)g_2(t)dX_2(t) + \rho g_1(t)g_2(t)dt\end{aligned}

where \rho is the correlation between the two Brownian Motions X_1(t) and X_2(t).
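The key step above, dX_1(t)dX_2(t) = \rho\, dt, can be checked by simulation. The sketch below is our own illustration (parameter values are arbitrary): correlated Brownian increments are built from independent standard normals via dX_2 = \rho z_1 + \sqrt{1-\rho^2}\, z_2 scaled by \sqrt{dt}, and the sample mean of dX_1\, dX_2 / dt should be close to \rho.

```python
import math
import random

# Monte Carlo check of E[dX1 dX2] = rho * dt for correlated Brownian increments
rng = random.Random(42)
rho, dt, n = 0.6, 0.01, 200_000

acc = 0.0
for _ in range(n):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    dx1 = math.sqrt(dt) * z1                                    # dX1 increment
    dx2 = math.sqrt(dt) * (rho * z1 + math.sqrt(1 - rho * rho) * z2)  # dX2
    acc += dx1 * dx2

estimate = acc / (n * dt)   # sample mean of dX1*dX2, scaled by 1/dt
print(estimate)             # should be close to rho = 0.6
```

The standard error of the estimate is of order \sqrt{(1+\rho^2)/n}, so with n = 200{,}000 the estimate sits within a few thousandths of \rho.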

Written by Vineet

February 11, 2013 at 11:48 pm

Posted in Teaching: PDS
