## Archive for the ‘**Math: Useful**’ Category

## [Stats] Normalizing to a benchmark by matching moments

One often needs to compare data across samples, say, when one or more samples constitute a benchmark. For example, consider comparing interview scores across panels when one of the panels is considered to be the benchmark.

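The objective is to transform the data of a sample so as to match its mean ($\mu_x$) and standard deviation ($\sigma_x$) to those of the benchmark sample. That is, we begin with:

$$x \sim (\mu_x, \sigma_x)$$

and we need to transform the data $x$ into $x'$ such that $E[x'] = \mu_b$ and $SD[x'] = \sigma_b$, where $\mu_b$ and $\sigma_b$ respectively represent the mean and standard deviation of the benchmark sample.

The moment matching approach then requires finding parameters $a$ and $b$ such that:

$$\begin{aligned} E[a + b x] &= a + b\,\mu_x = \mu_b \\ \mathrm{Var}[a + b x] &= b^2 \sigma_x^2 = \sigma_b^2 \end{aligned}$$

Simplifying the second equation (and taking the positive root) implies we require that:

$$b = \frac{\sigma_b}{\sigma_x}$$

And accordingly,

$$a = \mu_b - \frac{\sigma_b}{\sigma_x}\,\mu_x$$

That is, transforming $x$ as:

$$x' = \mu_b + \frac{\sigma_b}{\sigma_x}\,(x - \mu_x)$$

ensures that $x'$ has the same mean and standard deviation as the benchmark data. Needless to say, this works best when the underlying data can safely be assumed to come from the Normal distribution or any of its kin (why?).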
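As a minimal sketch of this transformation in Python (standard library only): the function name `match_moments` and the interview scores are hypothetical, made up purely for illustration, and the population standard deviation is used for both samples.

```python
from statistics import fmean, pstdev

def match_moments(x, benchmark):
    """Linearly rescale x so its mean and standard deviation match the benchmark's."""
    mu_x, sigma_x = fmean(x), pstdev(x)
    mu_b, sigma_b = fmean(benchmark), pstdev(benchmark)
    b = sigma_b / sigma_x      # slope, from matching the second moment
    a = mu_b - b * mu_x        # intercept, from matching the first moment
    return [a + b * xi for xi in x]

# Hypothetical interview scores: a benchmark panel and a more lenient panel
benchmark_panel = [55.0, 60.0, 65.0, 70.0, 75.0]
lenient_panel = [80.0, 82.0, 85.0, 88.0, 90.0]

rescaled = match_moments(lenient_panel, benchmark_panel)
print(fmean(rescaled), pstdev(rescaled))                # moments after rescaling
print(fmean(benchmark_panel), pstdev(benchmark_panel))  # benchmark moments
```

Because the map is linear with a positive slope, the ranking of candidates within the rescaled panel is unchanged; only location and spread move.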

Looking at the transformation, upon a moment’s reflection it almost appears *obvious*, doesn’t it? To the point that this whole post seems moot. So, why bother? Well, while this is a simple enough example, the method of moments is important enough in economics and finance for its generalized version to have been granted Nobel status a couple of years ago.

## [Stats] Results involving correlation

I have noticed that many of our students are not quite aware of mathematical proofs behind some very basic identities involving correlation (standard books on probability – including *that* *one* – of course, have the necessary proofs, but old-fashioned textbooks aren’t exactly all that popular, are they?).

I have in mind these three:

- The correlation coefficient always lies between minus and plus 1
- Perfect correlation between two variables implies a linear relationship between them
- The relationship between $R^2$ in a simple linear regression and the correlation coefficient

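**1. Proof that $-1 \le \rho \le 1$**

(This pretty much follows Feller, Vol. 1, Chapter 9)

Normalize the random variables $X$ and $Y$ to have mean 0 and standard deviation 1, as:

$$X^* = \frac{X - \mu_X}{\sigma_X}, \qquad Y^* = \frac{Y - \mu_Y}{\sigma_Y}$$

Covariance, $\mathrm{Cov}(X^*, Y^*)$, between the standardized random variables $X^*$ and $Y^*$ is:

$$\begin{aligned} \mathrm{Cov}(X^*, Y^*) &= E\big[(X^* - E[X^*])(Y^* - E[Y^*])\big] \\ &= E[X^* Y^*] - E[X^*]\,E[Y^*] \\ &= \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y} = \rho \end{aligned}$$

i.e. the covariance between the standardized random variables represents the correlation between the original random variables (the second term in the second equation above is zero because $E[X^*] = 0$ and $E[Y^*] = 0$).

The trick now is to calculate the variance of the sum (and of the difference) of the standardized random variables:

$$\mathrm{Var}(X^* \pm Y^*) = \mathrm{Var}(X^*) + \mathrm{Var}(Y^*) \pm 2\,\mathrm{Cov}(X^*, Y^*) = 2\,(1 \pm \rho)$$

Since variance is always non-negative, we must have that $1 \pm \rho \ge 0$, or $-1 \le \rho \le 1$.

**2. A special case: $\rho = \pm 1$**

When $\rho = \pm 1$, the last equation reduces to $\mathrm{Var}(X^* \mp Y^*) = 0$, meaning $X^* \mp Y^*$ = constant. Converting back to the original variables, this implies:

$$\frac{X - \mu_X}{\sigma_X} \mp \frac{Y - \mu_Y}{\sigma_Y} = \text{constant}$$

i.e. when the two random variables are perfectly positively or negatively correlated, they can be written as linear functions of each other. Alternatively, correlation captures a linear relationship between two random variables.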
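A quick numerical check of both results, as a sketch using only the Python standard library. The helper names `standardize` and `correlation`, the seed, and the simulated data are all assumptions made for illustration. For standardized samples the variance of the sum and of the difference should equal $2(1+\rho)$ and $2(1-\rho)$, and an exact linear relationship should give a correlation of $\pm 1$.

```python
import random
from statistics import fmean, pstdev, pvariance

def standardize(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    m, s = fmean(values), pstdev(values)
    return [(v - m) / s for v in values]

def correlation(x, y):
    """Sample correlation: the mean product of the standardized values."""
    return fmean([a * b for a, b in zip(standardize(x), standardize(y))])

random.seed(0)  # arbitrary seed, only for reproducibility
x = [random.gauss(0.0, 1.0) for _ in range(1000)]
y = [0.6 * xi + random.gauss(0.0, 1.0) for xi in x]  # noisy linear dependence

xs, ys = standardize(x), standardize(y)
rho = correlation(x, y)
var_sum = pvariance([a + b for a, b in zip(xs, ys)])
var_diff = pvariance([a - b for a, b in zip(xs, ys)])
print(var_sum, 2 * (1 + rho))    # the two numbers agree
print(var_diff, 2 * (1 - rho))   # the two numbers agree

# Special case: an exact linear relationship with negative slope
y_line = [3.0 - 2.0 * xi for xi in x]
print(correlation(x, y_line))    # -1 up to floating-point error
```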

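**3. $R^2 = \rho^2$**

Regression coefficient in a standard linear regression $y_i = \alpha + \beta x_i + \epsilon_i$ is given by:

$$\hat{\beta} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$$

with the $R^2$ given as:

$$R^2 = \frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2}$$

The correlation coefficient between $x$ and $y$ is given by:

$$\rho = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}$$

Noting that $\hat{y}_i - \bar{y} = \hat{\beta}\,(x_i - \bar{x})$, it is a matter of simple algebra to verify that:

$$\sum_i (\hat{y}_i - \bar{y})^2 = \hat{\beta}^2 \sum_i (x_i - \bar{x})^2$$

Putting the two together, and using the formula for the regression coefficient, $\hat{\beta}$, we have that:

$$\begin{aligned} \sum_i (\hat{y}_i - \bar{y})^2 &= \frac{\left[\sum_i (x_i - \bar{x})(y_i - \bar{y})\right]^2}{\left[\sum_i (x_i - \bar{x})^2\right]^2} \, \sum_i (x_i - \bar{x})^2 \\ &= \frac{\left[\sum_i (x_i - \bar{x})(y_i - \bar{y})\right]^2}{\sum_i (x_i - \bar{x})^2} \end{aligned}$$

(where in the second step above the term $\sum_i (x_i - \bar{x})^2$ cancels off in numerator and denominator, simplifying the expression)

Finally, dividing both sides by $\sum_i (y_i - \bar{y})^2$ gives the definition of $R^2$ on the LHS and $\rho^2$ on the RHS, i.e.:

$$R^2 = \rho^2$$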

(This result also has a more concise and general proof, but uses linear algebra. See, for example, here.)
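The identity is easy to confirm numerically. Below is a small sketch (standard library only) that fits a simple regression of $y$ on $x$ with an intercept, computes $R^2$ as explained variation over total variation, and compares it with the squared correlation coefficient. The function name `ols_r2_and_rho` and the data points are hypothetical.

```python
from statistics import fmean

def ols_r2_and_rho(x, y):
    """Return (R^2 of the simple regression of y on x, correlation of x and y)."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    beta = sxy / sxx                     # slope estimate
    alpha = y_bar - beta * x_bar         # intercept estimate
    fitted = [alpha + beta * xi for xi in x]
    r2 = sum((fi - y_bar) ** 2 for fi in fitted) / syy  # explained / total
    rho = sxy / (sxx * syy) ** 0.5
    return r2, rho

# Made-up data points, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.3, 6.6]

r2, rho = ols_r2_and_rho(x, y)
print(r2, rho ** 2)  # the two numbers agree up to floating-point error
```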

## Simulating Correlated Stochastic Differential Equations (or How to Simulate Heston Stochastic Volatility Model)

I notice that students new to computational finance often make mistakes in simulating correlated Brownian motion paths. Here is a ready reckoner.

Let’s take the example of generating paths for asset prices using the Heston stochastic volatility model:

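$$\begin{aligned} dS_t &= \mu S_t\,dt + \sqrt{v_t}\,S_t\,dW_{1,t} \\ dv_t &= \kappa(\theta - v_t)\,dt + \sigma\sqrt{v_t}\,dW_{2,t} \end{aligned}$$

where $v_t$ is the instantaneous variance of asset returns, and the increments in Brownian motions $dW_{1,t}$ and $dW_{2,t}$ are correlated with correlation coefficient $\rho$, i.e. $\langle dW_{1,t}, dW_{2,t} \rangle = \rho\,dt$. (In the usual Heston notation, $\mu$ is the drift, $\kappa$ the speed of mean reversion, $\theta$ the long-run variance and $\sigma$ the volatility of variance.)

The simplest way to generate paths $S_t$ and $v_t$ is to use the Euler discretization (there are better methods available of course, for Heston in particular) as:

$$\begin{aligned} S_{t+\Delta t} &= S_t + \mu S_t\,\Delta t + \sqrt{v_t}\,S_t\,\sqrt{\Delta t}\,Z_1 \\ v_{t+\Delta t} &= v_t + \kappa(\theta - v_t)\,\Delta t + \sigma\sqrt{v_t}\,\sqrt{\Delta t}\,Z_2 \end{aligned}$$

where $Z_1$ and $Z_2$ are standard Gaussian random variables with correlation $\rho$.

To generate correlated standard Gaussian random variables, i.e. $Z_1$ and $Z_2$, the most popular method is to use what is called the Cholesky decomposition. Given two uncorrelated standard Gaussian random variables $\epsilon_1$ and $\epsilon_2$ (easily done both in Excel and in R), Cholesky decomposition can be used to generate $Z_1$ and $Z_2$ as:

$$\begin{aligned} Z_1 &= \epsilon_1 \\ Z_2 &= \rho\,\epsilon_1 + \sqrt{1 - \rho^2}\,\epsilon_2 \end{aligned}$$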
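Here is a minimal sketch of the whole recipe in Python (standard library only): two independent standard Gaussians are turned into a correlated pair via the two-variable Cholesky step, and the pair then drives one Euler path of the Heston model. The function names and all parameter values are hypothetical. One deliberate deviation from a plain Euler scheme is flagged in the comments: the variance is floored at zero before taking square roots (often called full truncation), since a plain Euler step can push the variance negative.

```python
import math
import random

def correlated_normals(rho, rng):
    """Return (z1, z2): standard Gaussians with correlation rho (2x2 Cholesky)."""
    eps1 = rng.gauss(0.0, 1.0)
    eps2 = rng.gauss(0.0, 1.0)
    return eps1, rho * eps1 + math.sqrt(1.0 - rho * rho) * eps2

def simulate_heston_path(s0, v0, mu, kappa, theta, sigma, rho, dt, n_steps, rng):
    """Simulate one Euler path; returns lists of prices and variances."""
    s, v = s0, v0
    prices, variances = [s], [v]
    for _ in range(n_steps):
        z1, z2 = correlated_normals(rho, rng)
        v_pos = max(v, 0.0)  # assumption: full truncation, not part of plain Euler
        s = s + mu * s * dt + math.sqrt(v_pos * dt) * s * z1
        v = v + kappa * (theta - v_pos) * dt + sigma * math.sqrt(v_pos * dt) * z2
        prices.append(s)
        variances.append(v)
    return prices, variances

# Hypothetical parameters: one year of daily steps
rng = random.Random(42)
prices, variances = simulate_heston_path(
    s0=100.0, v0=0.04, mu=0.05, kappa=2.0, theta=0.04,
    sigma=0.3, rho=-0.7, dt=1.0 / 252, n_steps=252, rng=rng,
)
print(prices[-1], variances[-1])
```

A common mistake is to draw $Z_1$ and $Z_2$ independently and only mention $\rho$ in the model description; the correlation has to enter through the construction of $Z_2$, as in `correlated_normals`.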

If, God forbid, your job requires simulating three correlated stochastic differential equations, say when you are using a Double Heston or a Double Lognormal model, then you would need to simulate three jointly correlated Gaussian random variables.

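In that case if the correlation structure is $\rho_{12}$, $\rho_{13}$ and $\rho_{23}$, then given three uncorrelated Gaussian random variables $\epsilon_1$, $\epsilon_2$ and $\epsilon_3$, one could use Cholesky decomposition to generate $Z_1$, $Z_2$ and $Z_3$ as:

$$\begin{aligned} Z_1 &= \epsilon_1 \\ Z_2 &= \rho_{12}\,\epsilon_1 + \sqrt{1 - \rho_{12}^2}\,\epsilon_2 \\ Z_3 &= \rho_{13}\,\epsilon_1 + \rho^*_{23}\,\epsilon_2 + \rho^*_{33}\,\epsilon_3 \end{aligned}$$

where

$$\rho^*_{23} = \frac{\rho_{23} - \rho_{12}\,\rho_{13}}{\sqrt{1 - \rho_{12}^2}}$$

and $\rho^*_{33} = \sqrt{1 - \rho_{13}^2 - (\rho^*_{23})^2}$.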
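The three-variable case can be sketched the same way in Python (standard library only). The function names and the correlation values below are hypothetical; the weights are the lower-triangular Cholesky factor of the 3x3 correlation matrix, written out by hand. The sketch assumes the three correlations form a valid (positive definite) correlation matrix, otherwise the square roots fail.

```python
import math
import random

def cholesky_weights(rho12, rho13, rho23):
    """Lower-triangular Cholesky factor of the 3x3 correlation matrix."""
    a = math.sqrt(1.0 - rho12 ** 2)
    b = (rho23 - rho12 * rho13) / a
    c = math.sqrt(1.0 - rho13 ** 2 - b ** 2)  # fails if the matrix is not valid
    return [
        [1.0, 0.0, 0.0],
        [rho12, a, 0.0],
        [rho13, b, c],
    ]

def correlate(lower, eps):
    """Map independent draws eps = [e1, e2, e3] to correlated [z1, z2, z3]."""
    return [sum(l * e for l, e in zip(row, eps)) for row in lower]

# Hypothetical correlation structure
rho12, rho13, rho23 = -0.5, 0.3, 0.2
lower = cholesky_weights(rho12, rho13, rho23)

# Check: multiplying the factor by its transpose recovers the correlation matrix
implied = [[sum(lower[i][k] * lower[j][k] for k in range(3)) for j in range(3)]
           for i in range(3)]
print(implied)

rng = random.Random(7)  # arbitrary seed
z1, z2, z3 = correlate(lower, [rng.gauss(0.0, 1.0) for _ in range(3)])
```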