# 18 - Covariance Matrices
- 18.1 Correlation
- 18.2 Variance and standard deviation
- 18.3 Covariance
- 18.4 Correlation coefficient
- 18.5 Covariance matrices
- 18.6 Correlation to covariance
- 18.7 Code challenges
- 18.8 Code solutions

Notes, code snippets, and end-of-chapter exercises from the book _Linear Algebra: Theory, Intuition, Code_ by Mike X Cohen.

Find more information about the book on [github](https://github.com/mikexcohen/LinAlgBook) and [amazon](https://www.amazon.com/Linear-Algebra-Theory-Intuition-Code/dp/9083136604).

In [1]:
import numpy as np

## 18.1 Correlation
Facts about correlation...
- Correlation is always a bivariate metric: it describes the relationship between exactly two variables.
- A correlation matrix $R$ organizes the pairwise correlations among multiple variables.
- Correlation is related to, but not the same as, covariance.
- Correlation is covariance normalized by the product of the two variables' standard deviations, so it is unitless and has range $[-1,+1]$.

Facts about covariance...
- Covariance describes the joint variability of two random variables.
- Covariance matrix $C$ is the expected value $E[(X - \mu)(X - \mu)^T]$, where $\mu$ is the mean of $X$; for mean-centered data, $\mu = 0$.
- The range of covariance is unbounded and depends on the scale (units) of the data.
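To make the scale dependence concrete, here is a minimal sketch using made-up values for `x` and `y`: multiplying `x` by 10 (as a change of units would) multiplies the covariance by 10 but leaves the correlation unchanged. The helper names `pop_cov` and `corr` are hypothetical, introduced only for this illustration.

```python
import numpy as np

# Made-up, deterministic data for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.5, 1.8, 3.9, 3.7, 5.6])

def pop_cov(a, b):
    """Hypothetical helper: population covariance (divide by n), as in 18.3."""
    return np.mean((a - a.mean()) * (b - b.mean()))

def corr(a, b):
    """Hypothetical helper: covariance normalized by both standard deviations."""
    return pop_cov(a, b) / (a.std() * b.std())

# Rescaling x rescales the covariance by the same factor...
print(pop_cov(x, y), pop_cov(10 * x, y))
# ...but the correlation is unchanged and stays within [-1, +1].
print(corr(x, y), corr(10 * x, y))
```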

Both correlation and covariance capture only linear relationships; two variables can be strongly but nonlinearly related and still have a correlation near zero.
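A quick sketch of that limitation, assuming a parabola as the nonlinear relationship: below, `y` is completely determined by `x`, yet because `x` is symmetric around zero, both the covariance and the correlation come out numerically zero.

```python
import numpy as np

# x is symmetric around 0; y depends on x exactly, but not linearly.
x = np.linspace(-1, 1, 101)
y = x ** 2

r = np.corrcoef(x, y)[0, 1]           # Pearson correlation
c = np.cov(x, y, bias=True)[0, 1]     # population covariance (divide by n)
print(f"r = {r:.3f}, cov = {c:.3f}")  # both are zero up to floating-point error
```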

## 18.2 Variance and standard deviation
Variance $\sigma^2$ is the average squared difference between the $n$ observations of a variable $x$ and its mean $\bar{x}$.

$$
\sigma^2 = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2
$$

Notes
- If all possible observations are present, the formula above gives the population variance.
- When only a sample of the observations is available, the denominator is $n-1$ (Bessel's correction) and the quantity is called the sample variance.
- Standard deviation $\sigma$ is the square root of the variance, $\sigma = \sqrt{\sigma^2}$.

In [2]:
n = 100
x = np.random.random(n)

sigma2 = np.sum(np.square(x - np.mean(x))) / n

# Verify sigma2 = np.var.
np.testing.assert_almost_equal(sigma2, np.var(x))
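The cell above checks the population variance. As a companion sketch for the $n-1$ note, this uses NumPy's `ddof` argument, which sets the denominator to `n - ddof`; the seeded `default_rng` generator is an assumption chosen only to make the run reproducible.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only for reproducibility
x = rng.random(100)
n = x.size

# Sample variance: divide the sum of squared deviations by n - 1.
s2 = np.sum((x - x.mean()) ** 2) / (n - 1)

# ddof=1 makes np.var divide by n - 1 instead of n.
assert np.isclose(s2, np.var(x, ddof=1))

# Standard deviation is the square root of the variance under either convention.
assert np.isclose(np.sqrt(s2), np.std(x, ddof=1))
assert np.isclose(np.sqrt(np.var(x)), np.std(x))
```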

## 18.3 Covariance
Covariance is the expected value of the product of two variables' deviations from their means, $E[(X - E[X])(Y - E[Y])]$. For $n$ paired observations it is computed as

$$
\text{COV}(x,y) = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x}) (y_i - \bar{y})
$$

Computing covariance simplifies to a dot product when...
1. The data are centered, i.e., have zero mean.
2. Both variables have the same number of observations, i.e., equal $n$.

$$
\text{COV}(x,y) = \frac{x^T y}{n} \quad,\quad \bar{x} = \bar{y} = 0
$$

In [3]:
# Generate a pair of vectors that differ only by some added noise.
n = 100
x = np.random.random(n)
y = x + np.random.random(n) - 0.5

# Compute covariance from the definition, subtracting the means explicitly.
cov1 = np.sum((x - np.mean(x)) * (y - np.mean(y))) / n

# Compute covariance as dot product using centered data.
x0, y0 = x - np.mean(x), y - np.mean(y)
cov2 = x0.T @ y0 / n

# Compute the covariance matrix C with numpy from the raw data (np.cov centers internally; bias=True divides by n).
cov3 = np.cov(x,y, bias=True)

# Verify the covariances are equal.
np.testing.assert_almost_equal(cov1, cov2)
np.testing.assert_almost_equal(cov2, cov3[0,1])  # Off-diagonal elements are the covariance.
np.testing.assert_almost_equal(cov2, cov3[1,0])  # C is symmetric.

# Verify diagonal elements of covariance matrix are variances.
np.testing.assert_almost_equal(np.var(x), cov3[0,0])
np.testing.assert_almost_equal(np.var(y), cov3[1,1])

print(f"covariance cov(x,y): {cov1:.2f}")

covariance cov(x,y): 0.08
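To see why centering matters for the dot-product shortcut, here is a small sketch with made-up vectors: the raw $x^T y / n$ differs from the covariance by exactly $\bar{x}\bar{y}$, because $\frac{1}{n}\sum (x_i - \bar{x})(y_i - \bar{y}) = \frac{x^T y}{n} - \bar{x}\bar{y}$.

```python
import numpy as np

# Made-up data for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 1.0, 4.0, 3.0])
n = x.size

cov = np.cov(x, y, bias=True)[0, 1]  # population covariance (divide by n)

# Without centering, x^T y / n is not the covariance...
raw = x @ y / n
# ...but subtracting the product of the means recovers it.
print(raw, cov, raw - x.mean() * y.mean())
```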


## 18.4 Correlation coefficient
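The Pearson correlation coefficient $r$ measures the linear relationship between two variables. Computed on mean-centered data, it is the cosine of the angle between the two vectors and has range $[-1,+1]$.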

$$
r = \frac{x^T y}{||x||\:||y||}
$$

Notes
- Applied to uncentered $x$ and $y$, the same formula gives the cosine similarity rather than the correlation coefficient.
- The correlation coefficient is an example of mapping over magnitude (_13.1_).

In [4]:
# Generate a pair of vectors that differ only by some added noise.
n = 100
x = np.random.random(n)
y = x + np.random.random(n) - 0.5

# Compute the correlation coefficient using centered data.
x0, y0 = x - np.mean(x), y - np.mean(y)
r1 = (x0.T @ y0) / (np.linalg.norm(x0) * np.linalg.norm(y0))

# Compute the correlation matrix R with numpy from the raw data (np.corrcoef centers internally).
r2 = np.corrcoef(x, y)

# Verify the correlation coefficients are equal.
np.testing.assert_almost_equal(r1, r2[0,1])  # Off-diagonal elements are the correlation.
np.testing.assert_almost_equal(r1, r2[1,0])  # R is symmetric.

# Verify diagonal elements of correlation matrix are 1.
np.testing.assert_almost_equal(1., r2[0,0])
np.testing.assert_almost_equal(1., r2[1,1])

print(f"correlation coefficient (r): {r1:.2f}")

correlation coefficient (r): 0.76
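The cosine-similarity note can be checked with a short sketch on made-up vectors: the same normalized dot product gives a different number before and after centering, and only the centered version matches `np.corrcoef`. The helper name `cosine_similarity` is hypothetical.

```python
import numpy as np

def cosine_similarity(a, b):
    """Hypothetical helper: normalized dot product, with no centering."""
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Made-up data for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 1.0, 4.0, 3.0])

cos_raw = cosine_similarity(x, y)                      # uncentered: cosine similarity
r = cosine_similarity(x - x.mean(), y - y.mean())      # centered: correlation coefficient
print(f"cosine similarity: {cos_raw:.4f}, r: {r:.4f}")  # cosine similarity: 0.9333, r: 0.6000
```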


## 18.5 Covariance matrices
For a data matrix with $n$ observations in rows and features in columns, the covariance matrix is $C = \frac{1}{n} X^T X$.

For a data matrix with $n$ observations in columns and features in rows, the covariance matrix is $C = \frac{1}{n} X X^T$.

Notes
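- In both cases, the data matrix $X$ must be centered, i.e., each feature has zero mean.
- The factor $\frac{1}{n}$ gives the population covariance; dividing by $n-1$ instead gives the sample covariance.
- The diagonal elements of the covariance matrix are the feature variances, $\text{diag}(C) = (\sigma_1^2, \sigma_2^2, \dots, \sigma_k^2)$ for $k$ features.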
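A minimal sketch of both orientations, assuming random normal data and the $\frac{1}{n}$ convention (which corresponds to `bias=True` in `np.cov`): centering the rows-as-observations matrix and computing $X^T X / n$ gives the same result as transposing, centering each row, and computing $X X^T / n$.

```python
import numpy as np

rng = np.random.default_rng(1)  # seeded only for reproducibility
X = rng.normal(size=(50, 3))    # 50 observations (rows) x 3 features (columns)
n_obs = X.shape[0]

# Observations in rows: center each column, then C = X^T X / n.
Xc = X - X.mean(axis=0)
C_rows = Xc.T @ Xc / n_obs

# Observations in columns: center each row, then C = X X^T / n.
Y = X.T
Yc = Y - Y.mean(axis=1, keepdims=True)
C_cols = Yc @ Yc.T / n_obs

# Both orientations give the same covariance matrix, matching numpy...
assert np.allclose(C_rows, C_cols)
assert np.allclose(C_rows, np.cov(X, rowvar=False, bias=True))
# ...and its diagonal holds the per-feature variances.
assert np.allclose(np.diag(C_rows), X.var(axis=0))
```

Dropping `bias=True` makes `np.cov` divide by $n-1$, i.e. the sample covariance.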

## 18.6 Correlation to covariance
The correlation matrix $R$ is derived from the product $X^T X$ of the centered data matrix with itself and the diagonal matrix $\Sigma$ of feature standard deviations.

$$
R = \frac{1}{n} \Sigma^{-1} X^T X \Sigma^{-1}
$$

Notes
- $\Sigma$ is the diagonal matrix with the feature standard deviations $\sigma_1, \sigma_2, \dots, \sigma_k$ on its diagonal.
- Recall that the inverse of a diagonal matrix is the diagonal matrix of the reciprocals of its diagonal elements (_12.2_), so $\Sigma^{-1}$ scales each feature by $1/\sigma_i$.
- For matrices with observations in columns, the product is $X X^T$.
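The code challenge below uses observations in rows, so this sketch shows the observations-in-columns variant, $R = \frac{1}{n} \Sigma^{-1} X X^T \Sigma^{-1}$, on assumed random data. It relies on `np.corrcoef` treating rows as variables by default (`rowvar=True`).

```python
import numpy as np

rng = np.random.default_rng(2)  # seeded only for reproducibility
X = rng.normal(size=(3, 80))    # 3 features (rows) x 80 observations (columns)
n_obs = X.shape[1]

Xc = X - X.mean(axis=1, keepdims=True)     # center each feature (row)
Sigma_inv = np.diag(1.0 / Xc.std(axis=1))  # inverse of diagonal matrix: reciprocals

R = Sigma_inv @ Xc @ Xc.T @ Sigma_inv / n_obs

# Matches numpy, and every variable correlates perfectly with itself.
assert np.allclose(R, np.corrcoef(X))
assert np.allclose(np.diag(R), 1.0)
```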

### Covariance Matrix From Correlation Matrix
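The covariance matrix $C$ can be recovered from the correlation matrix $R$.

$$
C = E[(X - \mu)(X - \mu)^T] = \Sigma R \Sigma
$$

Notes
- Multiplying $R$ by $\Sigma$ on both sides cancels the two $\Sigma^{-1}$ factors in the expression for $R$, leaving $C = \frac{1}{n} X^T X$ for centered data.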
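A minimal sketch of $C = \Sigma R \Sigma$, assuming random normal features with deliberately different spreads: rescaling the correlation matrix by the population standard deviations reproduces `np.cov` with `bias=True`.

```python
import numpy as np

rng = np.random.default_rng(3)  # seeded only for reproducibility
# 120 observations (rows) of 3 features with different standard deviations.
X = rng.normal(loc=0.0, scale=[1.0, 2.0, 5.0], size=(120, 3))

R = np.corrcoef(X, rowvar=False)
Sigma = np.diag(X.std(axis=0))  # population standard deviations (ddof=0)

C = Sigma @ R @ Sigma

assert np.allclose(C, np.cov(X, rowvar=False, bias=True))
```

Because $R$ does not depend on the normalization, using `X.std(axis=0, ddof=1)` for $\Sigma$ instead yields the sample covariance, `np.cov(X, rowvar=False)`.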

## 18.7 Code challenges

> Implement the equations for the correlation matrix $R$ and covariance matrix $C$. Create a data matrix of random numbers with 4 features and 200 observations. Compare the values you obtain for $R$ and $C$ from your functions with the built-in functions from numpy.

In [5]:
def correlation(X):
    """
    correlation returns the correlation matrix R

    :param X: numpy.ndarray  Matrix X with observations in rows
    :return: numpy.ndarray   Correlation matrix R
    """
    # Center the data to 0-mean along the columns.
    mu = np.mean(X, axis=0)
    Xmu = X - mu

    # Build the inverse of the diagonal matrix of feature standard deviations
    # (the reciprocal of each standard deviation, see 12.2).
    Sigma_inv = np.diag(1./np.std(Xmu, axis=0))

    # Compute the correlation matrix.
    R = Sigma_inv @ Xmu.T @ Xmu @ Sigma_inv

    return R / X.shape[0]


# Generate a random matrix with features along the columns.
m, n = 200, 4
mu, sigma = 3, 2  # Mean and standard deviation (numpy's scale is the std, not the variance).
X = np.random.normal(loc=mu, scale=sigma, size=(m,n))

# Compute the correlation matrix.
R = correlation(X)

# Compute the correlation matrix using numpy.
Rprime = np.corrcoef(X, rowvar=False)

# Verify that R matches the built-in function from numpy.
np.testing.assert_almost_equal(R, Rprime)

print(f'R:\n {R}')

R:
 [[ 1.          0.0478318  -0.01930313  0.11490369]
 [ 0.0478318   1.         -0.08054259 -0.01775496]
 [-0.01930313 -0.08054259  1.         -0.00494701]
 [ 0.11490369 -0.01775496 -0.00494701  1.        ]]


In [6]:
def covariance(X):
    """
    covariance returns the covariance matrix C

    :param X: numpy.ndarray  Matrix X with observations in rows
    :return: numpy.ndarray   Covariance matrix C
    """
    # Center the data to 0-mean along the columns.
    mu = np.mean(X, axis=0)
    Xmu = X - mu

    # Compute the covariance matrix.
    C = Xmu.T @ Xmu

    return C / X.shape[0]


# Generate a random matrix with features along the columns.
m, n = 200, 4
mu, sigma = 3, 2  # Mean and standard deviation (numpy's scale is the std, not the variance).
X = np.random.normal(loc=mu, scale=sigma, size=(m,n))

# Compute the covariance matrix.
C = covariance(X)

# Compute the covariance matrix using numpy.
Cprime = np.cov(X, rowvar=False, bias=True)

# Verify that C matches the built-in function from numpy.
np.testing.assert_almost_equal(C, Cprime)

print(f'C:\n {C}')

C:
 [[4.00441146 0.17315083 0.22394181 0.25960343]
 [0.17315083 3.59708349 0.46768008 0.10677363]
 [0.22394181 0.46768008 3.96628546 0.085771  ]
 [0.25960343 0.10677363 0.085771   3.63330715]]
