Covariance Matrix

Climate of the Earth system

Prof. Dr. Markus Meier
Leibniz Institute for Baltic Sea Research Warnemünde (IOW)
E-Mail: markus.meier@io-warnemuende.de

  1. Covariance matrix

  2. Correlation

Covariance Matrix

  • scalar covariance between two continuous random variables X and Y:

\[Cov(X,Y)=\sigma^2_{XY} = \int\int_{\mathbb{R}^2} (x-\mu_x)(y-\mu_y)f_{X,Y}(x,y)~dxdy\]
  • with \(\mu_x\) and \(\mu_y\) the expectations of X and Y, and \(f_{X,Y}(x,y)\) their joint pdf

  • the covariance is large in magnitude when the product \((x-\mu_x)(y-\mu_y)\) is large, with a consistent sign, in the regions where the joint pdf is also large

../_images/L11_1_covariance.PNG
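
For a finite sample, the double integral above is replaced by an average over the observed pairs. The following is a minimal sketch with NumPy, assuming synthetic data in which y depends linearly on x plus noise; the variable names and the numbers are illustrative and not part of the lecture.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y = 0.5 * x + noise.
rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=2.0, size=10_000)
y = 0.5 * x + rng.normal(loc=0.0, scale=1.0, size=x.size)

# Sample analogue of the integral: average the product of the
# deviations of x and y from their respective means.
cov_manual = np.mean((x - x.mean()) * (y - y.mean()))

# np.cov returns the 2x2 matrix [[Var(x), Cov(x, y)], [Cov(y, x), Var(y)]].
# bias=True normalises by N, so it matches the plain average above.
cov_numpy = np.cov(x, y, bias=True)[0, 1]

print(cov_manual, cov_numpy)
```

With this construction the theoretical value is 0.5 · Var(x) = 2, so both estimates should lie close to 2 for a large sample.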
  • covariance matrix = covariances between all possible pairs of the components \(X_i\) and \(Y_j\) of the random vectors \(\vec{X}\) and \(\vec{Y}\); with \(i=1,\dots,m\) and \(j=1,\dots,n\), the covariance matrix is an \((m \times n)\) matrix

\[\Sigma^2_{\vec{X},\vec{Y}} = \int_{\mathbb{R}^m}\int_{\mathbb{R}^n} (\vec{x}-\vec{\mu_x})(\vec{y}-\vec{\mu_y})^T f_{\vec{X},\vec{Y}}(\vec{x},\vec{y})~d\vec{x}d\vec{y}\]
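
The matrix form can be estimated from samples by averaging the outer products of the centred vectors. The sketch below assumes a hypothetical data layout (rows are samples, columns are vector components) and a made-up mixing matrix `A` that links the two vectors; neither comes from the lecture.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, m, n = 5_000, 3, 2

# Hypothetical data: X has m components with unit variance,
# Y has n components built from X through A plus small noise.
X = rng.normal(size=(n_samples, m))
A = np.array([[1.0, 0.0],
              [0.0, -2.0],
              [0.0, 0.0]])                      # shape (m, n)
Y = X @ A + 0.1 * rng.normal(size=(n_samples, n))

# Remove the mean of every component, then average the outer
# products (x - mu_x)(y - mu_y)^T over all samples.
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)
cross_cov = Xc.T @ Yc / n_samples               # shape (m, n)

print(cross_cov.shape)
print(np.round(cross_cov, 2))
```

Because the components of X are independent with unit variance in this setup, the estimate should reproduce `A` up to sampling noise: a positive entry, a negative entry, and near-zero values elsewhere.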
  • characteristics of the covariance matrix:

    1. The covariance describes the tendency of jointly continuous random variables to vary in concert. If deviations of \(X_i\) and \(Y_j\) from their respective means tend to be of the same sign, the covariance between them is positive; if they tend to be of opposite sign, it is negative.

    2. \(X_i\) and \(Y_j\) are said to be uncorrelated if the covariance is zero. Independent variables always have zero covariance, but zero covariance implies independence only in special cases, for example when the variables are jointly normally distributed.

    3. The covariance is only a good measure of the joint variability of two continuous random variables if each of them is nearly normally distributed (just as the variance is only a good measure of the spread for a nearly normal pdf).

    4. The auto-covariance matrix \(\Sigma^2_{\vec{X},\vec{X}}\) is symmetric.

../_images/L11_2_covariance_2m.PNG
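
As a caution on item 2, the covariance only captures the linear part of a relationship, so it can vanish even when one variable is completely determined by the other. The sketch below illustrates this with a deterministic, made-up example (y = x² on a symmetric interval) and also checks the symmetry stated in item 4 for a randomly generated data matrix.

```python
import numpy as np

# Deterministic example: x is symmetric around zero, y is a function of x.
x = np.linspace(-1.0, 1.0, 201)
y = x**2

# The covariance vanishes (up to rounding) although y depends fully on x.
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))

# Auto-covariance matrix of a 3-component vector (rows = samples).
rng = np.random.default_rng(2)
X = rng.normal(size=(1_000, 3))
auto_cov = np.cov(X, rowvar=False)
is_symmetric = np.allclose(auto_cov, auto_cov.T)

print(cov_xy, is_symmetric)
```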

Correlation

  • scale-invariant correlation:

\[\rho_{x,y} = \frac{Cov(X,Y)}{\sqrt{Var(X)Var(Y)}} = \frac{Cov(X,Y)}{\sigma(X)\sigma(Y)}\]
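
The scale invariance can be checked numerically: a change of units rescales the covariance but leaves the correlation untouched. This is a minimal sketch on synthetic data; the helper `pearson` and the Celsius-to-Fahrenheit style rescaling are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=2_000)
y = 0.8 * x + 0.6 * rng.normal(size=x.size)

def pearson(a, b):
    """Cov(a, b) / (sigma(a) * sigma(b)), all with 1/N normalisation."""
    cov_ab = np.mean((a - a.mean()) * (b - b.mean()))
    return cov_ab / (a.std() * b.std())

rho = pearson(x, y)

# Change of units for x (positive scale factor plus offset).
x_rescaled = 1.8 * x + 32.0
rho_rescaled = pearson(x_rescaled, y)

# The covariance picks up the scale factor, the correlation does not.
cov_original = np.cov(x, y)[0, 1]
cov_rescaled = np.cov(x_rescaled, y)[0, 1]

print(rho, rho_rescaled)
print(cov_original, cov_rescaled)
```

The hand-written `pearson` agrees with `np.corrcoef(x, y)[0, 1]`, which is the usual shortcut in practice.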
  • characteristics of the correlation:

    1. The correlation coefficient takes values in the interval [-1,1].

    2. \(\rho_{x_iy_j}\) is the (i,j)-th element of the correlation matrix between \(\vec{X}\) and \(\vec{Y}\).

    3. As for the covariance, the correlation coefficient indicates the extent to which two variables X and Y are linearly related, that is, \(Y=a+bX\).

    4. \(\rho^2_{xy}\) can be interpreted as the explained variance. It is the proportion of the variance of one of the variables that can be represented by a linear model of the other.

    5. Note that two variables with zero correlation can still be related by a non-linear relation.

    6. Note that two variables with non-zero correlation are not necessarily directly related to each other. Both can depend on a third variable.

    7. As for the covariance, the correlation is only a good measure of covariability if both variables are nearly normally distributed.

    8. \(\rho_{X_iX_j}\) is called the auto-correlation if \(X_i\) and \(X_j\) are variables of the same quantity (e.g. temperature), and the cross-correlation otherwise.

    9. We refer to lag/lead correlations if the indices \(i,j\) refer to different points in time.

../_images/L11_3_correlation.PNG
../_images/L11_4_correlation.PNG
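
Two of the properties listed above lend themselves to a quick numerical check: the interpretation of \(\rho^2\) as explained variance (item 4) and the lag correlation (item 9). The sketch below uses synthetic series; the helper `lag_correlation`, the chosen lag of three time steps and all coefficients are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5_000
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(size=n)

# Item 4: rho^2 equals the fraction of Var(y) captured by a
# least-squares straight line y_fit = a + b * x.
rho = np.corrcoef(x, y)[0, 1]
b, a = np.polyfit(x, y, deg=1)          # returns slope, then intercept
y_fit = a + b * x
explained_fraction = y_fit.var() / y.var()

# Item 9: lag correlation. z repeats x three time steps later.
true_lag = 3
z = np.roll(x, true_lag) + 0.1 * rng.normal(size=n)

def lag_correlation(u, v, lag):
    """Correlation between u(t) and v(t + lag) for lag >= 0."""
    if lag == 0:
        return np.corrcoef(u, v)[0, 1]
    return np.corrcoef(u[:-lag], v[lag:])[0, 1]

lags = np.arange(0, 8)
r_by_lag = np.array([lag_correlation(x, z, k) for k in lags])
best_lag = int(lags[np.argmax(r_by_lag)])

print(rho**2, explained_fraction)
print(best_lag)
```

The correlation as a function of lag peaks at the lag that was built into `z`, while it stays near zero at the other lags because `x` is white noise here.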
  • statistics can indicate an interrelation between two variables, but to confirm a connection one has to build a model, vary its parameters and analyse the outcome

  • there are different types of correlation coefficients, but the Pearson correlation coefficient, which assumes a linear interrelation between two variables, is the most common one. Others are Spearman’s rank correlation coefficient \(\rho\), Kendall’s tau \(\tau\), the point-biserial correlation coefficient \(r_{pb}\) and the phi coefficient \(\phi\).
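
The difference between the Pearson and the Spearman coefficient shows up for a relation that is monotonic but not linear. The sketch below computes Spearman's coefficient as the Pearson correlation of the ranks, which is valid when the data contain no ties; the exponential test data and the helper `ranks` are illustrative assumptions.

```python
import numpy as np

# Monotonic but strongly non-linear relation (illustrative data).
x = np.linspace(0.0, 5.0, 50)
y = np.exp(x)

def ranks(values):
    """Rank of each element (0 = smallest); valid when there are no ties."""
    return np.argsort(np.argsort(values))

# Pearson measures the linear relation between the values,
# Spearman the linear relation between their ranks.
pearson_r = np.corrcoef(x, y)[0, 1]
spearman_rho = np.corrcoef(ranks(x), ranks(y))[0, 1]

print(pearson_r, spearman_rho)
```

Spearman's coefficient reaches 1 because the ordering of x and y is identical, whereas the Pearson coefficient stays clearly below 1. For data with ties, a library routine such as `scipy.stats.spearmanr` is the safer choice.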

  • diagonal elements of the correlation matrix: insert explanatory text

../_images/L11_5_pttopt_correlation.PNG
  • box correlation: insert explanatory text

../_images/L11_6_box_correlation.PNG
  • teleconnections: insert explanatory text

../_images/L11_7_teleconnections.PNG
../_images/L11_8_NAO.PNG