Prof. Dr. Markus Meier
Leibniz Institute for Baltic Sea Research Warnemünde (IOW)
E-Mail: markus.meier@io-warnemuende.de
Covariance Matrix
scalar covariance between two continuous random variables X and Y:

\[
\mathrm{Cov}(X,Y) = E\left[(X-\mu_x)(Y-\mu_y)\right] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x-\mu_x)(y-\mu_y)\, f_{X,Y}(x,y)\, \mathrm{d}x\, \mathrm{d}y
\]

with \(\mu_x\) and \(\mu_y\) as the expectations of X and Y and the joint pdf \(f_{X,Y}(x,y)\)
covariance between the two variables is large when the product \((x-\mu_x)(y-\mu_y)\) and the pdf are large
the covariance matrix collects the covariances between all possible pairs of the components \(X_i\) and \(Y_j\) of the vectors \(\vec{X}\) and \(\vec{Y}\); with i = 1..m and j = 1..n, the covariance matrix is an (m × n)-matrix with entries \(\mathrm{Cov}(X_i, Y_j)\)
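A quick numerical sketch of this definition, using hypothetical random data (the sample size, the dimensions and the dependence of Y on X are made up for illustration): the sample cross-covariance matrix between an m-vector and an n-vector is (m × n).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sample: 1000 realisations of a 3-vector X and a 2-vector Y,
# where Y partly depends on X so that the covariances are non-zero
n_samples, m, n = 1000, 3, 2
X = rng.normal(size=(n_samples, m))
Y = 0.5 * X[:, :n] + rng.normal(size=(n_samples, n))

# Cross-covariance matrix: entry (i, j) estimates Cov(X_i, Y_j)
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)
cov_xy = Xc.T @ Yc / (n_samples - 1)

print(cov_xy.shape)  # (m, n) = (3, 2)
```

The same block appears as the off-diagonal part of `np.cov(X.T, Y.T)`, which stacks both vectors into one (m+n)-dimensional variable.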
characteristics of the covariance matrix:
the covariance describes the tendency of jointly continuous random variables to vary in concert: if deviations of \(X_i\) and \(Y_j\) from their respective means tend to have the same sign, the covariance between them is positive, and vice versa
if \(X_i\) and \(Y_j\) are independent, their covariance is zero; the converse does not hold in general: zero covariance only means the variables are uncorrelated (for jointly normally distributed variables, however, zero covariance does imply independence)
The covariance is only a good measure of the joint variability of two continuous random variables if each of them is nearly normally distributed (just as the variance is only a good measure of the spread of a pdf under the same condition)
the auto-covariance matrix \(\Sigma^2_{\vec{X},\vec{X}}\) of a vector with itself is symmetric
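These properties are easy to check numerically; a minimal sketch with hypothetical random data (note that `np.cov` treats rows as variables):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))   # 500 samples of a 3-component vector

# Auto-covariance matrix of X with itself; np.cov expects variables in rows
sigma = np.cov(X.T)

print(np.allclose(sigma, sigma.T))                          # symmetric
print(np.allclose(np.diag(sigma), X.var(axis=0, ddof=1)))   # diagonal = variances
```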
Correlation
scale-invariant correlation:

\[
\rho_{XY} = \frac{\mathrm{Cov}(X,Y)}{\sigma_X\, \sigma_Y}
\]
characteristics of the correlation:
The correlation coefficient takes values in the interval [-1,1].
\(\rho_{X_iY_j}\) is the (i,j)-th element of the correlation matrix between \(\vec{X}\) and \(\vec{Y}\).
As for the covariance, the correlation coefficient indicates the extent to which two variables X and Y are linearly related; that is, \(Y = a + bX\).
\(\rho^2_{xy}\) can be interpreted as the explained variance: it is the proportion of the variance of one of the variables that can be represented by a linear model of the other.
Note that two variables with zero correlation can still be related by a non-linear relation.
Note that two variables with non-zero correlation are not necessarily directly related to each other; both can depend on a third variable.
As for the covariance, the correlation is only a good measure of covariability if both variables are nearly normally distributed.
\(\rho_{X_iX_j}\) refers to the auto-correlation if \(X_i\) and \(X_j\) are variables of the same quantity (e.g. temperature), and to the cross-correlation otherwise.
We refer to lag/lead correlations if the indices \(i,j\) refer to different points in time.
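A small sketch of the difference between simultaneous and lagged correlation (synthetic series; the lag of 5 steps and the noise level are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n, lag = 1000, 5

# Build two series where y leads x by `lag` time steps
s = rng.normal(size=n + lag)
x = s[lag:]                               # x_t = s_{t+lag}
y = s[:n] + 0.3 * rng.normal(size=n)      # y_t is a noisy copy of s_t

# Simultaneous correlation is near zero because the series are offset in time
r0 = np.corrcoef(x, y)[0, 1]

# Lag correlation: re-align the series by shifting `lag` steps
r_lag = np.corrcoef(x[:-lag], y[lag:])[0, 1]

print(f"r0 = {r0:.2f}, r_lag = {r_lag:.2f}")
```

Scanning `r_lag` over a range of shifts is the usual way to find the lag at which two series are most strongly coupled.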
statistics can indicate an interrelation between two variables, but to really confirm a connection one has to build a model, vary some parameters and analyse the outcome
there are different types of correlation coefficients, but the Pearson correlation coefficient, which assumes a linear interrelation between two variables, is the most common one. Others are: Spearman’s rank correlation coefficient \(\rho\), Kendall’s tau \(\tau\), the point-biserial correlation coefficient \(r_{pb}\) and the phi coefficient \(\phi\).
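To illustrate the difference, a sketch comparing Pearson with Spearman’s rank correlation on a strictly monotone but non-linear relation (noise-free synthetic data; the rank helper is written out here for transparency, in practice one would use `scipy.stats.spearmanr`):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 1.0, size=500)
y = np.exp(5.0 * x)            # strictly monotone, strongly non-linear

def ranks(a):
    # ranks 0..n-1; no ties expected for continuous data
    r = np.empty(len(a))
    r[np.argsort(a)] = np.arange(len(a))
    return r

pearson = np.corrcoef(x, y)[0, 1]
spearman = np.corrcoef(ranks(x), ranks(y))[0, 1]

# Pearson is clearly below 1 (the relation is not linear);
# Spearman is 1 (the relation is perfectly monotone)
print(f"pearson = {pearson:.2f}, spearman = {spearman:.2f}")
```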
diagonal elements of the correlation matrix: the diagonal elements \(\rho_{X_iX_i}\) are the correlations of each variable with itself and are therefore equal to 1
box correlation: correlation of a time series averaged over a selected region (box) with the same or another variable at every grid point, yielding a correlation map
teleconnections: statistically significant correlations between climate variations at widely separated locations, e.g. the remote influence of ENSO or the North Atlantic Oscillation
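A box-correlation map, as used to search for teleconnections, can be sketched as follows (the field, the index time series and the grid are entirely synthetic stand-ins for, say, an SST field and a box-averaged anomaly index):

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic data: 200 time steps of a field on a 10 x 20 grid, partly driven
# by a single index time series (e.g. an anomaly averaged over a box)
nt, ny, nx = 200, 10, 20
index = rng.normal(size=nt)
pattern = rng.normal(size=(ny, nx))      # spatial footprint of the index
field = pattern * index[:, None, None] + rng.normal(size=(nt, ny, nx))

# Correlate the index with the field at every grid point -> correlation map
f = field.reshape(nt, -1)
f_anom = f - f.mean(axis=0)
i_anom = index - index.mean()
corr_map = (f_anom * i_anom[:, None]).sum(axis=0) / (
    np.sqrt((f_anom**2).sum(axis=0)) * np.sqrt((i_anom**2).sum())
)
corr_map = corr_map.reshape(ny, nx)

# Grid points where the footprint is strong show up with large |correlation|
print(corr_map.shape)  # (10, 20)
```

Plotting `corr_map` on the grid is the standard way to visualise where a box index is connected to remote regions.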