Estimation of statistical parameters

Climate of the Earth system

Prof. Dr. Markus Meier
Leibniz Institute for Baltic Sea Research Warnemünde (IOW)
E-Mail: markus.meier@io-warnemuende.de

Estimation of statistical parameters

  • let us assume a sample of \(n\) independent and identically distributed (iid) random variables \(\{X_1,~X_2,~\ldots,~X_n\}\) with a common probability density function (pdf) \(f_X\) of no specific form

  • for discrete samples of continuous random variables, the frequency histogram is an estimator for the pdf. The sample or phase space (e.g. \(\mathbb{R}\)) is divided into \(K\) disjoint subsets \(\Theta_k\) that together cover it:

\[\cup_{k=1}^K \Theta_k = \mathbb{R},~~\mathrm{and}~~ \Theta_k \cap \Theta_j = \emptyset ~~\mathrm{for}~~ k \neq j\]
  • frequency histogram: number of observations that fall into each \(\Theta_k\) divided by the total number of observations:

\[\mathbf{H}(\Theta_k) = \frac{|\{\mathbf{X_i}: \mathbf{X_i} \in \Theta_k,~i=1,\ldots,n\}|}{n}\]
  • \(\mathbf{H}(\Theta_k)\) is an estimator of:

\[P(\mathbf{X} \in \Theta_k) = \int_{\Theta_k}f_X(x)~dx\]
  • with the empirical pdf:

\[\widehat{f}_X(x) = \frac{\mathbf{H}(\Theta_k)}{\int_{\Theta_k}dx}~~\mathrm{if}~~x \in \Theta_k\]
  • and the empirical cumulative distribution function

\[\widehat{F}_X(x) = \mathbf{H}((-\infty,x])\]
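  • the histogram, empirical pdf and empirical cumulative distribution function above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the example sample and the bin `edges` are hypothetical, the bins are taken as half-open intervals `[edges[k], edges[k+1])`, and the edges are assumed to cover the whole sample (a true partition of \(\mathbb{R}\) would need unbounded outer bins).

```python
def frequency_histogram(sample, edges):
    """Relative frequency H(Theta_k) for each bin [edges[k], edges[k+1])."""
    n = len(sample)
    counts = [0] * (len(edges) - 1)
    for x in sample:
        for k in range(len(counts)):
            if edges[k] <= x < edges[k + 1]:
                counts[k] += 1
                break  # bins are disjoint, so each value is counted at most once
    return [c / n for c in counts]


def empirical_pdf(sample, edges):
    """Histogram value divided by the bin width (the integral of dx over the bin)."""
    H = frequency_histogram(sample, edges)
    return [h / (edges[k + 1] - edges[k]) for k, h in enumerate(H)]


def empirical_cdf(sample, x):
    """Fraction of observations that are <= x."""
    return sum(1 for v in sample if v <= x) / len(sample)


# Hypothetical example data, chosen only for illustration.
sample = [0.1, 0.4, 0.45, 0.7, 1.2, 1.9]
edges = [0.0, 0.5, 1.0, 2.0]

H = frequency_histogram(sample, edges)   # relative frequencies, sum to 1
pdf = empirical_pdf(sample, edges)       # piecewise-constant density estimate
area = sum(p * (edges[k + 1] - edges[k]) for k, p in enumerate(pdf))
```

  • dividing by the bin width is what makes the piecewise-constant estimate integrate to one (`area` above), even when the bins have unequal widths.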
  • best estimate of the mean \(\mu = \int_{-\infty}^{\infty}xf_X(x)~dx\) is:

\[\widehat{\mu} = \mathbf{\bar{X}} = \frac{1}{n} \sum_{k=1}^n \mathbf{X_k}\]
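  • estimating moments: the expectation of a function \(g(X)\), e.g. \(g(x) = (x-\mu)^m\) for the \(m\)-th central moment, is estimated by the sample average:

\[\widehat{\int_{\Omega} g(x)f_X(x)~dx} = \frac{1}{n} \sum_{k=1}^n g(X_k)\]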
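  • the sample-average estimator above can be illustrated with a short Python sketch. The helper name `estimate_expectation` and the example values are hypothetical. Choosing \(g(x) = x\) gives the sample mean; choosing \(g(x) = (x-\widehat{\mu})^m\) gives an estimate of the \(m\)-th central moment. Using the estimated mean in place of the unknown \(\mu\) is an assumption of this sketch, and for \(m = 2\) it yields the \(1/n\) form, not the \(1/(n-1)\) variance estimator.

```python
def estimate_expectation(sample, g):
    """Sample average of g(X_k), an estimator of the integral of g(x) f_X(x) dx."""
    return sum(g(x) for x in sample) / len(sample)


# Hypothetical example data, chosen only for illustration.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# g(x) = x gives the sample mean.
mu_hat = estimate_expectation(sample, lambda x: x)

# g(x) = (x - mu_hat)^m gives the m-th central moment estimate (1/n form).
m2 = estimate_expectation(sample, lambda x: (x - mu_hat) ** 2)
m3 = estimate_expectation(sample, lambda x: (x - mu_hat) ** 3)
```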
  • best (unbiased) estimate of the variance \(\sigma^2\) is:

\[\widehat{\sigma^2} = \frac{1}{n-1} \sum_{k=1}^n(X_k-\widehat{\mu})^2\]
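  • a brief sketch of why the factor is \(1/(n-1)\) and not \(1/n\), assuming iid samples with finite variance \(\sigma^2\) and writing \(\bar{X}\) for the sample mean: the squared deviations from the sample mean can be rewritten as

\[\sum_{k=1}^n (X_k-\bar{X})^2 = \sum_{k=1}^n (X_k-\mu)^2 - n(\bar{X}-\mu)^2\]
  • taking expectations, with \(E[(X_k-\mu)^2] = \sigma^2\) and \(E[(\bar{X}-\mu)^2] = \sigma^2/n\), gives

\[E\left[\sum_{k=1}^n (X_k-\bar{X})^2\right] = n\sigma^2 - n\,\frac{\sigma^2}{n} = (n-1)\,\sigma^2\]
  • dividing by \(n-1\) therefore yields an estimator whose expectation is exactly \(\sigma^2\).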
  • root mean square error (RMSE) with a priori known mean:

\[\varepsilon_{RMS} = \sqrt{\frac{1}{n} \sum_{k=1}^n(X_k-\mu)^2}\]
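  • the difference between the two formulas above (estimated mean with \(n-1\), known mean with \(n\)) can be made concrete with a small Python sketch. The function names and the example values are hypothetical, and the value passed as the known mean is simply assumed for illustration.

```python
import math


def sample_variance(sample):
    """Variance estimate with the mean estimated from the sample (divides by n - 1)."""
    n = len(sample)
    mu_hat = sum(sample) / n
    return sum((x - mu_hat) ** 2 for x in sample) / (n - 1)


def rms_error(sample, mu):
    """Root mean square error about an a priori known mean mu (divides by n)."""
    return math.sqrt(sum((x - mu) ** 2 for x in sample) / len(sample))


# Hypothetical example data, chosen only for illustration.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

var_hat = sample_variance(sample)   # sum of squared deviations is 32, so 32 / 7
rms = rms_error(sample, 5.0)        # assumed known mean 5.0, so sqrt(32 / 8)
```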
  • estimating the covariance:

\[\widehat{\sigma^2_{ij}} = \frac{1}{n-1} \sum_{k=1}^n (X_{k;i}-\widehat{\mu}_i)(X_{k;j}-\widehat{\mu}_j)\]
  • estimating the correlation:

\[\widehat{\rho}_{ij} = \frac{\widehat{Cov}(X_i,X_j)}{\sqrt{\widehat{Var}(X_i)\widehat{Var}(X_j)}}\]
  • Pearson's correlation coefficient \(r\) (the factors \(1/(n-1)\) in numerator and denominator cancel):

\[\widehat{\rho}_{ij} = \frac{\sum_{k=1}^n(X_{k;i}-\widehat{\mu}_i)(X_{k;j}-\widehat{\mu}_j)}{\sqrt{\sum_{k=1}^n(X_{k;i}-\widehat{\mu}_i)^2} \sqrt{\sum_{k=1}^n(X_{k;j}-\widehat{\mu}_j)^2}}\]
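  • the covariance and correlation estimators above can be sketched in plain Python as follows. The function names and the two example series `x_i` and `x_j` are hypothetical; the sketch assumes paired observations of equal length and non-zero variance in both series.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sample_covariance(xi, xj):
    """Covariance estimate with the 1/(n - 1) normalisation."""
    if len(xi) != len(xj):
        raise ValueError("both series must have the same number of observations")
    n = len(xi)
    mu_i, mu_j = mean(xi), mean(xj)
    return sum((a - mu_i) * (b - mu_j) for a, b in zip(xi, xj)) / (n - 1)


def pearson_r(xi, xj):
    """Estimated covariance divided by the product of the estimated standard deviations."""
    var_i = sample_covariance(xi, xi)  # covariance of a series with itself is its variance
    var_j = sample_covariance(xj, xj)
    return sample_covariance(xi, xj) / math.sqrt(var_i * var_j)


# Hypothetical example data, chosen only for illustration.
x_i = [1.0, 2.0, 3.0, 4.0, 5.0]
x_j = [2.0, 4.0, 5.0, 4.0, 5.0]

cov_ij = sample_covariance(x_i, x_j)
r_ij = pearson_r(x_i, x_j)
```

  • because the same \(1/(n-1)\) factor appears in the covariance and in both variances, `pearson_r` gives the same value as the ratio of raw sums in the last formula.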