Prof. Dr. Markus Meier
Leibniz Institute for Baltic Sea Research Warnemünde (IOW)
E-Mail: markus.meier@io-warnemuende.de
Probability density and distribution#
Probability density function and important parameters
Different probability distributions
Probability density function and important parameters#
Probability density function#
let \(X\) be a continuous (not discrete!) random variable that takes values \(x\) in \(\Omega\), for example temperature. the probability density function \(f_X(x)\) of an event \(X\) (e.g. \(T=10\)°C) is defined as a continuous function on \(\mathbb{R}\) with the following three attributes:
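\[ f_X(x) \ge 0 \quad \text{for all } x, \]
\[ \int_{-\infty}^{\infty} f_X(x)\, dx = 1, \]
\[ P(a \le X \le b) = \int_a^b f_X(x)\, dx. \]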
Question: What is the unit of the pdf?
Answer: \([f_X(x)] = [x]^{-1}\).
Question: What is the integral of the pdf? Answer: The cumulative distribution function.
Cumulative distribution function#
the cumulative distribution function of an event \(X\) is a monotonically increasing, non-dimensional function \(F_X(x)\) on \(\mathbb{R}\) defined as:
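\[ F_X(x) = P(X \le x) = \int_{-\infty}^{x} f_X(x')\, dx' \]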
which is equivalent to:
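\[ f_X(x) = \frac{dF_X(x)}{dx} \]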
consequently, the probability that \(X\) lies inside the range \((a,b)\) is:
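\[ P(a < X \le b) = F_X(b) - F_X(a) = \int_a^b f_X(x)\, dx \]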
Expectation \(\varepsilon\)#
the expectation of a given pdf weights it with \(x\) in the integral:
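\[ \varepsilon(X) = \int_{-\infty}^{\infty} x\, f_X(x)\, dx \]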
two handy attributes of the expectation are linearity and additivity:
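\[ \varepsilon(aX + b) = a\,\varepsilon(X) + b, \qquad \varepsilon(X + Y) = \varepsilon(X) + \varepsilon(Y) \]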
Central moments \(\mu\)#
k-th moment of a continuous random variable \(X\):
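\[ \mu^{(k)} = \varepsilon(X^k) = \int_{-\infty}^{\infty} x^k f_X(x)\, dx \]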
k-th central moment of a continuous random variable \(X\):
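\[ \mu_k = \varepsilon\left((X - \mu)^k\right) = \int_{-\infty}^{\infty} (x - \mu)^k f_X(x)\, dx \]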
example: anomalies relative to the mean seasonal cycle \(\mu\)
mean \(\mu\): location parameter \(\mu = \mu^{(1)}\)
variance:
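\[ Var(X) = \mu_2 = \varepsilon\left((X - \mu)^2\right) = \varepsilon(X^2) - \mu^2 \]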
standard deviation:
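\[ \sigma = \sqrt{Var(X)} \]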
Chebyshev’s inequality:
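\[ P\left(|X - \mu| \ge k\sigma\right) \le \frac{1}{k^2} \]
e.g. for \(k=2\): at most 25% of the probability lies more than two standard deviations away from the mean.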
Skewness \(\gamma_1\)#
is a measure of the asymmetry of a distribution: symmetric for \(\gamma_1=0\), scaled version of the third central moment, non-dimensional shape parameter
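\[ \gamma_1 = \frac{\mu_3}{\sigma^3} \]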
Kurtosis \(\gamma_2\)#
is a measure of the peakedness of a distribution: a normal distribution (will be explained later this lecture) has \(\gamma_2=0\), scaled and shifted version of the fourth central moment, non-dimensional shape parameter
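\[ \gamma_2 = \frac{\mu_4}{\sigma^4} - 3 \]
(the shift by 3 makes \(\gamma_2 = 0\) for the normal distribution)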
Examples#
summer sea level at Kieler Förde, \(\mu = 0.06\), \(\sigma=0.19\), \(\gamma_1=-0.6\), \(\gamma_2=4.07\)
probability densities of some measured variables
P-quantiles#
mean and variance are affected by the tail ends of the pdf (the likelihood of extreme values), but p-quantiles \(x_p\) are insensitive to such extremes.
a p-quantile of 0.3 means that 30% of the \(x\) values lie below this threshold:
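\[ F_X(x_p) = p \]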
the median \(m_{50}\) is the 50%-quantile: half of the distribution lies above and the other half below \(m_{50}\).
let’s look at the p-quantiles of the log-normal distribution in Figure 5 to get an idea. note the difference between mean and median!
Different probability distributions#
Uniform distribution#
symmetric and less peaked than the normal distribution:
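\[ f(x) = \begin{cases} \dfrac{1}{b-a} & a \le x \le b \\ 0 & \text{otherwise} \end{cases} \]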
with the cumulative distribution function:
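\[ F(x) = \begin{cases} 0 & x < a \\ \dfrac{x-a}{b-a} & a \le x \le b \\ 1 & x > b \end{cases} \]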
exercise: calculate \(\mu,Var,\sigma,\gamma_1,\gamma_2\) of the uniform distribution \(\cal U(a,b)\)
solutions: \(\mu(\cal U(a,b))= \frac{1}{2}(a+b)\), \(Var(\cal U(a,b))= \frac{1}{12}(b-a)^2\), \(\sigma(\cal U(a,b))= \sqrt{\frac{1}{12}}(b-a)\), \(\gamma_1(\cal U(a,b))= 0\), \(\gamma_2(\cal U(a,b))= -1.2\)
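the solutions can be cross-checked numerically; a minimal sketch with scipy.stats (scipy reports the excess kurtosis, i.e. \(\gamma_2\) as defined above; the bounds are illustrative):

```python
from scipy import stats

a, b = 0.0, 1.0  # illustrative bounds of U(a, b)
u = stats.uniform(loc=a, scale=b - a)

# mean, variance, skewness, excess kurtosis
mean, var, skew, kurt = u.stats(moments="mvsk")
print(mean, var, skew, kurt)  # 0.5 0.0833... 0.0 -1.2
```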
Normal (Gaussian) distribution#
most physical quantities are nearly normally distributed:
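for \(X \sim \cal N(\mu,\sigma^2)\):
\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \]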
no skewness or kurtosis: \(\gamma_1=\gamma_2=0\)
no analytical form of the cdf exists; approximations are used:
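a common form uses the error function \(\mathrm{erf}\), which is itself evaluated numerically:
\[ F(x) = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right] \]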
central limit theorem states: If \(X_k, k=1,2,...\) is an infinite series of independent and identically distributed random variables with \(\varepsilon(X_k)=\mu\) and \(Var(X_k)=\sigma^2\), then the average \(\frac{1}{n} \sum^n_{k=1}X_k\) is asymptotically normally distributed. That is:
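\[ \frac{1}{n} \sum^n_{k=1} X_k \;\sim\; \cal N\left(\mu, \frac{\sigma^2}{n}\right) \quad \text{for large } n \]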
a larger sample size reduces the standard deviation of the mean according to:
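\[ \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \]

a minimal numerical sketch of this \(1/\sqrt{n}\) scaling; the exponential parent distribution (which has \(\sigma = 1\)) and the sample sizes are arbitrary, illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps = 100, 100_000

# draw many samples of size n from a skewed parent distribution;
# the sample means are nevertheless nearly normally distributed
samples = rng.exponential(scale=1.0, size=(reps, n))
means = samples.mean(axis=1)

print(means.std())       # ~0.1, i.e. sigma / sqrt(n)
print(1.0 / np.sqrt(n))  # 0.1
```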
Log-normal distribution#
distribution of positive definite quantities such as rainfall or wind speed:
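\[ f(x) = \frac{1}{x\,\sigma\sqrt{2\pi}} \exp\left(-\frac{\left(\ln(x/\theta)\right)^2}{2\sigma^2}\right), \quad x > 0 \]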
with the median value \(\theta\) and the shape parameter \(\sigma\), the standard deviation of \(\ln X\).
exercise: derive a general formula for the k-th moment of the distribution
solution: \(\varepsilon(X^k) = \theta^k e^{k^2\sigma^2/2}\)
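a quick numerical cross-check with scipy.stats.lognorm, where the shape parameter is \(\sigma\) and the scale equals the median \(\theta\); the parameter values are illustrative:

```python
import numpy as np
from scipy import stats

theta, sigma, k = 2.0, 0.5, 3  # illustrative choices
X = stats.lognorm(s=sigma, scale=theta)  # scale = median theta

print(X.moment(k))                             # numerical k-th moment
print(theta**k * np.exp(k**2 * sigma**2 / 2))  # analytical formula
```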
\(\chi^2\)-distribution#
sum of \(k\) independent squared \(\cal N(0,1)\) random variables; \(k\) is the number of degrees of freedom; application: the pdfs of variance estimates:
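\[ f(x) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\, x^{k/2-1} e^{-x/2}, \quad x \ge 0 \]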
with the \(\Gamma\)-function:
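\[ \Gamma(t) = \int_0^{\infty} x^{t-1} e^{-x}\, dx \]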
it has handy attributes:
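\[ \varepsilon\left(\chi^2(k)\right) = k, \qquad Var\left(\chi^2(k)\right) = 2k, \]
and for independent \(K \sim \chi^2(k)\) and \(L \sim \chi^2(l)\): \(K + L \sim \chi^2(k+l)\).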
Student’s t-distribution#
application: testing the significance of differences in the means. let \(t(k)\) be a test variable with \(k>0\); if \(A\) and \(B\) are independent random variables such that
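\[ A \sim \cal N(0,1), \qquad B \sim \chi^2(k), \]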
the t-distribution can be written as:
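\[ T = \frac{A}{\sqrt{B/k}} \;\sim\; t(k) \]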
using the \(\Gamma\)-function defined above, the distribution can also be written as:
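\[ f_T(t) = \frac{\Gamma\left(\frac{k+1}{2}\right)}{\sqrt{k\pi}\;\Gamma\left(\frac{k}{2}\right)} \left(1 + \frac{t^2}{k}\right)^{-\frac{k+1}{2}} \]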
t-test?
Fisher-F-distribution#
application: testing the significance of differences in the variances. for \(\chi^2\)-distributed \(K\) and \(L\) with \(k\) and \(l\) degrees of freedom:
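\[ K \sim \chi^2(k), \qquad L \sim \chi^2(l), \]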
the F-distribution is given by:
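\[ F(k,l) = \frac{K/k}{L/l} \]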
alternatively, the probability density of the F-distribution is also given by:
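\[ f(x) = \frac{\Gamma\left(\frac{k+l}{2}\right)}{\Gamma\left(\frac{k}{2}\right)\,\Gamma\left(\frac{l}{2}\right)} \left(\frac{k}{l}\right)^{k/2} x^{k/2-1} \left(1 + \frac{k}{l}\,x\right)^{-\frac{k+l}{2}}, \quad x \ge 0 \]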
Summary of theoretical distributions#
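all distributions of this lecture are available in scipy.stats; a minimal sketch comparing their moments and shape parameters \(\gamma_1, \gamma_2\) (the parameter values are illustrative):

```python
from scipy import stats

dists = {
    "uniform U(0,1)":  stats.uniform(loc=0.0, scale=1.0),
    "normal N(0,1)":   stats.norm(loc=0.0, scale=1.0),
    "log-normal":      stats.lognorm(s=0.5, scale=2.0),  # sigma, median theta
    "chi-square(10)":  stats.chi2(df=10),
    "Student t(10)":   stats.t(df=10),
    "Fisher F(10,20)": stats.f(dfn=10, dfd=20),
}

for name, d in dists.items():
    mean, var, skew, kurt = d.stats(moments="mvsk")
    print(f"{name}: mean={mean:.3f} var={var:.3f} "
          f"skew={skew:.3f} excess_kurt={kurt:.3f}")
```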
Continuous random vectors, multi-variate data#
example: vectors \(X\) (temperature) and \(Y\) (sea level pressure):