Expectation, variance, and covariance: measuring uncertainty
This document is optional, but strongly recommended.
Once uncertainty is represented through random variables, the next step is to quantify it.
Expectation, variance, and covariance are the core mathematical tools we use to describe:
- typical behavior,
- variability,
- and dependence.
They appear everywhere in forecasting, whether explicitly or implicitly.
1 Expectation: the long-run average
The expectation (or expected value) of a random variable describes its typical value in the long run.
For a discrete random variable X with probability mass function p(x),
\mathbb{E}[X] = \sum_x x \, p(x).
For a continuous random variable with density f(x),
\mathbb{E}[X] = \int_{-\infty}^{\infty} x \, f(x)\, dx.
Expectation is not what you expect to observe next.
It is what you would obtain on average, across many repetitions.
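As a concrete illustration, here is a minimal sketch (assuming NumPy is available; the fair-die example is ours, not from the course) contrasting the analytic expectation with the long-run average of simulated outcomes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Analytic expectation of a fair six-sided die: sum of x * p(x)
values = np.arange(1, 7)
pmf = np.full(6, 1 / 6)
analytic_mean = np.sum(values * pmf)         # 3.5

# The long-run average of many simulated rolls approaches the expectation,
# even though no single roll ever equals 3.5
rolls = rng.integers(1, 7, size=100_000)
print(analytic_mean, rolls.mean())           # 3.5 and roughly 3.5
```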
In forecasting, expectation often corresponds to:
- the point forecast,
- the mean of a forecast distribution,
- the baseline around which uncertainty is measured.
2 Linearity of expectation (this matters a lot)
One of the most important properties of expectation is linearity:
\mathbb{E}[aX + b] = a\,\mathbb{E}[X] + b,
for constants a and b.
Even more importantly,
\mathbb{E}[X + Y] = \mathbb{E}[X] + \mathbb{E}[Y],
regardless of whether X and Y are independent.
Linearity of expectation holds without independence.
This fact underlies many results in time series and forecasting.
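A small numerical check (again a NumPy sketch, with an artificial pair of variables) shows linearity holding even when the two variables are strongly dependent:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# X and Y are strongly dependent: Y is constructed directly from X
x = rng.normal(loc=2.0, scale=1.0, size=n)
y = 3 * x + rng.normal(size=n)

# The sample analogue of E[X + Y] = E[X] + E[Y] holds exactly,
# even though X and Y are far from independent
print((x + y).mean())         # roughly 8
print(x.mean() + y.mean())    # same value, up to floating-point rounding
```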
3 Variance: how uncertain is a random variable?
Expectation alone is not enough.
Two random variables may have the same mean but very different levels of uncertainty.
The variance of X measures how much values fluctuate around their expectation:
\operatorname{Var}(X) = \mathbb{E}\big[(X - \mu)^2\big], \qquad \mu = \mathbb{E}[X].
An equivalent and often more convenient expression is:
\operatorname{Var}(X) = \mathbb{E}[X^2] - \mu^2.
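To see why the two expressions agree, expand the square and apply linearity of expectation, treating \mu = \mathbb{E}[X] as a constant:
\operatorname{Var}(X) = \mathbb{E}\big[X^2 - 2\mu X + \mu^2\big] = \mathbb{E}[X^2] - 2\mu\,\mathbb{E}[X] + \mu^2 = \mathbb{E}[X^2] - \mu^2.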
Variance is measured in squared units.
The standard deviation \sigma = \sqrt{\operatorname{Var}(X)} restores the original scale.
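A quick numerical check (a NumPy sketch with an arbitrary normal sample) that the two variance formulas agree and that the standard deviation restores the original scale:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=2.0, size=200_000)

mu = x.mean()
var_definition = ((x - mu) ** 2).mean()      # E[(X - mu)^2]
var_shortcut = (x ** 2).mean() - mu ** 2     # E[X^2] - mu^2
sigma = np.sqrt(var_definition)              # back on the original scale

print(var_definition, var_shortcut, sigma)   # both near 4, sigma near 2
```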
In forecasting, variance captures:
- forecast uncertainty,
- volatility,
- the typical size of forecast errors.
4 How variance reacts to transformations
Variance behaves very differently from expectation under transformations.
For constants a and b,
\operatorname{Var}(aX + b) = a^2 \operatorname{Var}(X).
Adding a constant does nothing to variance.
Scaling a variable scales variance quadratically.
Variance is sensitive to scale.
This is why transformations (e.g., logarithms) can dramatically change model behavior.
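A short sketch (NumPy, with arbitrary constants a and b chosen only for illustration) of how shifting and scaling affect variance:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(loc=0.0, scale=1.5, size=200_000)   # Var(X) is about 2.25

a, b = 4.0, 10.0
print(np.var(a * x + b))       # roughly a**2 * Var(X) = 16 * 2.25 = 36
print(a ** 2 * np.var(x))      # same value
print(np.var(x + b))           # shifting alone leaves variance unchanged
```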
5 Covariance: measuring dependence
Variance describes uncertainty of a single random variable.
Covariance describes how two random variables vary together.
For random variables X and Y,
\operatorname{Cov}(X, Y) = \mathbb{E}\big[(X - \mu_X)(Y - \mu_Y)\big] = \mathbb{E}[XY] - \mu_X \mu_Y.
- Positive covariance: large values of X tend to occur with large values of Y.
- Negative covariance: large values of X tend to occur with small values of Y.
If X and Y are independent, then
\operatorname{Cov}(X,Y) = 0.
The converse is not generally true: zero covariance does not imply independence.
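The following sketch (NumPy, with Y = X^2 as a standard counterexample) illustrates both points: independent variables have near-zero sample covariance, yet zero covariance does not imply independence:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

# Independent variables: sample covariance is close to zero
x = rng.normal(size=n)
z = rng.normal(size=n)
print(np.cov(x, z)[0, 1])      # near 0

# Dependent but uncorrelated: Y = X^2 is completely determined by X,
# yet its covariance with X is still (approximately) zero
y = x ** 2
print(np.cov(x, y)[0, 1])      # also near 0
```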
6 Covariance in time series
In time series analysis, covariance appears in a very specific form.
Given a sequence \{Y_t\}, we define the lag-h autocovariance as:
\operatorname{Cov}(Y_t, Y_{t-h}).
This quantity measures how observations relate to their own past.
Autocovariance is the foundation of:
- autocorrelation,
- stationarity,
- AR and MA models.
If you understand covariance, you are already halfway to understanding time series models.
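As a rough sketch (NumPy; the simulated AR(1)-style series and the lag_cov helper are illustrative only, not part of the course material), the lag-h covariance of a series can be estimated directly from the data:

```python
import numpy as np

rng = np.random.default_rng(5)

# A simulated AR(1)-style series: each value depends on the previous one
n, phi = 50_000, 0.8
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + rng.normal()

def lag_cov(series, h):
    """Sample covariance between the series and its own past at lag h."""
    centered = series - series.mean()
    if h == 0:
        return np.mean(centered * centered)
    return np.mean(centered[h:] * centered[:-h])

print([round(lag_cov(y, h), 2) for h in range(4)])
# For this series the lag-h covariance decays roughly geometrically in h
```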
7 Correlation (preview, not the full story)
Covariance depends on scale, which makes comparisons difficult.
The correlation coefficient rescales covariance:
\rho(X,Y) = \frac{\operatorname{Cov}(X,Y)}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}}.
Correlation always lies between -1 and 1.
However, it measures only linear association, not general dependence.
This distinction becomes critical in time series.
Correlation deserves its own refresher — and it will get one.
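A final sketch (NumPy; the corr helper simply re-implements the formula above) shows that covariance depends on the scale of the data while correlation does not:

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=100_000)
y = 2 * x + rng.normal(size=100_000)

def corr(a, b):
    """Covariance rescaled by both standard deviations."""
    # ddof=1 matches the default normalization used by np.cov
    return np.cov(a, b)[0, 1] / np.sqrt(np.var(a, ddof=1) * np.var(b, ddof=1))

# Covariance changes when the data are rescaled; correlation does not
print(np.cov(x, y)[0, 1], np.cov(1000 * x, y)[0, 1])   # very different numbers
print(corr(x, y), corr(1000 * x, y))                   # both near 0.89
```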
8 Where this shows up in the course
Expectation, variance, and covariance appear throughout the course:
- point forecasts and forecast distributions,
- forecast error evaluation,
- residual diagnostics,
- autocorrelation functions,
- stationarity assumptions.
They are not optional background — they are the mathematical backbone of forecasting.
9 What you do not need yet
At this stage, you do not need:
- higher-order moments,
- distribution-specific formulas,
- closed-form derivations.
Those concepts matter, but only once the core quantities are fully internalized.
10 Takeaway
Expectation describes the center.
Variance describes uncertainty.
Covariance describes dependence.
Together, they define how time series behave.