Linear Regression Models for Time Series

Modified: March 29, 2026

Code
library(plotly)  # for interactive plots
library(car)     # for the vif() function used in the multicollinearity section

1 From Univariate to Multivariate Models

Every model we have built so far shares one fundamental characteristic: it is univariate. Whether it was STL decomposition, ETS, or ARIMA, we only looked inward — we used the history of y_t itself to forecast its future.

That approach has taken us surprisingly far. But it ignores the outside world entirely.

  • A retailer’s sales are not just a function of past sales — they depend on promotions, competitor prices, and economic conditions.
  • Energy demand depends on temperature, not just its own history.
  • Mexico’s retail trade index (mexretail) reflects consumer purchasing power, employment, and credit conditions — none of which appear in the series itself.

1.1 Why exogenous variables?

The question we are addressing now is: what if we gave our model access to external information?

  • In Modules 1–2, our model improved by getting smarter about the structure of y_t (trend, seasonality, autocorrelation).
  • In Module 3, we improve by adding context: variables from outside the series that help explain its movement.
  • This is the transition from filtering to causal modeling — at least in the predictive, not necessarily structural, sense.
Module | What we modeled                          | Tool
1      | Trend + seasonality via decomposition    | STL + SNAIVE/Drift
2      | Autocorrelation structure of the signal  | ETS, ARIMA
3      | Relationship with external variables     | TSLM, dynamic regression
4      | Complex seasonality, robustness          | Prophet, bootstrapping

Each module has asked: what does my current model fail to capture? The answer now is: information from other series.

2 The Linear Regression Model

The simplest case is the simple linear regression model:

y_t = \beta_0 + \beta_1 x_t + \varepsilon_t

where:

  • \beta_0 is the intercept: the predicted value of y when x = 0.
  • \beta_1 is the slope: the average change in y for a one-unit change in x.
  • \varepsilon_t is the error term: captures all other influences on y_t not explicitly modeled.

We cannot observe the \beta’s directly — we estimate them from data. The Ordinary Least Squares (OLS) method chooses the estimates \hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k that minimize the sum of squared residuals:

\min_{\hat{\boldsymbol{\beta}}} \sum_{t=1}^{T} \hat{\varepsilon}_t^2 = \sum_{t=1}^{T} \left(y_t - \hat{\beta}_0 - \hat{\beta}_1 x_{1t} - \cdots - \hat{\beta}_k x_{kt}\right)^2

For the simple case (k = 1), taking partial derivatives and setting them to zero yields closed-form solutions:

\hat{\beta}_1 = \frac{\sum x_t y_t - T\bar{x}\bar{y}}{\sum x_t^2 - T\bar{x}^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}

For the general case with k predictors, the solution extends naturally to matrix form:

\hat{\boldsymbol{\beta}} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}

where \mathbf{X} is the design matrix — a column of ones (for the intercept) plus one column per predictor — and \mathbf{y} is the vector of observed responses.

In practice, R handles all of this internally. But knowing the objective function matters: OLS penalizes large residuals more than small ones (because of the square), and under the classical assumptions the Gauss-Markov theorem guarantees the estimates are BLUE (Best Linear Unbiased Estimators) — regardless of whether we have one predictor or twenty.
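To make the objective function concrete, here is a minimal sketch in base R, using simulated data (all variable names below are illustrative, not from the text): the simple closed-form expressions and the matrix solution both reproduce `lm()`.

```r
# Closed-form OLS on simulated data, checked against lm()
set.seed(123)
T_ <- 100
x <- rnorm(T_)
y <- 0.5 + 0.3 * x + rnorm(T_, sd = 0.2)

# Simple case: the two closed-form expressions above
b1_hat <- (sum(x * y) - T_ * mean(x) * mean(y)) /
          (sum(x^2)   - T_ * mean(x)^2)
b0_hat <- mean(y) - b1_hat * mean(x)

# General matrix form: solve (X'X) beta = X'y, with a column of ones
X <- cbind(1, x)
beta_hat <- solve(t(X) %*% X, t(X) %*% y)

# All three agree to numerical precision
coef(lm(y ~ x))
c(b0_hat, b1_hat)
as.numeric(beta_hat)
```

Using `solve(A, b)` rather than explicitly inverting `X'X` is the numerically safer way to evaluate the matrix formula.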

2.1 Some motivating examples

  • Ice cream sales (y) and daily temperature (x).
  • Nike revenue (y) and marketing spend (x).
  • US consumption growth (y) and income growth (x).
  • Mexico retail trade (y) and consumer confidence index (x).

The same concept appears under different names depending on the discipline:

y (forecast variable) | x (predictor variables)
Dependent             | Independent
Explained             | Explanatory
Regressand            | Regressor
Response              | Stimulus / Covariate
Endogenous            | Exogenous

In time series forecasting, FPP3 uses forecast variable and predictor variables.

The term regression was coined by Francis Galton while studying the relationship between parents’ height and their children’s height.

His finding: tall parents tend to have tall children, but on average their children are not as tall as them. Short parents tend to have children taller than themselves. There is a tendency to regress toward the mean.

Without this regression to the mean, the distribution of heights across generations would diverge — we would eventually have people of Hobbit stature and people of giant stature, with nothing in between.

Modern regression analysis generalizes this idea: studying the dependence of one variable on one or more others to predict its average value.

2.2 Regression and Causality

“A statistical relationship, however strong and suggestive, can never establish causal connexion: our ideas of causality must come from outside statistics, ultimately from some theory or other.”

— Kendall & Stuart (1961)

\text{Regression} \neq \text{Causality} \qquad \text{Correlation} \neq \text{Causality}

This is not a technicality — it is one of the most practically important ideas in data science:

  • People who carry lighters have higher rates of lung cancer. Does carrying a lighter cause cancer?
  • Ice cream sales and drowning rates are correlated. Should we ban ice cream?
  • Homeopathic treatments show improvements. Does the remedy work, or did patients improve on their own?
  • Anecdotally: “every time I hiccup and hold my breath, it goes away.” Does holding your breath cure hiccups, or would they have stopped anyway?

2.2.1 Spurious correlations

The web is full of striking examples where two completely unrelated time series happen to move together — these are called spurious correlations.

Important

A regression model would happily fit a line through any of those charts. The model does not know the relationship is nonsense — you have to know. Causal direction requires theory, domain knowledge, or experimental design, not statistical strength alone.

2.3 What Does “Linear” Mean?

Consider three candidate models: a quadratic, y_t = \beta_0 + \beta_1 x_t + \beta_2 x_t^2 + \varepsilon_t; a lin-log model, y_t = \beta_0 + \beta_1 \log x_t + \varepsilon_t; and an exponential trend, \log y_t = \beta_0 + \beta_1 t + \varepsilon_t. Are these models linear?

Linearity can be defined in two senses:

  1. Linearity in the variables: E(y \mid x) is a linear function of x. This is the restrictive sense: only straight lines.

  2. Linearity in the parameters: E(y \mid x) is linear in the \beta’s.

The three models above are all nonlinear in x but linear in \beta. A linear regression model can produce a straight line, a parabola, an exponential curve, or a piecewise function, depending on the functional form chosen.
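A quick base-R sketch of this point, with simulated data (names and true coefficients are illustrative): fitting a parabola is still linear regression, because the design matrix simply gains a column x^2 while the estimator is unchanged OLS.

```r
# A parabola fit by ordinary linear regression:
# the model is nonlinear in x but linear in the betas.
set.seed(42)
x <- seq(-3, 3, length.out = 120)
y <- 1 + 2 * x - 0.5 * x^2 + rnorm(120, sd = 0.3)

# I(x^2) adds the squared term as a regressor; lm() still uses OLS
fit_quad <- lm(y ~ x + I(x^2))
coef(fit_quad)  # estimates close to the true values (1, 2, -0.5)
```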

3 Regression in R: us_change

We will use us_change — quarterly percentage changes in US macroeconomic variables. Our forecast variable is Consumption.

Code
us_change

3.1 Exploratory analysis

Before fitting any model, we look at the data.

Code
us_change |>
  as_tibble() |>
  select(-Quarter) |>
  GGally::ggpairs()

The scatter matrix shows correlations between all pairs of variables. Notice that some predictors are correlated with each other — we will return to this when we discuss multicollinearity.

3.2 Simple Linear Regression

We start with the simplest possible model: Consumption as a function of Income only.

y_{t, \text{Consumption}} = \beta_0 + \beta_1 x_{t, \text{Income}} + \varepsilon_t

In fable, time series linear models use TSLM():

Code
us_change_fit_simple <- us_change |>    # start with the us_change tsibble
  model(
    simple = TSLM(Consumption ~ Income) # TSLM() fits a Time Series Linear Model using OLS; formula syntax identical to lm()
  )

report(us_change_fit_simple)            # print the full regression output
Series: Consumption 
Model: TSLM 

Residuals:
     Min       1Q   Median       3Q      Max 
-2.58236 -0.27777  0.01862  0.32330  1.42229 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.54454    0.05403  10.079  < 2e-16 ***
Income       0.27183    0.04673   5.817  2.4e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5905 on 196 degrees of freedom
Multiple R-squared: 0.1472, Adjusted R-squared: 0.1429
F-statistic: 33.84 on 1 and 196 DF, p-value: 2.4022e-08

3.2.1 Reading the regression output

The output has two main parts:

Individual significance — each coefficient has a t-test: H_0: \beta_i = 0 \qquad H_1: \beta_i \neq 0 The stars (*, **, ***) indicate p < 0.05, p < 0.01, p < 0.001.

Joint significance — the F-test asks whether any predictor is useful: H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0 A significant F-test does not mean all individual coefficients are significant.

The estimated model is: \hat{y}_{t, \text{Consumption}} = 0.545 + 0.272 \cdot x_{t, \text{Income}}

For every 1 percentage point increase in income growth, consumption growth increases by 0.272 percentage points on average.

3.3 Residual Diagnostics

3.3.1 Gauss-Markov assumptions

OLS produces BLUE estimators only if the classical assumptions hold. The one most frequently violated in time series is no autocorrelation in the errors:

\text{cov}(\varepsilon_t, \varepsilon_s) = 0 \quad \forall \; t \neq s

If residuals are autocorrelated, OLS estimates are still unbiased but no longer efficient — standard errors are wrong, p-values are wrong, and forecast intervals are wrong. This is why we always check the residuals.
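As a self-contained illustration of the check (base R only, simulated series): `Box.test()` from the stats package is the base-R counterpart of the `ljung_box()` feature used below, and it flags an autocorrelated series clearly.

```r
# Ljung-Box on simulated series: white noise vs. an AR(1) process
set.seed(1)
wn  <- rnorm(200)                                  # white noise
ar1 <- as.numeric(arima.sim(list(ar = 0.7), 200))  # strongly autocorrelated

# p-value for white noise is typically large (no evidence of autocorrelation);
# for the AR(1) series it is essentially zero
Box.test(wn,  lag = 10, type = "Ljung-Box")$p.value
Box.test(ar1, lag = 10, type = "Ljung-Box")$p.value
```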

Code
us_change_fit_simple |>
  gg_tsresiduals()

Code
us_change_dof_simple <- us_change_fit_simple |>
  tidy() |>
  nrow()   # number of estimated parameters, used as the degrees-of-freedom correction

augment(us_change_fit_simple) |>
  features(.resid, ljung_box,
           lag = 10,
           dof = us_change_dof_simple)  # a significant p-value indicates autocorrelated residuals: the model is missing structure

The residuals show clear autocorrelation. The simple model is not capturing all the dynamics of consumption. Let’s improve it.

4 Multiple Linear Regression

In practice, one predictor is rarely sufficient. The multiple linear regression model is:

y_t = \beta_0 + \beta_1 x_{1t} + \beta_2 x_{2t} + \cdots + \beta_k x_{kt} + \varepsilon_t

For us_change, a natural extension is:

y_{t, \text{Cons.}} = \beta_0 + \beta_1 x_{t, \text{Inc.}} + \beta_2 x_{t, \text{Prod.}} + \beta_3 x_{t, \text{Sav.}} + \beta_4 x_{t, \text{Unemp.}} + \varepsilon_t

Code
us_change_fit_mult <- us_change |>
  model(
    # same formula syntax as simple regression: just add more predictors with +
    multiple = TSLM(Consumption ~ Income + Production + Savings + Unemployment)
  )

report(us_change_fit_mult)
Series: Consumption 
Model: TSLM 

Residuals:
     Min       1Q   Median       3Q      Max 
-0.90555 -0.15821 -0.03608  0.13618  1.15471 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)   0.253105   0.034470   7.343 5.71e-12 ***
Income        0.740583   0.040115  18.461  < 2e-16 ***
Production    0.047173   0.023142   2.038   0.0429 *  
Savings      -0.052890   0.002924 -18.088  < 2e-16 ***
Unemployment -0.174685   0.095511  -1.829   0.0689 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3102 on 193 degrees of freedom
Multiple R-squared: 0.7683, Adjusted R-squared: 0.7635
F-statistic:   160 on 4 and 193 DF, p-value: < 2.22e-16

4.0.1 Comparing simple vs. multiple

Grey line = actual consumption; colored line = fitted values. The multiple model tracks the data much more closely.

Points closer to the 45° line indicate better fit. The multiple model shows a much tighter cloud.

Simple model residuals: significant ACF spikes indicate remaining autocorrelation.

Multiple model residuals: ACF spikes are much reduced and the distribution is closer to normal.

Residuals from the multiple model vs. each predictor. No strong patterns suggest the linear specification is adequate.
Code
bind_rows(
  augment(us_change_fit_simple) |>
    features(.resid, ljung_box, lag = 10, dof = 2) |>
    mutate(.model = "Simple"),
  augment(us_change_fit_mult) |>
    features(.resid, ljung_box, lag = 10, dof = 5) |>
    mutate(.model = "Multiple")
) |>
  select(.model, lb_stat, lb_pvalue)

The multiple model substantially improves fit (\bar{R}^2 goes from ~0.15 to ~0.75) and the residuals look much closer to white noise.

The residuals vs. predictors plot is a useful diagnostic specific to multiple regression. If any panel shows a systematic pattern (curve, fan shape), it suggests a missing nonlinear term or interaction involving that predictor.

5 Predictor Selection

With k potential predictors, there are 2^k possible models. We need a principled way to choose among them.

Do not use in-sample R^2 for selection. Adding any predictor, even pure noise, never decreases R^2 and almost always increases it. We need measures that penalize model complexity.

The standard criteria available from glance() in fable:

  • Adjusted \bar{R}^2: penalizes for additional parameters. Maximize it.
  • AIC (Akaike Information Criterion): -2\log L + 2k. Minimize it. Optimizes predictive performance.
  • AICc: AIC corrected for small samples. Prefer this over AIC for time series.
  • BIC (Bayesian Information Criterion): -2\log L + k \log T. Minimize it. BIC penalizes complexity more heavily and tends to select smaller models than AICc.
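The definitions above can be checked by hand. A minimal base-R sketch on a toy `lm()` fit (simulated data; note that here k counts all estimated parameters, including the residual variance, which is the convention base R's `AIC()`/`BIC()` use):

```r
# AIC and BIC from their definitions, checked against AIC()/BIC()
set.seed(7)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)
fit <- lm(y ~ x)

ll <- as.numeric(logLik(fit))
k  <- attr(logLik(fit), "df")   # beta0, beta1, sigma^2 -> 3 parameters
T_ <- length(y)

aic_manual <- -2 * ll + 2 * k        # AIC = -2 log L + 2k
bic_manual <- -2 * ll + k * log(T_)  # BIC = -2 log L + k log T

all.equal(aic_manual, AIC(fit))
all.equal(bic_manual, BIC(fit))
```

fable's glance() reports the analogous quantities for TSLM models; the exact parameter count may differ slightly between implementations, but the penalty structure is the same.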
Code
glance(us_change_fit_mult) |>
  select(adj_r_squared, AIC, AICc, BIC)

5.0.1 Search strategies

All subsets: fit all possible models and compare.

Code
us_change_fit_select <- us_change |>
  model(
    m1 = TSLM(Consumption ~ Income),
    m2 = TSLM(Consumption ~ Income + Production),
    m3 = TSLM(Consumption ~ Income + Savings),
    m4 = TSLM(Consumption ~ Income + Unemployment),
    m5 = TSLM(Consumption ~ Income + Production + Savings),
    m6 = TSLM(Consumption ~ Income + Production + Unemployment),
    m7 = TSLM(Consumption ~ Income + Savings + Unemployment),
    m8 = TSLM(Consumption ~ Income + Production + Savings + Unemployment)
  )

us_change_fit_select |>
  glance() |>
  select(.model, adj_r_squared, AIC, AICc, BIC) |>
  arrange(AICc)

Backward selection: start with the full model and remove predictors one at a time.

Code
us_change |>
  model(
    full  = TSLM(Consumption ~ Income + Production + Savings + Unemployment),
    drop1 = TSLM(Consumption ~ Income + Production + Savings),
    drop2 = TSLM(Consumption ~ Income + Production + Unemployment),
    drop3 = TSLM(Consumption ~ Income + Savings + Unemployment),
    drop4 = TSLM(Consumption ~ Production + Savings + Unemployment)
  ) |>
  glance() |>
  select(.model, adj_r_squared, AICc, BIC) |>
  arrange(AICc)

Forward selection: start with the intercept-only model and add predictors one at a time.

Code
# Step 1: which single predictor is best?
us_change |>
  model(
    f1 = TSLM(Consumption ~ Income),
    f2 = TSLM(Consumption ~ Production),
    f3 = TSLM(Consumption ~ Savings),
    f4 = TSLM(Consumption ~ Unemployment)
  ) |>
  glance() |>
  select(.model, AICc) |>
  arrange(AICc)
Code
# Step 2: add a second predictor to the best single-predictor model
us_change |>
  model(
    f1  = TSLM(Consumption ~ Savings),
    f12 = TSLM(Consumption ~ Savings + Income),
    f13 = TSLM(Consumption ~ Savings + Production),
    f14 = TSLM(Consumption ~ Savings + Unemployment)
  ) |>
  glance() |>
  select(.model, AICc) |>
  arrange(AICc)

AICc and BIC both penalize model complexity, but differently:

  • AICc minimizes expected out-of-sample prediction error. It tends to select slightly larger models.
  • BIC is consistent: as T \to \infty, it will select the true model (if it’s in the candidate set). It penalizes more heavily and selects smaller models.

In forecasting contexts, AICc is generally preferred because we care more about predictive accuracy than model parsimony per se. When they disagree, consider whether interpretability or predictive accuracy is the primary goal.

5.1 Multicollinearity

Predictor selection and multicollinearity are closely linked. Even after running all-subsets or stepwise search, the selected model may still include predictors that are highly correlated with each other — and that creates problems that no information criterion will automatically flag.

When two or more predictors are highly correlated, we have multicollinearity. It does not bias the OLS estimates, but it inflates the variance of the coefficients, making them unstable and hard to interpret.

  • In the extreme case of perfect collinearity (x_1 = c \cdot x_2), the model is unidentified: infinitely many solutions minimize SSR.
  • In practice, collinearity is a matter of degree, not a binary condition.
  • Warning signs: coefficients with unexpected signs, large standard errors, a significant F-test alongside non-significant individual t-tests.

5.1.1 Detecting multicollinearity: VIF

The Variance Inflation Factor (VIF) measures how much the variance of \hat{\beta}_j is inflated by its correlation with the other predictors:

\text{VIF}_j = \frac{1}{1 - R_j^2}

where R_j^2 is the R^2 obtained by regressing x_j on all the other predictors — a measure of how well one predictor can be linearly predicted by the rest.

  • \text{VIF} = 1: no collinearity with other predictors.
  • \text{VIF} \in (1, 5): moderate — generally acceptable.
  • \text{VIF} > 5: concerning, examine carefully.
  • \text{VIF} > 10: severe — take action.
Code
us_change_fit_mult |>
  extract_fit_engine() |>  # retrieve the underlying lm object from the fable mable, so base-R and car diagnostics apply
  vif()                    # vif() from the car package computes the VIF for each predictor
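The VIF formula is easy to verify directly. A base-R sketch on simulated data (all names illustrative; x1 and x2 are constructed to be strongly collinear): regress each predictor on the others and apply VIF_j = 1 / (1 - R_j^2).

```r
# VIF computed from its definition, without the car package
set.seed(99)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)  # highly collinear with x1
x3 <- rnorm(n)                 # unrelated to the others

vif_manual <- function(X) {
  sapply(seq_len(ncol(X)), function(j) {
    r2 <- summary(lm(X[, j] ~ X[, -j]))$r.squared  # R_j^2
    1 / (1 - r2)
  })
}

X <- cbind(x1 = x1, x2 = x2, x3 = x3)
round(vif_manual(X), 2)  # x1 and x2 well above 5; x3 near 1
```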

5.1.2 What to do about multicollinearity

If VIF values are large, you have several options:

  1. Drop one of the correlated predictors — if two predictors carry nearly the same information, keep the one with stronger theoretical justification or cleaner interpretability. This is often the right call during selection anyway.
  2. Combine them — create a composite index, take a ratio, or use the first principal component of the correlated group.
  3. Use a penalized regression (Ridge, Lasso) — these methods tolerate collinearity better than OLS by accepting a small bias in exchange for substantially lower variance. Outside this course’s scope, but worth knowing.
  4. Do nothing, intentionally — if your goal is prediction accuracy rather than coefficient interpretation, multicollinearity may be acceptable. Forecasts can still be good even when individual \hat{\beta}’s are unstable.

If you have a large number of potential predictors with substantial collinearity among them, consider dimensionality reduction (e.g., PCA) before fitting. That is outside this course’s scope, but it is a common preprocessing step in production forecasting systems.

6 Forecasting with Regression

Before we can forecast y_t with regression, we need to ask: what do we do with the predictors?

With univariate models, forecast(h = n) was sufficient — the model only needed its own history. Now the model needs future values of x_t to produce a forecast of y_t.

This is not a trivial problem. It creates three distinct forecasting scenarios:

6.1 Three types of regression forecasts

Ex-ante forecasts use only information available at the time the forecast is made. The predictors must themselves be forecasted.

This is the “true” forecast — it is what you would actually produce in a real deployment.

Code
# Step 1: forecast each predictor independently
us_change_fc_income <- us_change |>
  model(ARIMA(Income)) |>
  forecast(h = 8)

us_change_fc_prod <- us_change |>
  model(ARIMA(Production)) |>
  forecast(h = 8)

us_change_fc_sav <- us_change |>
  model(ARIMA(Savings)) |>
  forecast(h = 8)

us_change_fc_unemp <- us_change |>
  model(ARIMA(Unemployment)) |>
  forecast(h = 8)

# Step 2: construct future data with forecasted predictor values
us_change_future <- new_data(us_change, 8) |>
  mutate(
    Income       = us_change_fc_income |> pull(.mean),
    Production   = us_change_fc_prod   |> pull(.mean),
    Savings      = us_change_fc_sav    |> pull(.mean),
    Unemployment = us_change_fc_unemp  |> pull(.mean)
  )

# Step 3: forecast consumption using the projected predictors
us_change_fit_mult |>
  forecast(new_data = us_change_future) |>
  autoplot(us_change |> filter_index("2010 Q1" ~ .)) +
  labs(title = "Ex-ante forecast: US Consumption",
       y = "% change", x = NULL)

Ex-post forecasts use actual realized values of the predictors. The forecast variable y_t is still unknown, but we feed in the true future x_t values.

These are not “real” forecasts — they are used to evaluate the regression model in isolation, removing predictor forecast error from the evaluation.

Code
# Use filter_index to create a train/test split
us_change_train <- us_change |>
  filter_index(. ~ "2017 Q4")

us_change_test <- us_change |>
  filter_index("2018 Q1" ~ .)

us_change_fit_expost <- us_change_train |>
  model(
    multiple = TSLM(Consumption ~ Income + Production + Savings + Unemployment)
  )

# Ex-post: we supply actual future predictor values
us_change_fit_expost |>
  forecast(new_data = us_change_test) |>
  autoplot(us_change |> filter_index("2014 Q1" ~ .)) +
  labs(title = "Ex-post forecast: actual predictor values used",
       y = "% change", x = NULL)

Scenario-based forecasts construct hypothetical future values for the predictors and ask: what would consumption look like under each scenario?

These are especially useful for stress-testing and strategic planning.

Code
us_change_fit_scen <- us_change |>
  model(TSLM(Consumption ~ Income + Savings + Unemployment))

us_change_scenarios <- scenarios(
  optimistic = new_data(us_change, 4) |>
    mutate(Income = c(0.5, 0.8, 0.6, 1.0),
           Savings = c(0.1, -0.2, 0.1, -0.1),
           Unemployment = -0.1),
  pessimistic = new_data(us_change, 4) |>
    mutate(Income = -0.5,
           Savings = -0.4,
           Unemployment = 0.2),
  names_to = "Scenario"
)

us_change_fit_scen |>
  forecast(new_data = us_change_scenarios) |>
  autoplot(us_change |> filter_index("2014 Q1" ~ .)) +
  labs(title = "Scenario-based forecast: US Consumption",
       y = "% change", x = NULL)

7 Useful Predictors

TSLM() provides built-in terms that are particularly useful when working with time series:

  • trend() — a deterministic linear trend (t = 1, 2, \ldots, T).
  • season() — seasonal dummy variables (one per season, with one omitted as baseline to avoid perfect collinearity).
  • fourier(K) — Fourier terms for flexible seasonality (covered in Module 4).

We will illustrate with quarterly beer production in Australia.

Code
beer <- aus_production |>
  filter(year(Quarter) >= 1992) |>
  select(Quarter, Beer)

beer |>
  autoplot(Beer) +
  labs(title = "Australian quarterly beer production",
       y = "Megalitres", x = NULL)

7.1 Trend and Seasonal Dummies

Code
beer_fit <- beer |>
  model(
    TSLM(Beer ~ trend() + season())  # trend() adds a linear time index; season() adds quarterly dummy variables
  )

report(beer_fit)
Series: Beer 
Model: TSLM 

Residuals:
     Min       1Q   Median       3Q      Max 
-42.9029  -7.5995  -0.4594   7.9908  21.7895 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)   441.80044    3.73353 118.333  < 2e-16 ***
trend()        -0.34027    0.06657  -5.111 2.73e-06 ***
season()year2 -34.65973    3.96832  -8.734 9.10e-13 ***
season()year3 -17.82164    4.02249  -4.430 3.45e-05 ***
season()year4  72.79641    4.02305  18.095  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 12.23 on 69 degrees of freedom
Multiple R-squared: 0.9243, Adjusted R-squared: 0.9199
F-statistic: 210.7 on 4 and 69 DF, p-value: < 2.22e-16

7.1.1 What the dummies actually look like

Seasonal dummies are created automatically by season(), but it helps to see them explicitly:

Code
beer |>
  mutate(
    t   = row_number(),
    Q2  = if_else(quarter(Quarter) == 2, 1L, 0L),
    Q3  = if_else(quarter(Quarter) == 3, 1L, 0L),
    Q4  = if_else(quarter(Quarter) == 4, 1L, 0L)
  ) |>
  head(8)

The dummy variable trap: if you create dummies for all m seasons and include an intercept, you have perfect multicollinearity. Always use m - 1 dummies. TSLM() handles this automatically.
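The trap is easy to reproduce in base R with simulated quarterly data (names and effect sizes below are illustrative): with an intercept plus dummies for all four quarters, the design matrix is rank-deficient and `lm()` returns NA for the redundant column.

```r
# The dummy variable trap on simulated quarterly data
set.seed(3)
q <- factor(rep(1:4, times = 10))
y <- 400 + c(0, -35, -18, 73)[q] + rnorm(40, sd = 5)

Q1 <- as.integer(q == 1); Q2 <- as.integer(q == 2)
Q3 <- as.integer(q == 3); Q4 <- as.integer(q == 4)

fit_trap <- lm(y ~ Q1 + Q2 + Q3 + Q4)  # all m dummies + intercept: perfect collinearity
coef(fit_trap)                         # one coefficient comes back NA

fit_ok <- lm(y ~ Q2 + Q3 + Q4)         # m - 1 dummies: model is identified
coef(fit_ok)
```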

Code
beer_fit |>
  gg_tsresiduals()

There is a notable outlier in Q2 1994. We can handle it with an intervention variable.

7.2 Intervention Variables (Dummies)

Dummy variables are not limited to capturing seasonality. They can capture specific events:

  • Spike variable: a 1 for a single anomalous period, 0 everywhere else. Captures a one-off shock.
  • Level shift variable: 0 before an event, 1 from the event onward. Captures a permanent change in level.
  • Ramp variable: 0 before an event, then increasing linearly. Captures a gradual structural shift.
Code
beer_interv <- beer |>
  mutate(
    spike_Q2_1994 = if_else(Quarter == yearquarter("1994 Q2"), 1L, 0L),  # spike: captures the single anomalous quarter
    level_2000    = if_else(year(Quarter) >= 2000, 1L, 0L)               # level shift: allows a different intercept from 2000 onward
  )

beer_fit_interv <- beer_interv |>
  model(
    TSLM(Beer ~ trend() + season() + spike_Q2_1994 + level_2000)
  )

report(beer_fit_interv)
Series: Beer 
Model: TSLM 

Residuals:
     Min       1Q   Median       3Q      Max 
-42.7196  -6.9594  -0.7748   8.0050  21.5292 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)   442.6530     3.8050 116.336  < 2e-16 ***
trend()        -0.3730     0.1285  -2.902  0.00501 ** 
season()year2 -33.3328     3.9745  -8.387 4.84e-12 ***
season()year3 -17.8072     3.9716  -4.484 2.95e-05 ***
season()year4  72.8436     3.9773  18.315  < 2e-16 ***
spike_Q2_1994 -24.5904    12.5547  -1.959  0.05432 .  
level_2000      0.6182     5.5264   0.112  0.91127    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 12.07 on 67 degrees of freedom
Multiple R-squared: 0.9284, Adjusted R-squared: 0.922
F-statistic: 144.9 on 6 and 67 DF, p-value: < 2.22e-16

Intervention variables in TSLM() are the same mechanism used in Module 4 for outlier handling — they give the model a way to account for known anomalies without distorting the other coefficient estimates.

8 Nonlinear Regression

We established earlier that linear in the parameters does not mean a straight line. Three practically useful forms:

8.1 Log Transformations

  • Log-log (constant elasticity): \log y_t = \beta_0 + \beta_1 \log x_t + \varepsilon_t. Coefficients are elasticities: a 1% change in x corresponds to a \beta_1% change in y.
  • Log-lin (exponential growth): \log y_t = \beta_0 + \beta_1 t + \varepsilon_t. Trend coefficient is the constant growth rate.
  • Lin-log: y_t = \beta_0 + \beta_1 \log x_t + \varepsilon_t. A 1% change in x changes y by \beta_1 / 100 units.
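The constant-elasticity interpretation of the log-log form can be checked on simulated data. A base-R sketch (true elasticity 0.8 and all names are illustrative): generating y = A x^{0.8} with multiplicative error and fitting the log-log model recovers the elasticity as the slope.

```r
# Log-log regression recovers a constant elasticity
set.seed(11)
n <- 500
x <- exp(rnorm(n))                        # positive regressor
y <- 2 * x^0.8 * exp(rnorm(n, sd = 0.1))  # multiplicative error, elasticity 0.8

fit_loglog <- lm(log(y) ~ log(x))
coef(fit_loglog)  # slope close to the true elasticity 0.8
```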