RStudio, R, and Time Series

pbenavides

Packages

The tidyverse

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
  • tidyverse is a meta-package that loads the core packages of the tidyverse.

The tidyverts

library(fpp3)
Registered S3 method overwritten by 'tsibble':
  method               from 
  as_tibble.grouped_df dplyr
── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──
✔ tsibble     1.1.6     ✔ feasts      0.4.1
✔ tsibbledata 0.4.1     ✔ fable       0.4.1
── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
✖ lubridate::date()    masks base::date()
✖ dplyr::filter()      masks stats::filter()
✖ tsibble::intersect() masks base::intersect()
✖ tsibble::interval()  masks lubridate::interval()
✖ dplyr::lag()         masks stats::lag()
✖ tsibble::setdiff()   masks base::setdiff()
✖ tsibble::union()     masks base::union()
  • fpp3 is also a meta-package; it loads the tidyverts ecosystem for time series analysis and forecasting.

Time Series

tsibble objects

Let’s take a look at tourism in Australia:

tourism
# A tsibble: 24,320 x 5 [1Q]
# Key:       Region, State, Purpose [304]
   Quarter Region   State           Purpose  Trips
     <qtr> <chr>    <chr>           <chr>    <dbl>
 1 1998 Q1 Adelaide South Australia Business  135.
 2 1998 Q2 Adelaide South Australia Business  110.
 3 1998 Q3 Adelaide South Australia Business  166.
 4 1998 Q4 Adelaide South Australia Business  127.
 5 1999 Q1 Adelaide South Australia Business  137.
 6 1999 Q2 Adelaide South Australia Business  200.
 7 1999 Q3 Adelaide South Australia Business  169.
 8 1999 Q4 Adelaide South Australia Business  134.
 9 2000 Q1 Adelaide South Australia Business  154.
10 2000 Q2 Adelaide South Australia Business  169.
# ℹ 24,310 more rows
key_vars(tourism)
[1] "Region"  "State"   "Purpose"
key_data(tourism)
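
To make the roles of the key and the index more concrete, here is a minimal sketch that builds a small tsibble by hand. The data frame `toy`, its values, and the region names are hypothetical and only serve as an illustration; they are not part of the tourism data.

```r
library(tsibble)
library(tibble)

# Hypothetical toy data: two regions observed over three quarters
toy <- tibble(
  Quarter = yearquarter(rep(c("2020 Q1", "2020 Q2", "2020 Q3"), times = 2)),
  Region  = rep(c("North", "South"), each = 3),
  Trips   = c(10, 12, 9, 20, 18, 25)
)

# The index is the time variable; the key identifies each individual series
toy_ts <- as_tsibble(toy, index = Quarter, key = Region)

key_vars(toy_ts)   # name(s) of the key variable(s)
index_var(toy_ts)  # name of the index variable
interval(toy_ts)   # interval detected from the index
```

Each combination of key values must identify exactly one series, so every row is uniquely determined by the key plus the index.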

Australian States

distinct(tourism, State)

Which regions are located in Tasmania?

distinct(filter(tourism, State == "Tasmania"), Region)

Data Transformation: Average trips

To get the average trips by purpose, we need to do the following:

  1. Filter the original tsibble to get only the data from East Coast, Tasmania.
  2. Convert the data to a tibble.
  3. Group by purpose.
  4. Summarise by getting the mean of the trips.

With traditional nested function calls, this would look something like:

summarise(group_by(as_tibble(filter(tourism, State == "Tasmania", 
                                    Region == "East Coast")), Purpose),
          mean_trips = mean(Trips))

Using the native pipe operator, |>, we can make the same code easier to read:

tourism |>
  filter(State == "Tasmania",
         Region == "East Coast") |>
  as_tibble() |>
  group_by(Purpose) |>
  summarise(mean_trips = mean(Trips))

  1. Take the tsibble tourism, then
  2. filter by State and Region, then
  3. convert to a tibble, then
  4. group the tibble by purpose, then
  5. summarise by taking the mean trips.
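
As a small illustration of what the native pipe does, the sketch below uses base R only. The vector `x` is a made-up example. The pipe passes the left-hand side as the first argument of the call on the right, so both versions compute the same thing. The native pipe requires R 4.1 or later.

```r
# Hypothetical input vector, used only for illustration
x <- c(1, 4, 9)

# Nested calls are read from the inside out
nested <- sum(sqrt(x))

# Piped calls are read from left to right: take x, then sqrt(), then sum()
piped <- x |> sqrt() |> sum()

identical(nested, piped)
```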

TS Visualization

Plotting tourism across time

tourism |> 
  filter(State == "Tasmania",
         Region == "East Coast") |> 
  autoplot(Trips) +
  facet_wrap(vars(Purpose), scales = "free_y") +
  theme(legend.position = "none")

  1. autoplot() detects the type of data automatically and proposes a suitable plot.
  2. facet_wrap() divides a plot into subplots (facets).
  3. You can customize countless features using theme(). Here, we remove the legend, as it’s redundant.

Time plots

aus_production |> 
  autoplot(Gas)

aus_production |> 
  autoplot(Gas) +
  geom_point()

Seasonal Plots

aus_production |> 
  gg_season(Gas)

Removing the trend from the data:
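
The source does not show how the trend is removed, so the following is only one possible sketch. It assumes an additive STL decomposition with default settings and subtracts the estimated trend from Gas before drawing the seasonal plot. The names `stl`, `gas_detrended`, and `detrended` are chosen here for illustration.

```r
library(fpp3)

# Assumption: estimate the trend with STL (additive, default windows)
gas_detrended <- aus_production |>
  model(stl = STL(Gas)) |>
  components() |>
  as_tsibble() |>
  mutate(detrended = Gas - trend)  # what remains is season + remainder

# Seasonal plot of the detrended series
gas_detrended |>
  gg_season(detrended)
```

Other approaches, such as differencing or decomposing a transformed series, would give a different detrended series; this one is chosen only because it stays within the packages already loaded.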

Seasonal Subseries Plots

aus_production |> 
  gg_subseries(Gas)

gg_tsdisplay()

aus_production |> 
  gg_tsdisplay(Gas, plot_type = "season")

Exporting data to .csv

tourism |> 
  filter(State == "Tasmania",
         Region == "East Coast") |> 
  mutate(Quarter = as.Date(Quarter)) |> 
  write_csv("./datos/tasmania.csv")
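
A CSV file does not store the tsibble structure, so the key and index have to be declared again when the file is read back. The sketch below shows one way to do the round trip. It writes to a temporary file (`csv_path` is a hypothetical location) rather than to ./datos/, so that it runs without that folder existing.

```r
library(fpp3)
library(readr)

# Hypothetical output location, used here so the example is self-contained
csv_path <- tempfile(fileext = ".csv")

tourism |>
  filter(State == "Tasmania",
         Region == "East Coast") |>
  mutate(Quarter = as.Date(Quarter)) |>  # store the quarter as a plain date
  write_csv(csv_path)

# Read the file back and rebuild the tsibble
tasmania <- read_csv(csv_path, show_col_types = FALSE) |>
  mutate(Quarter = yearquarter(Quarter)) |>  # dates back to quarters
  as_tsibble(key = c(Region, State, Purpose), index = Quarter)
```

Converting Quarter with as.Date() before writing stores the first day of each quarter, which readr can parse as a date when the file is read back.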

Footnotes

  1. shown beside the tsibble dimensions as [1Q]

  2. these are specified in the key argument. This tsibble contains