• 1 Introduction
  • 2 Goals
  • 3 Packages
  • 4 The tsibble data format
    • 4.1 Case Study -1: Walmart Sales Dataset from timetk
    • 4.2 Conclusion
    • 4.3 References

1 Introduction

2 Goals

At the end of this Lab session, we will be able to:

  • Understand time series data and data structures
  • Create Time Series Visualizations
  • Perform modelling tasks on Time Series Data
  • Calculate and depict Averages, Seasons, Trends in Time Series
  • Perform Forecasting with Time Series

3 Packages

4 The tsibble data format

TBW. To be written up.

4.1 Case Study -1: Walmart Sales Dataset from timetk

Let us inspect what datasets are available in the package timetk. Type data(package = “timetk”) in your Console to see what datasets are available.

Let us choose the Walmart Sales dataset. See here for more details: Walmart Recruiting - Store Sales Forecasting |Kaggle

ABCDEFGHIJ0123456789
id
<fct>
Store
<dbl>
Dept
<dbl>
Date
<date>
Weekly_Sales
<dbl>
IsHoliday
<lgl>
Type
<chr>
Size
<dbl>
Temperature
<dbl>
1_1112010-02-0524924.50FALSEA15131542.31
1_1112010-02-1246039.49TRUEA15131538.51
1_1112010-02-1941595.55FALSEA15131539.93
1_1112010-02-2619403.54FALSEA15131546.63
1_1112010-03-0521827.90FALSEA15131546.50
1_1112010-03-1221043.39FALSEA15131557.79
1_1112010-03-1922136.64FALSEA15131554.58
1_1112010-03-2626229.21FALSEA15131551.45
1_1112010-04-0257258.43FALSEA15131562.27
1_1112010-04-0942960.91FALSEA15131565.86
## Rows: 1,001
## Columns: 17
## $ id           <fct> 1_1, 1_1, 1_1, 1_1, 1_1, 1_1, 1_1, 1_1, 1_1, 1_1, 1_1, 1_…
## $ Store        <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ Dept         <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ Date         <date> 2010-02-05, 2010-02-12, 2010-02-19, 2010-02-26, 2010-03-…
## $ Weekly_Sales <dbl> 24924.50, 46039.49, 41595.55, 19403.54, 21827.90, 21043.3…
## $ IsHoliday    <lgl> FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA…
## $ Type         <chr> "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A…
## $ Size         <dbl> 151315, 151315, 151315, 151315, 151315, 151315, 151315, 1…
## $ Temperature  <dbl> 42.31, 38.51, 39.93, 46.63, 46.50, 57.79, 54.58, 51.45, 6…
## $ Fuel_Price   <dbl> 2.572, 2.548, 2.514, 2.561, 2.625, 2.667, 2.720, 2.732, 2…
## $ MarkDown1    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ MarkDown2    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ MarkDown3    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ MarkDown4    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ MarkDown5    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ CPI          <dbl> 211.0964, 211.2422, 211.2891, 211.3196, 211.3501, 211.380…
## $ Unemployment <dbl> 8.106, 8.106, 8.106, 8.106, 8.106, 8.106, 8.106, 8.106, 7…

The data is described as:

A tibble: 9,743 x 3

  • id Factor. Unique series identifier (4 total)
  • Store Numeric. Store ID.
  • Dept Numeric. Department ID.
  • Date Date. Weekly timestamp.
  • Weekly_Sales Numeric. Sales for the given department in the given store.
  • IsHoliday Logical. Whether the week is a “special” holiday for the store.
  • Type Character. Type identifier of the store.
  • Size Numeric. Store square-footage
  • Temperature Numeric. Average temperature in the region.
  • Fuel_Price Numeric. Cost of fuel in the region.
  • MarkDown1, MarkDown2, MarkDown3, MarkDown4, MarkDown5 Numeric. Anonymized data related to promotional markdowns that Walmart is running. MarkDown data is only available after Nov 2011, and is not available for all stores all the time. Any missing value is marked with an NA.
  • CPI Numeric. The consumer price index.
  • Unemployment Numeric. The unemployment rate in the region.

NOTE: 1. This is still a data.frame, with a time-oriented variable of course, and not yet a time-series object. Note that this data frame has the YMD columns repeated for each Dept. 2. The Date column has repeated entries for each Dept! To deal with this repetition, we will always need to split the Weekly_Sales by the Dept column before we plot or analyze.

Since our sales are weekly, we will convert Date to yearweek format:

ABCDEFGHIJ0123456789
id
<fct>
Store
<dbl>
Dept
<dbl>
Date
<date>
Weekly_Sales
<dbl>
IsHoliday
<lgl>
Type
<chr>
Size
<dbl>
Temperature
<dbl>
1_1112010-02-0524924.50FALSEA15131542.31
1_1112010-02-1246039.49TRUEA15131538.51
1_1112010-02-1941595.55FALSEA15131539.93
1_1112010-02-2619403.54FALSEA15131546.63
1_1112010-03-0521827.90FALSEA15131546.50
1_1112010-03-1221043.39FALSEA15131557.79
1_1112010-03-1922136.64FALSEA15131554.58
1_1112010-03-2626229.21FALSEA15131551.45
1_1112010-04-0257258.43FALSEA15131562.27
1_1112010-04-0942960.91FALSEA15131565.86

4.1.1 Basic Time Series Plots

The easiest way is to use autoplot from the feasts package. You may need to specify the actual measured variable, if there is more than one numerical column:

The R package timetk gives us interactive plots that may be more evocative than the static plot above. The basic plot function with timetk is plot_time_series. There are arguments for the date variable, the value you want to plot, colours, groupings etc.

Let us explore this dataset using timetk, using our trusted method of asking Questions:

Q.1 How are the weekly sales different for each Department?

There are
ABCDEFGHIJ0123456789
n
<int>
7

number of Departments. So we should be fine plotting them and also facetting with them, as we will see in a bit:

Jul 2010Jan 2011Jul 2011Jan 2012Jul 2012020k40k60k80k100k120k140k
Legend13813389395Walmart Sales Data by Department

Q.2. What do the sales per Dept look like during the month of December (Christmas time) in 2012? Show the individual Depts as facets.

We can of course zoom into the interactive plot above, but if we were to plot it anyway:

30k40k32k34k36k38k40k70k80k90kDec 42011Dec 11Dec 18Dec 25100k110k120k10k12k34k36kDec 42011Dec 11Dec 18Dec 2560k70k80k90k
Legend13813389395Time Series Plot13813389395

Clearly the “unfortunate” Dept#13 has seen something of a Christmas drop in sales, as has Dept#38 ! The rest, all is well, it seems…

Too much noise? How about some averaging?

Q.3 How do we smooth out some of the variations in the time series to be able to understand it better?

Sometimes there is too much noise in the time series observations and we want to take what is called a rolling average. For this we will use the function timetk::slidify to create an averaging function of our choice, and then apply it to the time series using regular dplyr::mutate. Let’s take the average of Sales for each month in each Department. Our function will be named “rolling_avg_month”:

OK, slidify creates a function! Let’s apply it to the Walmart Sales time series…

Jul 2010Jan 2011Jul 2011Jan 2012Jul 201220k40k60k80k100k120k140k
Legend13813389395Time Series Plot

Graphs are smoother now. Need to check whether the averaging was done on a per-Dept basis…should we have had a group_by(Dept) before the averaging, and ungroup() before plotting? Try it !!

4.1.3 Case Study-2: Dataset from nycflights13

Let us try the flights dataset from the package nycflights13. Try data(package = "nycflights13") in your Console.

We have the following datasets in nycflights13:

  • airlines Airline names.
  • airports Airport metadata
  • flights Flights data
  • planes Plane metadata.
  • weather Hourly weather data

Let us analyze the flights data:

## Rows: 336,776
## Columns: 19
## $ year           <int> 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2…
## $ month          <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ day            <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ dep_time       <int> 517, 533, 542, 544, 554, 554, 555, 557, 557, 558, 558, …
## $ sched_dep_time <int> 515, 529, 540, 545, 600, 558, 600, 600, 600, 600, 600, …
## $ dep_delay      <dbl> 2, 4, 2, -1, -6, -4, -5, -3, -3, -2, -2, -2, -2, -2, -1…
## $ arr_time       <int> 830, 850, 923, 1004, 812, 740, 913, 709, 838, 753, 849,…
## $ sched_arr_time <int> 819, 830, 850, 1022, 837, 728, 854, 723, 846, 745, 851,…
## $ arr_delay      <dbl> 11, 20, 33, -18, -25, 12, 19, -14, -8, 8, -2, -3, 7, -1…
## $ carrier        <chr> "UA", "UA", "AA", "B6", "DL", "UA", "B6", "EV", "B6", "…
## $ flight         <int> 1545, 1714, 1141, 725, 461, 1696, 507, 5708, 79, 301, 4…
## $ tailnum        <chr> "N14228", "N24211", "N619AA", "N804JB", "N668DN", "N394…
## $ origin         <chr> "EWR", "LGA", "JFK", "JFK", "LGA", "EWR", "EWR", "LGA",…
## $ dest           <chr> "IAH", "IAH", "MIA", "BQN", "ATL", "ORD", "FLL", "IAD",…
## $ air_time       <dbl> 227, 227, 160, 183, 116, 150, 158, 53, 140, 138, 149, 1…
## $ distance       <dbl> 1400, 1416, 1089, 1576, 762, 719, 1065, 229, 944, 733, …
## $ hour           <dbl> 5, 5, 5, 5, 6, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 5, 6, 6, 6…
## $ minute         <dbl> 15, 29, 40, 45, 0, 58, 0, 0, 0, 0, 0, 0, 0, 0, 0, 59, 0…
## $ time_hour      <dttm> 2013-01-01 05:00:00, 2013-01-01 05:00:00, 2013-01-01 0…

We have time-related columns; Apart from year, month, day we have time_hour; and time-event numerical data such as arr_delay (arrival delay) and dep_delay (departure delay). We also have categorical data such as carrier, origin, dest, flight and tailnum of the aircraft. It is also a large dataset containing 330K entries. Enough to play with!!

Let us replace the NAs in arr_delay and dep_delay with zeroes for now, and convert it into a time-series object with tsibble:

ABCDEFGHIJ0123456789
time_hour
<dttm>
arr_delay
<dbl>
dep_delay
<dbl>
carrier
<chr>
origin
<chr>
dest
<chr>
flight
<int>
tailnum
<chr>
2013-05-04 07:00:00-11-69EEWRATL3633N170PQ
2013-05-01 11:00:00-24-59EEWRATL4194N170PQ
2013-05-03 09:00:0015-69EEWRATL4362N170PQ
2013-05-02 09:00:00-5-69EEWRATL4362N232PQ
2013-12-12 17:00:0013109EEWRCVG3323N8506C
2013-12-05 17:00:00009EEWRCVG3323NA
2013-12-18 17:00:00-1319EEWRCVG3335N8412F
2013-12-11 17:00:00-7-89EEWRCVG3335N8505Q
2013-12-13 17:00:0030319EEWRCVG3335N8672A
2013-12-16 17:00:00-30-149EEWRCVG3335N8877A

Let us proceed with our questions:

Q.1. Plot the monthly average arrival delay by carrier

ABCDEFGHIJ0123456789
carrier
<chr>
month
<mth>
mean_arr_delay
<dbl>
9E2013 Jan9.60394151
9E2013 Feb7.61343386
9E2013 Mar1.88567916
9E2013 Apr5.06551952
9E2013 May9.85294118
9E2013 Jun19.73903967
9E2013 Jul21.29785810
9E2013 Aug5.01098901
9E2013 Sep-6.80909091
9E2013 Oct-1.31500299
01020−5051015−20020Jan 2013Apr 2013Jul 2013Oct 2013−50510−505010205101520Jan 2013Apr 2013Jul 2013Oct 2013−1001020−30−20−10010102030050100Jan 2013Apr 2013Jul 2013Oct 2013010200102010203040−50510Jan 2013Apr 2013Jul 2013Oct 20130102030
Average Monthly Arrival Delays by Carrier9EAAASB6DLEVF9FLHAMQOOUAUSVXWNYV

Q.2. Plot a candlestick chart for total flight delays by month for each carrier

010002000010002000Jan 2013Mar 2013May 2013Jul 2013Sep 2013Nov 2013050010001500
LegendEWRJFKLGATime Series PlotEWRJFKLGA

Q.2. Plot a heatmap chart for total flight delays by origin, aggregated by month

ABCDEFGHIJ0123456789
origin
<chr>
month
<mth>
mean_monthly_delay
<dbl>
EWR2013 Jan27.004852
EWR2013 Feb20.613814
EWR2013 Mar27.653647
EWR2013 Apr30.710949
EWR2013 May20.239992
EWR2013 Jun37.774251
EWR2013 Jul36.393317
EWR2013 Aug19.836181
EWR2013 Sep2.544921
EWR2013 Oct11.137272

4.2 Conclusion

We can plot most of the common Time Series Plots with the help of the tidyverts packages: ( tsibble, feast, fable and fabletools) , along with timetk and ggformula.

There are other plot packages to investigate, such as dygraphs.

Recall that we have used the tsibble format for the data. There are other formats such as ts, xts and others which are also meant for time series analysis. But for our present purposes, we are able to do things with the capabilities of timetk.

4.3 References

  1. Rob J Hyndman and George Athanasopoulos. Forecasting: Principles and Practice (3rd ed). Available online at https://otexts.com/fpp3/
---
title: "Time Series in R"
subtitle: ""
author: Arvind Venkatadri
affiliation: 
output:
  html_document:
    theme: flatly
    toc: TRUE
    toc_float: TRUE
    toc_depth: 2
    number_sections: TRUE
    code_folding: show
    code_download: TRUE
    df_print: paged
editor_options: 
  markdown: 
    wrap: 72
abstract: Part of my online course `R for Artists and Designers` to teach R using Metaphors and Code.
---

# Introduction

# Goals

At the end of this Lab session, we will be able to:

-   Understand time series data and data structures
-   Create Time Series Visualizations
-   Perform modelling tasks on Time Series Data
-   Calculate and depict Averages, Seasons, Trends in Time Series
-   Perform Forecasting with Time Series

# Packages

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE, message = FALSE,warning = FALSE)
library(tidyverse)
library(lubridate) #Messing with Dates
library(tsibble) # Time-aware dataframes and tibbles
library(tsibbledata)
library(feasts) # visualising the data and extracting time series features.
library(fable) # forecasting methods for tsibble, such as ARIMA and ETS.
library(fabletools) #  modelling infrastructure to ease the programming with tsibble.
library(timetk) # Plotting and Analysing Time Series Data
library(ggformula)

######
library(prophet)
library(CausalImpact)

```

# The `tsibble` data format

TBW. To be written up.

## Case Study -1: Walmart Sales Dataset from timetk

Let us inspect what datasets are available in the package timetk. Type
data(package = "timetk") in your Console to see what datasets are
available.

Let us choose the Walmart Sales dataset. See here for more details:
Walmart Recruiting - Store Sales Forecasting \|Kaggle

```{r}

data("walmart_sales_weekly")
walmart_sales_weekly
glimpse(walmart_sales_weekly)

# Try this in your Console
# help("walmart_sales_weekly")

```

The data is described as:

A tibble: 9,743 x 3

-   id Factor. Unique series identifier (4 total)
-   Store Numeric. Store ID.
-   Dept Numeric. Department ID.
-   Date Date. Weekly timestamp.
-   Weekly_Sales Numeric. Sales for the given department in the given
    store.
-   IsHoliday Logical. Whether the week is a "special" holiday for the
    store.
-   Type Character. Type identifier of the store.
-   Size Numeric. Store square-footage
-   Temperature Numeric. Average temperature in the region.
-   Fuel_Price Numeric. Cost of fuel in the region.
-   MarkDown1, MarkDown2, MarkDown3, MarkDown4, MarkDown5 Numeric.
    Anonymized data related to promotional markdowns that Walmart is
    running. MarkDown data is only available after Nov 2011, and is not
    available for all stores all the time. Any missing value is marked
    with an NA.
-   CPI Numeric. The consumer price index.
-   Unemployment Numeric. The unemployment rate in the region.

NOTE: 1. This is still a data.frame, with a time-oriented variable of
course, and not yet a time-series object. Note that this data frame has
the YMD columns repeated for each Dept. 2. The Date column has repeated
entries for each Dept! To deal with this repetition, we will always need
to split the Weekly_Sales by the Dept column before we plot or analyze.

Since our sales are weekly, we will convert Date to yearweek format:

```{r}
walmart_time <- walmart_sales_weekly %>% 
  
  as_tsibble(index = Date, # Time Variable 
             
  key = Dept ) 
# Identifies unique "subject" who are measures 
# All other variables such as Weekly_sales become "measured variable" 
# Each observation should be uniquely identified by index and key
  
walmart_time
```

### Basic Time Series Plots

The easiest way is to use autoplot from the feasts package. You may need
to specify the actual measured variable, if there is more than one
numerical column:

```{r feasts-basic-plot}

autoplot(walmart_time,.vars = Weekly_Sales)
```

The R package `timetk` gives us interactive plots that may be more
evocative than the static plot above. The basic plot function with
`timetk` is `plot_time_series`. There are arguments for the date
variable, the value you want to plot, colours, groupings etc.

Let us explore this dataset using `timetk`, using our trusted method of
asking Questions:

**Q.1 How are the weekly sales different for each Department?**

There are `r walmart_sales_weekly %>% distinct(Dept) %>% count()` number
of Departments. So we should be fine plotting them and also facetting
with them, as we will see in a bit:

```{r}
walmart_time %>% 
  timetk::plot_time_series(Date, Weekly_Sales, 
                           .color_var = Dept, 
                           .legend_show = TRUE, 
                           .title = "Walmart Sales Data by Department",
                           .smooth = FALSE)
```

**Q.2. What do the sales per Dept look like during the month of December
(Christmas time) in 2012? Show the individual Depts as facets.**

We can of course zoom into the interactive plot above, but if we were to
plot it anyway:

```{r filtering-by-time}

# Only include rows from 1 to December 31, 2011
# Data goes only up to Oct 2012
walmart_time %>%
  
  # Each side of the time_formula below is specified as the character
  # 'YYYY-MM-DD HH:MM:SS',
  
  timetk::filter_by_time(.date_var = Date,
                         .start_date = "2011-12-01",
                         .end_date = "2011-12-31") %>%
  
  plot_time_series(
    .date_var = Date,
    .value = Weekly_Sales,
    .color_var = Dept,
    .facet_vars = Dept,
    .facet_ncol = 2,
    .smooth = FALSE
  ) # Only 4 points per graph
```

Clearly the "unfortunate" Dept#13 has seen something of a Christmas drop
in sales, as has Dept#38 ! The rest, all is well, it seems...

Too much noise? How about some averaging?

**Q.3 How do we smooth out some of the variations in the time series to
be able to understand it better?**

Sometimes there is too much noise in the time series observations and we
want to take what is called a *rolling average*. For this we will use
the function `timetk::slidify` to create an averaging function of our
choice, and then apply it to the time series using regular
`dplyr::mutate`. Let's take the average of Sales for each month in each
Department. Our **function** will be named "rolling_avg_month":

```{r  averaging-function}
rolling_avg_month = slidify(.period = 4, # every 4 weeks 
                            .f = mean, # The function to average 
                            .align = "center", # Aligned with middle of month 
                            .partial = TRUE) # To catch any leftover half weeks rolling_avg_month
```

OK, slidify creates a function! Let's apply it to the Walmart Sales time
series...

```{r averaging}
walmart_time %>% # group_by(Dept) %>%
  mutate(avg_monthly_sales = rolling_avg_month(Weekly_Sales)) %>%
  # ungroup() %>%
  timetk::plot_time_series(Date,
                           avg_monthly_sales,
                           .color_var = Dept,
                           .smooth = FALSE)
```

Graphs are smoother now. Need to check whether the averaging was done on
a per-Dept basis...should we have had a `group_by(Dept)` before the
averaging, and `ungroup()` before plotting? Try it !!

### Decomposing Time Series: Trends, Seasonal Patterns, and Cycles

Each data point ($Y_t$) at time $t$ in a Time Series can be expressed as
either a sum or a product of 4 components, namely, Seasonality($S_t$),
Trend($T_t$), Cyclic, and Error($e_t$) (a.k.a White Noise).

-   Trend: pattern exists when there is a long-term increase or decrease
    in the data.
-   Seasonal: pattern exists when a series is influenced by seasonal
    factors (e.g., the quarter of the year, the month, or day of the
    week).
-   Cyclic: pattern exists when data exhibit rises and falls that are
    not of fixed period (duration usually of at least 2 years).
-   Error or Noise: Random component

Decomposing non-seasonal data means breaking it up into trend and
irregular components. To estimate the trend component of a non-seasonal
time series that can be described using an additive model, it is common
to use a smoothing method, such as calculating the simple moving average
of the time series.

`timetk` has the ability to achieve this: Let us plot the trend,
seasonal, cyclic and irregular aspects of Weekly_Sales for Dept 38:

```{r decompose-time-series}
walmart_time %>% filter(Dept == "38") %>% 
  timetk::plot_stl_diagnostics(.date_var = Date, 
                               .value = Weekly_Sales)

```

We can do this for all Dept using `fable` and `fabletools`:

```{r Decomposing_trends}
walmart_decomposed <- walmart_time %>%
  
  # If we want to filter, we do it here # filter(Dept == "38") %>% #
  fabletools::model(stl = STL(Weekly_Sales))

fabletools::components(walmart_decomposed)

autoplot(components((walmart_decomposed)))

```

### Case Study-2: Dataset from nycflights13

Let us try the flights dataset from the package `nycflights13`. Try
`data(package = "nycflights13")` in your Console.

We have the following datasets in `nycflights13`:

-   airlines Airline names.
-   airports Airport metadata
-   flights Flights data
-   planes Plane metadata.
-   weather Hourly weather data

Let us analyze the flights data:

```{r}
data("flights", package = "nycflights13")
glimpse(flights)
```

We have time-related columns; Apart from year, month, day we have
`time_hour`; and time-event numerical data such as `arr_delay` (arrival
delay) and `dep_delay` (departure delay). We also have categorical data
such as `carrier`, `origin`, `dest`, `flight` and `tailnum` of the
aircraft. It is also a large dataset containing 330K entries. Enough to
play with!!

Let us replace the NAs in `arr_delay` and `dep_delay` with zeroes for
now, and convert it into a time-series object with `tsibble`:

```{r}
flights_delay_ts <- flights %>%
  mutate(arr_delay = replace_na(arr_delay, 0),
         dep_delay = replace_na(dep_delay, 0)) %>%
  select(time_hour,
         arr_delay,
         dep_delay,
         carrier,
         origin,
         dest,
         flight,
         tailnum) %>%
  tsibble::as_tsibble(
    index = time_hour,
    # All the remaining identify unique entries
    # Along with index # Many of these variables are common
    # Need all to make unique entries!
    key = c(carrier, origin, dest, flight, tailnum),
    validate = TRUE
  ) # Making sure each entry is unique

flights_delay_ts
```

Let us proceed with our questions:

**Q.1. Plot the monthly average arrival delay by carrier**

```{r}
mean_arr_delays_by_carrier <- flights_delay_ts %>%
  group_by(carrier) %>%
  index_by(month = ~ yearmonth(.)) %>%
  
  # index_by uses (year, yearquarter, yearmonth, yearweek, as.Date)
  # to create a new column to show the time-grouping # year / quarter / month/ week, or day…
  # which IS different from traditional dplyr
  
  summarise(mean_arr_delay = mean(arr_delay, na.rm = TRUE))

mean_arr_delays_by_carrier

mean_arr_delays_by_carrier %>% 
  timetk::plot_time_series(
  .date_var = month,
  .value = mean_arr_delay,
  .facet_vars = carrier,
  .smooth = FALSE,
  # .smooth_degree = 1, 
  # keep .smooth off since it throws warnings if there are too few points
  # Like if we do quarterly or even yearly summaries
  # Use only for smaller values of .smooth_degree (0,1)
  #
  .facet_ncol = 4,
  .title = "Average Monthly Arrival Delays by Carrier"
)
```

**Q.2. Plot a candlestick chart for total flight delays by month for
each carrier**

```{r}
flights_delay_ts %>% 
  mutate(total_delay = arr_delay + dep_delay) %>% 
  timetk::plot_time_series_boxplot(
  .date_var = time_hour,
  .value = total_delay,
  .color_var = origin,
  .facet_vars = origin,
  .period = "month",
  # same warning again .smooth = FALSE 
  )
```

**Q.2. Plot a heatmap chart for total flight delays by origin,
aggregated by month**

```{r}
avg_delays_month <-
  flights_delay_ts %>% 
  group_by(origin) %>% 
  mutate(total_delay = arr_delay + dep_delay) %>% 
  index_by(month = ~ yearmonth(.)) %>% 
  
  # index_by uses (year, yearquarter, yearmonth, yearweek, as.Date) 
  # to create a new column to show the time-grouping 
  # year / quarter / month/ week, or day… 
  # which IS different from traditional dplyr 
  
  summarise(mean_monthly_delay = mean(total_delay, na.rm = TRUE))
  
avg_delays_month 
# three origins 12 months therefore 36 rows 
# # Tsibble index_by + summarise also gives us a month` column

ggformula::gf_tile(
  origin ~ month,
  fill = ~ mean_monthly_delay,
  color = "black",
  data = avg_delays_month,
  title = "Mean Flight Delays from NY Airports in 2013"
) %>%
  gf_theme(theme_classic()) %>%
  gf_theme(scale_fill_viridis_c(option = "A")) %>%
  
  # "magma" (or "A") inferno" (or "B") "plasma" (or "C")
  # "viridis" (or "D") "cividis" (or "E")
  # "rocket" (or "F") "mako" (or "G") "turbo" (or "H")
  
  gf_theme(scale_x_time(breaks = 1:12, labels = month.abb))

```

## Conclusion

We can plot most of the common Time Series Plots with the help of the
`tidyverts` packages: ( `tsibble`, `feast`, `fable` and `fabletools`) , along
with `timetk` and `ggformula`.

There are other plot packages to investigate, such as `dygraphs`.

Recall that we have used the `tsibble` format for the data. There are
other formats such as `ts`, `xts` and others which are also meant for
time series analysis. But for our present purposes, we are able to do
things with the capabilities of `timetk`.

## References

1.  Rob J Hyndman and George Athanasopoulos. *Forecasting: Principles
    and Practice (3rd ed)*. Available online at
    <https://otexts.com/fpp3/>
    
    

