Stationarity

December 16, 2019 · By Achyuthuni Harsha

Time Series

A time series is a series of data points captured in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. This post is the first in a series of blogs on time series methods and forecasting.

In this blog, we will discuss stationarity, random walks, deterministic trends and other vocabulary that forms the foundation of time series analysis:

Stochastic processes

A random or stochastic process is a collection of random variables ordered in time, denoted \(Y_t\). For example, the in-time of an employee is a stochastic process. How is in-time a stochastic process? Suppose the in-time on a particular day is 9:00 AM. In theory, the in-time could have taken any value, depending on many factors like traffic, workload, weather etc. The figure 9:00 AM is one particular realization of many such possibilities. Therefore we can say that in-time is a stochastic process, whereas the actual values observed are a particular realization (sample) of the process.

Stationary Processes

A stochastic process is said to be stationary if the following conditions are met:
1. Mean is constant over time
2. Variance is constant over time
3. The covariance between two time periods depends only on the distance or gap or lag between the two periods, and not on the actual time at which the covariance is computed

This type of process is also called a weakly stationary, covariance-stationary, second-order stationary or wide-sense stationary process.

Written mathematically, the conditions are: \[ Mean: E(Y_t) = \mu \] \[ Variance: var(Y_t) = E(Y_t-\mu)^2 = \sigma^2 \] \[ Covariance: \gamma_k = E[(Y_t - \mu)(Y_{t+k} - \mu)] \]

Purely random or white noise process

A stochastic process is purely random if it has zero mean, constant variance, and is serially uncorrelated. An example of white noise is the error term in a linear regression, which has zero mean, constant variance and no autocorrelation.
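As a quick illustration, here is a minimal Python sketch of a white noise process; numpy, the normal distribution and \(\sigma = 1\) are assumptions, not from the original post:

```python
import numpy as np

# A minimal sketch of a white noise process: serially uncorrelated draws
# with zero mean and constant variance (sigma = 1 is an assumption).
rng = np.random.default_rng(42)
eps = rng.normal(loc=0.0, scale=1.0, size=10_000)

print(round(eps.mean(), 2))   # close to 0
print(round(eps.var(), 2))    # close to 1
# Lag-1 autocorrelation: correlation between eps_t and eps_{t-1}
print(round(np.corrcoef(eps[:-1], eps[1:])[0, 1], 2))  # close to 0
```

Any distribution with zero mean and fixed variance would do; the normal is simply a convenient choice.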

Simulation

To simulate a stationary process, I create 100 realizations (samples) and compare their mean, variance and covariance. The data for 6 days and 5 of the realizations is shown:
Samples of Stationary process
     date        realization_1  realization_2  realization_25  realization_50  realization_100
 1   2019-12-16      0.7356201      0.2374115       0.0360584       0.8730372        0.5718014
 2   2019-12-17      0.1441992      0.5452946       0.6921414       0.7099068        0.1587868
 3   2019-12-18      0.3230618      0.1497708       0.3391369       0.0973547        0.6085889
10   2019-12-25      0.1017506      0.4812825       0.7688191       0.1277465        0.1499435
15   2019-12-30      0.1308073      0.2781965       0.1058099       0.2748190        0.7266108
30   2020-01-14      0.2226795      0.6059500       0.2601266       0.6362089        0.4759561

The mean, variance and covariance across the samples (realizations) over time are as follows:

For a stationary process, the mean, variance and covariance are constant.
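The simulation above can be sketched in Python as follows; the Uniform(0, 1) distribution is an assumption inferred from the sample values in the table:

```python
import numpy as np

# Sketch of the simulation described above: 100 realizations (samples) of a
# stationary process, here i.i.d. Uniform(0, 1) draws over 30 days
# (the distribution is an assumption based on the sample values shown).
rng = np.random.default_rng(0)
n_days, n_realizations = 30, 100
y = rng.uniform(0.0, 1.0, size=(n_days, n_realizations))

# Across realizations, the mean and variance at each time point stay
# roughly constant (0.5 and 1/12 for Uniform(0, 1)).
means = y.mean(axis=1)
variances = y.var(axis=1)
print(means.min().round(2), means.max().round(2))
print(variances.min().round(2), variances.max().round(2))
```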

Non-stationary Processes

If a time series is not stationary, it is called a non-stationary time series. In other words, a non-stationary time series has a time-varying mean, a time-varying variance, or both. A random walk and a random walk with drift are examples of non-stationary processes.

Random walk

Suppose \(\epsilon_t\) is a white noise error term with mean 0 and variance \(\sigma^2\). Then the series \(Y_t\) is said to be a random walk if \[ Y_t = Y_{t−1} + \epsilon_t \] In the random walk model, the value of Y at time t is equal to its value at time (t − 1) plus a random shock.
For a random walk, \[ Y_1 = Y_0 + \epsilon_1 \] \[ Y_2 = Y_1 + \epsilon_2 = Y_0 + \epsilon_1 + \epsilon_2 \] \[ Y_3 = Y_2 + \epsilon_3 = Y_0 + \epsilon_1 + \epsilon_2 + \epsilon_3 \] and so on. In general, we can write
\[ Y_t = Y_0 + \sum_{i=1}^{t} \epsilon_i \] so that \[ E(Y_t) = E\left(Y_0 + \sum_{i=1}^{t} \epsilon_i\right) = Y_0 \] \[ var(Y_t) = t\times \sigma^2 \]
Although the mean is constant over time, the variance is proportional to time.

To simulate a random walk process, I create 100 realizations (samples) and compare their mean, variance and covariance. The data for 6 days and 5 of the realizations (samples) is shown:
Samples of Random walk process
     date        realization_1  realization_2  realization_25  realization_50  realization_100
 1   2019-12-16       4.000000       4.000000        4.000000        4.000000         4.000000
 2   2019-12-17       3.882959       3.116363        4.015556        2.224053         3.472241
 3   2019-12-18       3.586484       3.178970        5.510334        1.651408         2.836190
10   2019-12-25       3.423350       5.718359        5.279429        4.355010         4.016813
15   2019-12-30       4.152690       5.739801        7.333779        2.978225         2.694669
30   2020-01-14       2.958519       4.031016        9.864177        3.122688        -4.095760

The mean, variance and covariance across the samples (realizations) over time are as follows:

As the plot shows, the mean of Y is equal to its initial (starting) value, which is constant, but as t increases its variance increases indefinitely, violating a condition of stationarity.
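The growing variance can be checked numerically. Below is a Python sketch; the normal shocks with \(\sigma = 1\) are an assumption, while the starting value 4 matches the table above:

```python
import numpy as np

# Sketch: random-walk realizations Y_t = Y_{t-1} + eps_t, started at
# Y_0 = 4 as in the table above; eps_t ~ N(0, 1) is an assumption.
rng = np.random.default_rng(1)
n_days, n_realizations = 200, 1000
eps = rng.normal(0.0, 1.0, size=(n_days, n_realizations))
y = 4.0 + eps.cumsum(axis=0)

# The mean across realizations stays near Y_0, but the variance grows
# roughly linearly in t (var(Y_t) = t * sigma^2).
print(round(y[9].mean(), 1), round(y[199].mean(), 1))  # both near 4
print(round(y[9].var(), 1), round(y[199].var(), 1))    # near 10 and 200
```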

A random walk process is also called a unit root process.

Random walk with drift

If the value at time t equals the last period's value plus a constant, or drift (\(\delta\)), and a white noise term (\(\epsilon_t\)), then the process is a random walk with drift.
\[ Y_t = \delta + Y_{t−1} + \epsilon_t \] The mean \[ E(Y_t) = E\left(Y_0 + t\times\delta + \sum_{i=1}^{t} \epsilon_i\right) = Y_0 + t\times\delta \] is dependent on time,
and the variance \[ var(Y_t) = t\times \sigma^2 \] is also dependent on time. As a random walk with drift violates the conditions of a stationary process, it is a non-stationary process.
Samples of Random walk with drift process
     date        realization_1  realization_2  realization_25  realization_50  realization_100
 1   2019-12-16       4.000000       4.000000        4.000000        4.000000         4.000000
 2   2019-12-17       4.916676       3.445681        3.732304        3.489562         4.687424
 3   2019-12-18       4.406755       4.341785        5.879230        3.927224         5.858994
10   2019-12-25       5.271076       3.892017        5.619834        8.790161         9.149255
15   2019-12-30       5.264659       4.524672        6.942081       10.377750        12.176134
30   2020-01-14      11.929499      15.110744       17.040556       20.335734        20.527694

The mean, variance and covariance are all dependent on time.
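This can be sketched in Python as well; the starting value 4 matches the table, while the drift \(\delta = 0.5\) and the normal shocks are assumptions:

```python
import numpy as np

# Sketch: random walk with drift, Y_t = delta + Y_{t-1} + eps_t, with
# Y_0 = 4 as in the table; delta = 0.5 and eps ~ N(0, 1) are assumptions.
rng = np.random.default_rng(2)
delta, n_days, n_realizations = 0.5, 100, 1000
eps = rng.normal(0.0, 1.0, size=(n_days, n_realizations))
y = 4.0 + np.cumsum(delta + eps, axis=0)

# E(Y_t) = Y_0 + t * delta grows with t, and var(Y_t) = t * sigma^2 also
# grows with t, so the process is non-stationary on both counts.
print(round(y[99].mean(), 1))  # near 4 + 100 * 0.5 = 54
print(round(y[99].var(), 1))   # near 100
```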

Unit root stochastic process

Unit root stochastic process is another name for the random walk process. A random walk can be written as \[ Y_t = \rho \times Y_{t−1} + \epsilon_t \] where \(\rho = 1\). If \(|\rho| < 1\), the process is a first-order (Markov) autoregressive model, which is stationary; only for \(\rho = 1\) do we get non-stationarity. The distribution of the mean, variance and covariance for \(\rho = 0.5\) is shown below.
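The contrast between \(|\rho| < 1\) and \(\rho = 1\) can be sketched as follows; the normal shocks, zero starting value and sample sizes are assumptions:

```python
import numpy as np

# Sketch contrasting the AR(1) process Y_t = rho * Y_{t-1} + eps_t for
# rho = 0.5 (stationary) against rho = 1 (the random walk / unit root case).
rng = np.random.default_rng(3)
n_days, n_realizations = 500, 1000
eps = rng.normal(0.0, 1.0, size=(n_days, n_realizations))

def simulate(rho):
    y = np.zeros((n_days, n_realizations))
    for t in range(1, n_days):
        y[t] = rho * y[t - 1] + eps[t]
    return y

# For |rho| < 1 the variance settles at sigma^2 / (1 - rho^2);
# for rho = 1 it keeps growing with t.
print(round(simulate(0.5)[-1].var(), 2))  # near 1 / (1 - 0.25) = 1.33
print(round(simulate(1.0)[-1].var(), 0))  # still growing with t
```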

Deterministic trend process

In the random walk and random walk with drift above, the trend component is stochastic in nature. If instead the trend is deterministic, the series follows a deterministic trend process: \[ Y_t = β_1 + β_2\times t + \epsilon_t \] In a deterministic trend process, the mean is \(β_1 + β_2\times t\), which grows with time, but the variance is constant. This type of process is also called trend stationary, as subtracting the mean \(β_1 + β_2\times t\) from \(Y_t\) gives a stationary process. This procedure is called de-trending.
Samples of Deterministic trend process
     date        realization_1  realization_2  realization_25  realization_50  realization_100
 1   2019-12-16      0.7844548       1.351543        1.959021      -0.5513578         1.592412
 2   2019-12-17      3.5492975       1.517891        1.491118       1.1883721         1.076729
 3   2019-12-18      3.9577508       2.623527        4.115253       2.0636018         3.616692
10   2019-12-25     11.7488688       9.578051       10.242385       8.4459909        11.751336
15   2019-12-30     13.6550047      15.117834       15.668208      14.9922993        18.064545
30   2020-01-14     29.6632534      29.150716       30.654134      30.3425522        28.767587
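De-trending can be sketched in Python; the coefficients \(\beta_1 = 1\), \(\beta_2 = 1\) and the normal errors are assumptions that roughly match the values in the table:

```python
import numpy as np

# Sketch of de-trending a deterministic trend process
# Y_t = beta1 + beta2 * t + eps_t (beta1 = 1, beta2 = 1 are assumptions).
rng = np.random.default_rng(4)
n_days = 1000
t = np.arange(n_days)
y = 1.0 + 1.0 * t + rng.normal(0.0, 1.0, size=n_days)

# Fit the trend by least squares and subtract it; what remains is a
# stationary (white noise) series.
beta2, beta1 = np.polyfit(t, y, deg=1)
detrended = y - (beta1 + beta2 * t)
print(round(detrended.mean(), 2))  # close to 0
print(round(detrended.var(), 2))   # close to 1
```

In practice the trend coefficients are unknown and are estimated, as here, before subtracting.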

A combination of deterministic and stochastic trend could also exist in a process.

Comparison

A comparison of all the processes is shown below:
