# Introduction to stationarity (R)

## Time Series

A time series is a series of data points captured in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. This post is the first in a series of blogs on time series methods and forecasting.

In this post, we will discuss stationarity, random walks, drift, deterministic trends and other vocabulary that forms the foundation of time series analysis.

## Stochastic processes

A random or stochastic process is a collection of random variables ordered in time. It is denoted as $Y_t$. For example, the in-time (arrival time) of an employee is a stochastic process. Why? Suppose the in-time on a particular day is 9:00 AM. In theory, the in-time could have taken any value, depending on factors such as traffic, workload and weather. The figure 9:00 AM is one particular realization of many such possibilities. Therefore, we can say that in-time is a stochastic process, whereas the values actually observed are a particular realization (sample) of the process.

## Stationary Processes

A stochastic process is said to be stationary if the following conditions are met:

1. The mean is constant over time
2. The variance is constant over time
3. The covariance between two time periods depends only on the distance (gap or lag) between the two periods and not on the actual time at which the covariance is computed

This type of process is also called a weakly stationary, covariance stationary, second-order stationary or wide-sense stationary process.

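Written mathematically, the conditions are:

$$\text{Mean: } E(Y_t) = \mu$$

$$\text{Variance: } var(Y_t) = E(Y_t-\mu)^2 = \sigma^2$$

$$\text{Covariance: } \gamma_k = E[(Y_t - \mu)(Y_{t+k} - \mu)]$$

Here $\gamma_k$ is the covariance at lag $k$; for $k = 0$ it equals the variance $\sigma^2$.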
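In practice these population quantities are estimated from data. As a minimal sketch, assuming a single observed series stored in a numeric vector, the lag-$k$ covariance could be estimated as below. The helper name `sample_autocov` and the simulated input are hypothetical and not part of the original analysis. The estimator divides by $n$, which is the same convention base R's `acf()` uses.

```r
# Hypothetical helper: sample autocovariance at lag k for a numeric vector y.
# Uses the overall sample mean and divides by n (same convention as stats::acf).
sample_autocov <- function(y, k) {
  n <- length(y)
  stopifnot(k >= 0, k < n)
  y_bar <- mean(y)
  sum((y[1:(n - k)] - y_bar) * (y[(1 + k):n] - y_bar)) / n
}

set.seed(1)                      # illustrative seed
y <- rnorm(200)                  # illustrative series, not the post's data

gamma_0 <- sample_autocov(y, 0)  # lag 0: the sample variance (divisor n)
gamma_3 <- sample_autocov(y, 3)  # covariance between values 3 steps apart

# Cross-check against base R: element 1 is lag 0, element 4 is lag 3
acf_cov <- acf(y, lag.max = 3, type = "covariance", plot = FALSE)$acf[, 1, 1]
```

For a stationary series, `gamma_3` estimates the same quantity regardless of where in the series the pairs of observations are taken from, which is what makes a single-series estimate meaningful.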

### Purely random or white noise process

A stochastic process is purely random (white noise) if it has zero mean, constant variance, and is serially uncorrelated. An example of white noise is the error term in a linear regression, which is assumed to have zero mean, constant standard deviation and no autocorrelation.
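The three properties can be checked on a simulated series. The sketch below assumes Gaussian shocks with standard deviation 1, which is only one convenient choice: white noise does not have to be normally distributed.

```r
set.seed(42)                          # illustrative seed
eps <- rnorm(1000, mean = 0, sd = 1)  # assumed Gaussian white noise

mean(eps)   # close to 0
var(eps)    # close to 1

# Sample autocorrelations at lags 1..5; for white noise these stay near 0
rho_hat <- acf(eps, lag.max = 5, plot = FALSE)$acf[-1, 1, 1]

# Ljung-Box test: the null hypothesis is no autocorrelation up to lag 10
lb <- Box.test(eps, lag = 10, type = "Ljung-Box")
lb$p.value
```

A large p-value from the Ljung-Box test means the data give no evidence against the "serially uncorrelated" property; it does not prove the series is white noise.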

### Simulation

To simulate a stationary process, I create 100 realizations (samples) and compare their mean, variance and covariance. The data for 6 days and 5 realizations is shown:

Samples of Stationary process

| day | date | realization_1 | realization_2 | realization_25 | realization_50 | realization_100 |
|---|---|---|---|---|---|---|
| 1 | 2021-12-28 | 0.3409607 | 0.5713826 | 0.2313986 | 0.6050719 | 0.5335372 |
| 2 | 2021-12-29 | 0.5554507 | 0.5244803 | 0.4288635 | 0.9073932 | 0.6350137 |
| 3 | 2021-12-30 | 0.1281935 | 0.1139629 | 0.2330727 | 0.8417148 | 0.8781020 |
| 10 | 2022-01-06 | 0.1901487 | 0.7607555 | 0.5620072 | 0.2611821 | 0.4575932 |
| 15 | 2022-01-11 | 0.8317412 | 0.6043582 | 0.0995929 | 0.9609510 | 0.2208680 |
| 30 | 2022-01-26 | 0.3612965 | 0.5961108 | 0.5965198 | 0.3048035 | 0.7668487 |

[Figure: mean, variance and covariance across the realizations (samples) over time]

For a stationary process, the mean, variance and covariance are constant over time.
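A simulation along these lines can be sketched in R. The original generating code is not shown, so the details here are assumptions: independent uniform draws on $[0, 1]$ (chosen because the tabulated values lie in that range), 30 days, and 100 realizations.

```r
set.seed(123)                  # illustrative seed
n_days <- 30
n_real <- 100

# Rows are days, columns are realizations. Uniform draws are an assumption
# made here because the tabulated values lie between 0 and 1.
stationary <- matrix(runif(n_days * n_real), nrow = n_days, ncol = n_real,
                     dimnames = list(NULL, paste0("realization_", 1:n_real)))

# Moments across realizations, one value per day
mean_by_day <- rowMeans(stationary)
var_by_day  <- apply(stationary, 1, var)

# Covariance between day t and day t + 1, estimated across realizations
lag1_cov <- sapply(1:(n_days - 1),
                   function(t) cov(stationary[t, ], stationary[t + 1, ]))

range(mean_by_day)  # stays near 0.5 for every day
range(var_by_day)   # stays near 1/12 for every day
range(lag1_cov)     # stays near 0 because the draws are independent
```

None of the three summaries shows a pattern over the days, which is what the stationarity conditions require.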

## Non-stationary Processes

If a time series is not stationary, it is called a non-stationary time series. In other words, a non-stationary time series has a time-varying mean, a time-varying variance, or both. Random walk and random walk with drift are examples of non-stationary processes.

### Random walk

Suppose $\epsilon_t$ is a white noise error term with mean 0 and variance $\sigma^2$. Then the series $Y_t$ is said to be a random walk if

$$Y_t = Y_{t-1} + \epsilon_t$$

In the random walk model, the value of $Y$ at time $t$ is equal to its value at time $t-1$ plus a random shock. For a random walk,

$$Y_1 = Y_0 + \epsilon_1$$

$$Y_2 = Y_1 + \epsilon_2 = Y_0 + \epsilon_1 + \epsilon_2$$

$$Y_3 = Y_2 + \epsilon_3 = Y_0 + \epsilon_1 + \epsilon_2 + \epsilon_3$$

and so on. In general, we can write

$$Y_t = Y_0 + \sum_{i=1}^{t} \epsilon_i$$

Treating the starting value $Y_0$ as a constant,

$$E(Y_t) = E\left(Y_0 + \sum_{i=1}^{t} \epsilon_i\right) = Y_0$$

$$var(Y_t) = t\times \sigma^2$$

Although the mean is constant over time, the variance is proportional to time.
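The variance result follows from the shocks being uncorrelated with a common variance $\sigma^2$. A short derivation, treating $Y_0$ as a fixed constant (the same assumption the mean calculation relies on):

$$var(Y_t) = var\left(Y_0 + \sum_{i=1}^{t}\epsilon_i\right) = \sum_{i=1}^{t} var(\epsilon_i) = t\times\sigma^2$$

The same argument gives the covariance between two dates. For $k \ge 0$, only the shocks shared by both sums contribute:

$$cov(Y_t, Y_{t+k}) = cov\left(\sum_{i=1}^{t}\epsilon_i, \sum_{j=1}^{t+k}\epsilon_j\right) = \sum_{i=1}^{t} var(\epsilon_i) = t\times\sigma^2$$

So the covariance depends on the time $t$ itself rather than only on the lag $k$, which means the third stationarity condition fails along with the second.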

To simulate a random walk process, I create 100 realizations (samples) and compare their mean, variance and covariance. The data for 6 days of 5 realizations (samples) is shown:

Samples of Random walk process

| day | date | realization_1 | realization_2 | realization_25 | realization_50 | realization_100 |
|---|---|---|---|---|---|---|
| 1 | 2021-12-28 | 4.000000 | 4.0000000 | 4.000000 | 4.000000 | 4.000000 |
| 2 | 2021-12-29 | 3.215170 | 4.9727559 | 4.981838 | 2.677480 | 4.209128 |
| 3 | 2021-12-30 | 2.400451 | 4.2477385 | 6.266374 | 3.249609 | 5.545876 |
| 10 | 2022-01-06 | 2.510370 | 4.1251187 | 8.500313 | 4.559066 | 7.634846 |
| 15 | 2022-01-11 | 6.286410 | 5.3430478 | 9.441353 | 2.147137 | 7.098887 |
| 30 | 2022-01-26 | 2.985008 | 0.2757552 | 5.219005 | 3.402089 | 4.125985 |

[Figure: mean, variance and covariance across the realizations (samples) over time]

The mean of $Y$ is equal to its initial (starting) value, which is constant, but as $t$ increases, its variance increases indefinitely, thus violating a condition of stationarity.
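The growth in variance can be reproduced with a short simulation. This is a sketch under stated assumptions, not the original code: the starting value 4 is taken from the table above, while the Gaussian shocks and their standard deviation of 1 are assumed.

```r
set.seed(2021)                 # illustrative seed
n_days <- 30
n_real <- 100
y0     <- 4                    # starting value, as in the table above
sigma  <- 1                    # assumed shock standard deviation

# One column per realization: day 1 is y0, later days add cumulative shocks
shocks <- matrix(rnorm((n_days - 1) * n_real, sd = sigma), ncol = n_real)
walk   <- rbind(rep(y0, n_real), y0 + apply(shocks, 2, cumsum))

mean_by_day <- rowMeans(walk)          # hovers around y0
var_by_day  <- apply(walk, 1, var)     # grows roughly like (day - 1) * sigma^2

round(var_by_day[c(1, 2, 10, 30)], 2)
```

Because every realization starts at the same value, the variance on day 1 is exactly zero and then widens with each additional shock.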

A random walk process is also called a unit root process.

### Random walk with drift

If the random walk model predicts that the value at time $t$ equals the previous period's value plus a constant, or drift ($\delta$), and a white noise term ($\epsilon_t$), then the process is a random walk with drift.

$$Y_t = \delta + Y_{t-1} + \epsilon_t$$

Substituting repeatedly gives $Y_t = Y_0 + t\times\delta + \sum_{i=1}^{t} \epsilon_i$, so the mean

$$E(Y_t) = E\left(Y_0 + t\times\delta + \sum_{i=1}^{t} \epsilon_i\right) = Y_0 + t\times\delta$$

depends on time, and the variance

$$var(Y_t) = t\times \sigma^2$$

also depends on time. As a random walk with drift violates the conditions of a stationary process, it is a non-stationary process.

Samples of Random walk with drift process

| day | date | realization_1 | realization_2 | realization_25 | realization_50 | realization_100 |
|---|---|---|---|---|---|---|
| 1 | 2021-12-28 | 4.000000 | 4.000000 | 4.000000 | 4.000000 | 4.000000 |
| 2 | 2021-12-29 | 6.707957 | 4.783724 | 5.082320 | 4.050322 | 6.047140 |
| 3 | 2021-12-30 | 6.154817 | 6.034937 | 6.593877 | 5.690097 | 6.443667 |
| 10 | 2022-01-06 | 3.092089 | 13.488318 | 13.143434 | 11.613472 | 8.216818 |
| 15 | 2022-01-11 | 4.827608 | 16.137101 | 12.706459 | 14.614712 | 12.535962 |
| 30 | 2022-01-26 | 8.567962 | 19.017960 | 20.586592 | 19.409629 | 14.157457 |

The mean, variance and covariance are all dependent on time.
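To see the time-dependent mean numerically, the mean across realizations can be compared with $Y_0 + t\times\delta$, where $t$ counts the steps taken since the start. In this sketch the starting value 4 comes from the table, while the drift of 0.5 per day and the unit-variance Gaussian shocks are assumptions picked for illustration.

```r
set.seed(7)                    # illustrative seed
n_days <- 30
n_real <- 100
y0     <- 4                    # starting value, as in the table
delta  <- 0.5                  # assumed drift per day
sigma  <- 1                    # assumed shock standard deviation

# Each daily increment is drift plus shock; cumulative sums give the path
steps <- matrix(delta + rnorm((n_days - 1) * n_real, sd = sigma), ncol = n_real)
drift_walk <- rbind(rep(y0, n_real), y0 + apply(steps, 2, cumsum))

# Compare the mean across realizations with the theoretical Y_0 + t * delta
t_elapsed   <- 0:(n_days - 1)
theory_mean <- y0 + t_elapsed * delta
mean_by_day <- rowMeans(drift_walk)
var_by_day  <- apply(drift_walk, 1, var)

head(cbind(t_elapsed, theory_mean, mean_by_day, var_by_day))
```

The simulated mean tracks the straight line `theory_mean`, while the spread around it widens in the same way as for the random walk without drift.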

### Unit root stochastic process

Unit root stochastic process is another name for the random walk process. A random walk process can be written as

$$Y_t = \rho \times Y_{t-1} + \epsilon_t$$

where $\rho = 1$. If $|\rho| < 1$, the process is a Markov first-order autoregressive model, which is stationary. With $\rho = 1$ the process has a unit root and is non-stationary.

[Figure: distribution of mean, variance and covariance for $\rho = 0.5$]

### Deterministic trend process

In the random walk and random walk with drift above, the trend component is stochastic. If instead the trend is deterministic, the series follows a deterministic trend process:

$$Y_t = \beta_1 + \beta_2\times t + \epsilon_t$$

In a deterministic trend process, the mean is $\beta_1 + \beta_2\times t$, which changes linearly with time, but the variance is constant. This type of process is also called trend stationary, as subtracting the mean of $Y_t$ from $Y_t$ gives a stationary process. This procedure is called de-trending.
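De-trending can be sketched in a few lines of R. Since the trend coefficients are not known in practice, this example estimates them by ordinary least squares with `lm()` and treats the residuals as the de-trended series. The intercept of 0, slope of 1 and unit-variance Gaussian noise are assumptions chosen for illustration.

```r
set.seed(99)                   # illustrative seed
t_idx  <- 1:30
beta_1 <- 0                    # assumed intercept
beta_2 <- 1                    # assumed slope
y <- beta_1 + beta_2 * t_idx + rnorm(length(t_idx))   # assumed N(0, 1) noise

# Estimate the linear trend and subtract it from the series
fit       <- lm(y ~ t_idx)
detrended <- residuals(fit)

coef(fit)                      # estimates of beta_1 and beta_2
```

By construction the residuals average to zero and are uncorrelated with `t_idx`, so the linear trend has been removed and what remains is an estimate of the stationary noise term.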

Samples of Deterministic trend process

| day | date | realization_1 | realization_2 | realization_25 | realization_50 | realization_100 |
|---|---|---|---|---|---|---|
| 1 | 2021-12-28 | 0.2435772 | 0.266316 | 1.634834 | 1.501271 | -0.2332093 |
| 2 | 2021-12-29 | 1.7185437 | 1.974812 | 1.128986 | 2.605209 | 1.0183324 |
| 3 | 2021-12-30 | 3.0196971 | 2.321355 | 3.529886 | 3.100916 | 3.2666808 |
| 10 | 2022-01-06 | 11.8821817 | 9.759775 | 11.575552 | 9.727393 | 9.3407779 |
| 15 | 2022-01-11 | 13.3588365 | 15.525071 | 15.037742 | 15.931198 | 14.2090916 |
| 30 | 2022-01-26 | 30.2218724 | 30.342918 | 30.405570 | 29.090780 | 29.6063424 |

A combination of deterministic and stochastic trends could also exist in a process.

## Comparison

[Figure: comparison of all the processes]

## References

1. Basic Econometrics - Damodar N Gujarati (textbook for reference)
2. Business Analytics: The Science of Data-Driven Decision Making - Dinesh Kumar (textbook for reference)
3. Customer Analytics at Flipkart.com - Naveen Bhansali (case study in Harvard Business Review)