Analyzing the NASDAQ with Gaussian Processes

A simple demonstration of a powerful tool.

A while back I worked on a fun project using some of the analysis tools from my PhD to analyze stock prices. Originally we just posted it as a GitHub repo, but I thought it would be fun to also post it here. So here it is! You can find the original repo here.

Introduction

The Gaussian process is a widely used tool in data science, typically applied to data smoothing, interpolation, and regression. However, Gaussian processes have other applications that make them a very powerful model.

In this project, we show that a Gaussian process can be used to infer accelerations and decelerations in time series data. By analyzing accelerations, we are able to identify periods of time with higher and lower impact. We apply our model to the NASDAQ stock index to identify periods of high and low growth rates, then show that these periods correspond to significant macroeconomic events.

By demonstrating how Gaussian processes can be used to identify economic trends, we hope to highlight the model’s broader potential in analyzing complex time series data.

Methodology

Our goal is to demonstrate how to use the Gaussian process to infer second derivatives of time series data, not to derive the equations. As such, we will not go into the details of the derivation, but we will provide a brief overview of the model.

The main assumption of a Gaussian process model is that if we evaluate a function, $F(t)$, at a collection of points, $t_1, t_2, \ldots, t_n$, then the values of the function at these points will be Gaussian distributed

\[\boldsymbol{f} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{K})\]

where $f_n=F(t_n)$, $\boldsymbol{\mu}$ is the mean vector (which we will set to 0), and $\boldsymbol{K}$ is the covariance matrix. The covariance matrix is defined as

\[K_{ij} = a^2\exp\left(-\frac{(t_i-t_j)^2}{2\ell^2}\right)\]

where $a$ is the amplitude of the function, and $\ell$ is the length scale. The amplitude determines the magnitude of the function. The length scale determines how quickly the function changes with respect to time.
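As a minimal sketch of the covariance above, the NumPy snippet below builds $\boldsymbol{K}$ for a handful of time stamps. The function name `rbf_kernel`, its argument names, and the example time grid are our own illustrative choices here, not taken from the project code.

```python
import numpy as np

def rbf_kernel(t1, t2, amplitude=1.0, length_scale=30.0):
    """Squared-exponential covariance a^2 * exp(-(t_i - t_j)^2 / (2 l^2))."""
    t1 = np.asarray(t1, dtype=float)
    t2 = np.asarray(t2, dtype=float)
    diff = t1[:, None] - t2[None, :]  # all pairwise differences t_i - t_j
    return amplitude**2 * np.exp(-diff**2 / (2.0 * length_scale**2))

# Hypothetical time stamps in days (an assumption for illustration).
t = np.arange(0.0, 100.0, 10.0)
K = rbf_kernel(t, t, amplitude=2.0, length_scale=30.0)

print(K.shape)   # one row and one column per time stamp
print(K[0, :3])  # covariance decays as the time gap grows
```

The diagonal of the matrix equals $a^2$, and entries shrink as the gap between two times grows relative to $\ell$.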

In our case, the function $F(t)$ tracks the value of the NASDAQ stock index over time. By “value” we don’t mean the actual stock price, but rather what the index is intrinsically worth. We assume the stock price is randomly distributed around the value at each point in time

\[P(y_n) = \mathcal{N}(F(t_n), \sigma^2)\]

where $y_n$ is the stock price at time $t_n$, and $\sigma^2$ is the variance of the stock price.
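To make the two modeling assumptions concrete, the sketch below draws a latent value curve from the Gaussian process prior and then scatters noisy prices around it. All numbers (the 200-day grid, $a=1$, $\ell=30$, $\sigma=0.1$) are made up for illustration and are not fitted to NASDAQ data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 200 daily time stamps and made-up hyperparameters.
t = np.arange(200.0)
amplitude, length_scale, sigma = 1.0, 30.0, 0.1

# Covariance of the latent value F(t) under the squared-exponential kernel.
diff = t[:, None] - t[None, :]
K = amplitude**2 * np.exp(-diff**2 / (2.0 * length_scale**2))

# Draw one latent value curve f ~ N(0, K); the small jitter keeps the
# Cholesky factorization numerically stable.
L = np.linalg.cholesky(K + 1e-6 * np.eye(t.size))
f = L @ rng.standard_normal(t.size)

# Observed prices scatter around the latent value with variance sigma^2.
y = f + sigma * rng.standard_normal(t.size)
```

Here `f` plays the role of the smooth intrinsic value and `y` the noisy price; inference runs this picture in reverse, recovering `f` from `y`.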

Without getting into the details, which are beyond the scope of this write-up, we can use the stock prices at different times to infer the value of the stock at different times, $t^*_1, t^*_2, \ldots, t^*_n$. The equation for this is

\[\boldsymbol{f}^* = \boldsymbol{K}^* (\boldsymbol{K} + \sigma^2\boldsymbol{I})^{-1}\boldsymbol{y}\]

This just gives the value of the stock; it doesn’t tell us how quickly that value is changing. Here $K^*_{ij}$ is the covariance between the value at time $t^*_i$ and the value at observation time $t_j$, computed with the same kernel as $\boldsymbol{K}$. To get the acceleration, we need the second derivative of the function. The useful part of the Gaussian process is that we can obtain it by taking the second derivative of the covariance matrix with respect to $t^*$

\[\dfrac{d^2}{dt^2}\boldsymbol{f}^* = \left(\dfrac{d^2}{dt^2}\boldsymbol{K}^*\right) (\boldsymbol{K} + \sigma^2\boldsymbol{I})^{-1}\boldsymbol{y}\]

where

\[\dfrac{d^2}{dt^2}K^*_{ij} = a^2\exp\left(-\frac{(t^*_i-t_j)^2}{2\ell^2}\right)\left(\frac{(t^*_i-t_j)^2}{\ell^4}-\frac{1}{\ell^2}\right).\]

So all in all we have a very simple equation relating the stock prices to the acceleration of the stock prices.
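The sketch below puts the pieces together with plain NumPy: it computes the inferred value and the inferred acceleration from noisy observations. It assumes the squared-exponential kernel, whose second derivative with respect to the inference time is $a^2\exp\left(-\frac{(t^*_i-t_j)^2}{2\ell^2}\right)\left(\frac{(t^*_i-t_j)^2}{\ell^4}-\frac{1}{\ell^2}\right)$. The function names, hyperparameters, and the synthetic sine-wave data are our own illustrative assumptions; this is a dense toy version, not the SKI-based project code.

```python
import numpy as np

def rbf_kernel(t1, t2, amplitude, length_scale):
    """Squared-exponential covariance between two sets of times."""
    diff = t1[:, None] - t2[None, :]
    return amplitude**2 * np.exp(-diff**2 / (2.0 * length_scale**2))

def rbf_kernel_d2(t1, t2, amplitude, length_scale):
    """Second derivative of the kernel with respect to its first argument."""
    diff = t1[:, None] - t2[None, :]
    base = amplitude**2 * np.exp(-diff**2 / (2.0 * length_scale**2))
    return base * (diff**2 / length_scale**4 - 1.0 / length_scale**2)

def gp_value_and_acceleration(t, y, t_star, amplitude, length_scale, sigma):
    """Posterior mean of F and of its second derivative at t_star."""
    K = rbf_kernel(t, t, amplitude, length_scale)
    # Solve (K + sigma^2 I) alpha = y instead of forming the inverse.
    alpha = np.linalg.solve(K + sigma**2 * np.eye(t.size), y)
    value = rbf_kernel(t_star, t, amplitude, length_scale) @ alpha
    accel = rbf_kernel_d2(t_star, t, amplitude, length_scale) @ alpha
    return value, accel

# Synthetic stand-in for price data (not NASDAQ data): a sine wave plus noise.
rng = np.random.default_rng(0)
t = np.arange(0.0, 201.0)
true_value = np.sin(t / 20.0)
y = true_value + 0.02 * rng.standard_normal(t.size)

value, accel = gp_value_and_acceleration(
    t, y, t_star=t, amplitude=1.0, length_scale=20.0, sigma=0.02
)
true_accel = -np.sin(t / 20.0) / 20.0**2  # analytic second derivative
```

Because the same solve against $\boldsymbol{K} + \sigma^2\boldsymbol{I}$ is shared, the acceleration costs only one extra matrix-vector product once the value has been inferred. Estimates near the ends of the time range are less reliable, since the model has data on only one side there.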

Just a note: in the code we provide, the equations won’t match up exactly with what we have here since we use a slightly different model called a Structured Kernel Interpolation (SKI) model. If you want to know more feel free to check out our previous work on it here.

Results

We run our model on the NASDAQ stock index from 2019 to 2024. The results are shown in the figures below. We apply our model using two different length scales, $\ell=30$ days and $\ell=180$ days.

We first apply our model with $\ell=30$ days.

[Figure: GP_l=month]

As you can see, there are clear spikes of acceleration at major turns in the market. At this resolution, however, the model also picks up more rapid shifts in the market, so a length scale longer than 30 days may be a better choice.

We then apply our model with $\ell=180$ days.

[Figure: GP]

We again see large spikes of acceleration at major turns in the market. However, we see fewer spikes, and the spikes are more pronounced. We will use our results from this model to identify significant macroeconomic events.
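One simple way to turn an inferred acceleration curve into candidate dates is to pick its largest local maxima (and, for decelerations, the largest local maxima of the negated curve). The sketch below does this with NumPy only. The helper name `top_local_maxima` and the toy numbers are hypothetical, and the project may have located its dates differently.

```python
import numpy as np

def top_local_maxima(values, k=3):
    """Indices of the k largest interior local maxima, largest first."""
    values = np.asarray(values, dtype=float)
    interior = np.arange(1, values.size - 1)
    is_peak = (values[interior] > values[interior - 1]) & (
        values[interior] >= values[interior + 1]
    )
    peaks = interior[is_peak]
    order = np.argsort(values[peaks])[::-1]  # sort peaks by height, descending
    return peaks[order[:k]]

# Toy acceleration curve (made-up numbers, not model output).
accel = np.array([0.0, 0.2, 0.1, 0.5, 0.3, -0.4, -0.1, 0.9, 0.4, 0.0])
peak_idx = top_local_maxima(accel, k=2)      # strongest accelerations
trough_idx = top_local_maxima(-accel, k=1)   # strongest deceleration
```

With real output, the returned indices would be mapped back to the corresponding trading dates.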

Interpretation

We observe the highest accelerations in the NASDAQ’s growth rate around the dates of April 17, 2020, May 5, 2021, and July 25, 2022. Typically, such accelerations follow a period of decline, indicating a local trough in stock prices. These dates coincide with significant macroeconomic events, which are outlined below:

April 17, 2020

May 5, 2021

July 25, 2022

In contrast, the largest decelerations in growth rate, observed around August 20, 2019, November 18, 2021, and July 7, 2023, do not align with significant socio-economic events as distinctly as the acceleration dates do. The exceptions are November 29, 2020 and November 18, 2021.

November 29, 2020

* Market defusing: The NASDAQ at this point was decelerating, possibly as a correction from the rapid growth in the preceding months, as the market adjusted to the new normal of the pandemic.
* Election: The US presidential election marked a transition of power, which could have introduced uncertainty into the market, leading to a deceleration in growth rates.

November 18, 2021

* Inflation Surge: Inflation hit its fastest rate since 1982. Energy and food prices saw significant increases, while shelter costs had their highest rise since 2007.
* Job Growth Disappointment: Despite the sharp fall in the unemployment rate, the economy created far fewer jobs than expected, with notable declines in retail and government employment.
* Wage Growth: Worker wages continued their upward trend, rising both for the month and significantly over the year.

Conclusion

Here we have shown that Gaussian processes can be used to infer accelerations and decelerations in time series data. We applied our model to the NASDAQ stock index to identify periods of high and low growth rates, then showed that these periods correspond to significant macroeconomic events. By demonstrating how Gaussian processes can be used to identify economic trends, we hope to highlight the model’s broader potential in analyzing complex time series data.

Credits

This work was a joint project between Saivardhan Reddy Ainavolu and Shep Bryan. We hope you enjoyed our work and learned something new. If you have any questions or comments, feel free to reach out to us.