[Question] Best way to convert stock price data to a Darts TimeSeries with a non-conventional frequency

Open sophia-kwon opened this issue 1 year ago • 4 comments

I have a stock time series DataFrame that looks like this: [image]

When I tried to convert it to a TimeSeries with `series = TimeSeries.from_dataframe(stock_normalized_df)`,

I get the following error:

ERROR:darts.timeseries:ValueError: The time index of the provided DataArray is missing the freq attribute, and the frequency could not be directly inferred. This probably comes from inconsistent date frequencies with missing dates. If you know the actual frequency, try setting fill_missing_dates=True, freq=actual_frequency. If not, try setting fill_missing_dates=True, freq=None to see if a frequency can be inferred.

As you know, stocks only trade on exchange trading days (weekdays minus market holidays), a schedule that doesn't match any of the offset aliases listed in https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases
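A minimal pandas sketch of why the inference fails (the dates and the single holiday, 2024-01-15, are illustrative assumptions): plain business days are recognized as `'B'`, but once one market holiday is removed the index matches no standard offset alias and `pd.infer_freq` returns `None`, which matches the "frequency could not be directly inferred" part of the error.

```python
import pandas as pd

# Business days from Mon 2024-01-08 to Fri 2024-01-19
bdays = pd.bdate_range("2024-01-08", "2024-01-19")
# Illustrative trading calendar: drop one market holiday (2024-01-15)
trading_days = bdays.drop(pd.Timestamp("2024-01-15"))

print(pd.infer_freq(bdays))         # -> B
print(pd.infer_freq(trading_days))  # -> None
```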

I tried to use a TimeSeries with a RangeIndex, but the TFT model I use complains that it only accepts time series with a DatetimeIndex.

Any suggestions on how I can use the TFT model with a TimeSeries? I can make it work by setting the frequency to something like 'B' or 'D' and then filling the missing dates with values, but that obviously doesn't make sense because stock data only exists on market trading days.
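The workaround described above can be sketched with pandas as follows (the toy prices and the 2024-01-15 holiday are assumptions, and forward-filling is just one possible choice of fill value): reindexing onto a `'B'` grid re-creates the holiday as a `NaN` row, which then has to be filled with a price that never traded.

```python
import pandas as pd

# Assumed toy data: nine trading days, with the 2024-01-15 holiday already absent
trading_days = pd.bdate_range("2024-01-08", "2024-01-19").drop(pd.Timestamp("2024-01-15"))
df = pd.DataFrame(
    {"close": [10.0, 10.2, 10.1, 10.4, 10.3, 10.6, 10.5, 10.7, 10.8]},
    index=trading_days,
)

# Force a regular business-day grid: the holiday reappears as a NaN row
regular = df.asfreq("B")
print(regular["close"].isna().sum())  # -> 1

# Fill the fabricated row by carrying the previous close forward
filled = regular.ffill()
print(filled.loc[pd.Timestamp("2024-01-15"), "close"])  # -> 10.3
```

The `fill_missing_dates=True, freq=...` hint in the error message leads to roughly the same reindexing step inside Darts; either way, every holiday becomes a synthetic data point.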

sophia-kwon, Jan 02 '24 05:01

@sophia-kwon I ran into this issue a few days ago and wasn't able to find a good Darts-specific answer. Here is the solution I came up with; hopefully it helps: https://github.com/ivelin/canswim/blob/main/prepare_data.ipynb

If you find a better way to handle this, please share.

ivelin, Jan 04 '24 00:01

Hi @sophia-kwon,

If the frequency of your series is irregular and unpredictable, your issue is related to #1571. This kind of data is not yet supported by Darts. An ugly workaround would be to use a RangeIndex, as you mentioned, but then some information, such as the distance between consecutive values, is lost.
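For the RangeIndex workaround, here is a pandas sketch (toy data assumed) of what gets lost and one way to keep it: `reset_index()` moves the dates into a column and leaves a plain 0..n-1 index, and a hypothetical `gap_days` column records the calendar spacing that the integer index no longer carries. Using such a column as a covariate is an assumption on my part, not something suggested in this thread.

```python
import pandas as pd

# Assumed toy data: closes indexed by trading date (2024-01-15 is a market holiday)
dates = pd.to_datetime(["2024-01-11", "2024-01-12", "2024-01-16", "2024-01-17"])
df = pd.DataFrame({"close": [10.4, 10.3, 10.6, 10.5]}, index=dates)

# Move the dates into a column; the index becomes a RangeIndex 0..n-1
ranged = df.rename_axis("date").reset_index()

# Hypothetical extra feature: calendar days since the previous trading day
# (0 for the first row), so the spacing dropped by the RangeIndex is kept
ranged["gap_days"] = ranged["date"].diff().dt.days.fillna(0).astype(int)
print(ranged)
```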

If the business-day frequency is not suitable, you can define your own custom calendar and use it as the frequency of the TimeSeries (an example can be found in #1650). Alternatively, you could fill the missing values with zeroes.
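A custom trading calendar can be built with pandas' `CustomBusinessDay` offset. The sketch below uses a hypothetical one-holiday list (a real one would come from the exchange's calendar). Relative to this frequency the trading days form a regular grid, so `asfreq` inserts no rows. How to hand such a frequency to Darts is what the example in #1650 covers; this sketch only uses pandas.

```python
import pandas as pd

# Hypothetical holiday list; in practice it would come from the exchange's calendar
holidays = ["2024-01-15"]
trading_day = pd.offsets.CustomBusinessDay(holidays=holidays)

# A regular grid of trading days: weekends and the listed holiday are skipped
idx = pd.date_range("2024-01-08", "2024-01-19", freq=trading_day)

df = pd.DataFrame({"close": range(len(idx))}, index=idx, dtype=float)
# Relative to this calendar the data has no gaps, so asfreq adds no NaN rows
regular = df.asfreq(trading_day)
print(len(idx), len(regular))  # -> 9 9
```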

madtoinou, Jan 04 '24 11:01

Hi @sophia-kwon, there is another library, PyTorch Forecasting, that works on non-periodic data like you described above. Here is the link to its TFT tutorial: https://pytorch-forecasting.readthedocs.io/en/latest/tutorials/stallion.html

AjinkyaBankar, Jan 16 '24 22:01

@AjinkyaBankar: using a RangeIndex as the time index in Darts seems to be equivalent to the way PyTorch Forecasting encodes the time index in its TimeSeriesDataSet.

@sophia-kwon: Darts' TFTModel also works with a TimeSeries indexed by a RangeIndex (could you double-check the error message you mentioned in your first message?):

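```python
from darts import TimeSeries
from darts.models import TFTModel
import numpy as np

# Target and future covariates with a RangeIndex time index (no dates);
# cast to float since Darts expects float32/float64 values
ts = TimeSeries.from_values(np.random.randint(0, 10, 100).astype(np.float32))
# Future covariates extend past the training target to cover the forecast horizon
fc = TimeSeries.from_values(np.random.randint(0, 10, 101).astype(np.float32))

model = TFTModel(input_chunk_length=4, output_chunk_length=1)
model.fit(ts[:80], future_covariates=fc)
# Forecast the step right after the last training point
model.predict(n=1)
```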
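With the RangeIndex approach above, forecasts come back indexed by integer position. Here is a pandas sketch for attaching trading dates afterwards, assuming you kept the last training date and a trading calendar (the holiday list, dates, and horizon are hypothetical):

```python
import pandas as pd

# Hypothetical trading calendar: weekdays minus the listed holidays
trading_day = pd.offsets.CustomBusinessDay(holidays=["2024-01-15"])

# Assumed last trading date in the training data and forecast horizon n
last_train_date = pd.Timestamp("2024-01-12")
n = 3

# Dates for forecast steps 1..n, skipping weekends and listed holidays
forecast_dates = pd.date_range(last_train_date + trading_day, periods=n, freq=trading_day)
print(forecast_dates.strftime("%Y-%m-%d").tolist())
# -> ['2024-01-16', '2024-01-17', '2024-01-18']
```

Step 1 lands on 2024-01-16 here because the hypothetical calendar treats Monday 2024-01-15 as a holiday; the resulting dates can then be paired with the forecast's values.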

madtoinou, Jan 17 '24 08:01