[Question] How to perform temporal embedding in DARTS?
I want to understand how I can use date information as features for my machine learning model in DARTS. I want to derive new information from the dates and use it as columns in a regression-based forecaster. I'm quite confused by the covariates terminology, and it isn't really clear what happens under the hood in DARTS when using add_encoders.
In my case, I do not have any exogenous variables (past/future covariates), I just want to try to capture seasonality using temporal embedding features, like sine and cosine, for training and also for inference. For example, let's say that I have the following data:
Date | Sales |
---|---|
2024-04-01 | 100 |
2024-04-08 | 150 |
2024-04-15 | 200 |
2024-04-22 | 180 |
2024-04-29 | 220 |
2024-05-06 | 250 |
2024-05-13 | 280 |
2024-05-20 | 300 |
2024-05-27 | 320 |
2024-06-03 | 350 |
2024-06-10 | 380 |
2024-06-17 | 400 |
How do I go from this series to features like 'year', 'month_of_year', 'week_of_year', 'day_of_year', 'month_of_quarter', 'week_of_quarter', 'day_of_quarter', 'week_of_month' for training and inference? Is there an easy way to do this in DARTS?
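For reference, features of this kind depend only on the timestamps, so they can be computed for any future date as well. The following is a minimal plain-Python sketch of that idea, not the way DARTS implements it; the helper name `date_features` and the output keys are assumptions made for illustration.

```python
import math
from datetime import date


def date_features(d: date) -> dict:
    """Derive calendar features from one date (hypothetical helper)."""
    _, iso_week, _ = d.isocalendar()
    # Map the month onto a circle so that December and January end up close.
    angle = 2 * math.pi * (d.month - 1) / 12
    return {
        "year": d.year,
        "month_of_year": d.month,
        "week_of_year": iso_week,  # ISO 8601 week number
        "day_of_year": d.timetuple().tm_yday,
        "month_of_quarter": (d.month - 1) % 3 + 1,
        "month_sin": math.sin(angle),
        "month_cos": math.cos(angle),
    }


# First and last dates of the sample table above.
features = [date_features(d) for d in (date(2024, 4, 1), date(2024, 6, 17))]
for row in features:
    print(row)
```

Because nothing here uses the sales values, the same function can be applied to the forecast horizon, which is why such features behave like covariates that are known in advance.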
I'm talking about date features here, but the documentation also does not make it clear to me how DARTS handles ML forecasting in general. The template example is as follows (here using CatBoost):
from darts.models import CatBoostModel

# `series` is assumed to be a multivariate TimeSeries containing the components used below
target = series['p (mbar)'][:100]
# optionally, use past observed rainfall (pretending to be unknown beyond index 100)
past_cov = series['rain (mm)'][:100]
# optionally, use future temperatures (pretending this component is a forecast)
future_cov = series['T (degC)'][:106]
# predict 6 pressure values using the 12 past values of pressure and rainfall, as well as the 6 temperature
# values corresponding to the forecasted period
model = CatBoostModel(
    lags=12,
    lags_past_covariates=12,
    lags_future_covariates=[0, 1, 2, 3, 4, 5],
    output_chunk_length=6,
)
model.fit(target, past_covariates=past_cov, future_covariates=future_cov)
pred = model.predict(6)
What does it mean to use the 12 past values of pressure and rainfall? What about the 88 other data points? How does the library actually do the calculations for the data?
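One way to picture "the 12 past values" is as a sliding window: every position in the series where 12 past values and 6 following values are available becomes one training row, so the remaining points are not discarded but reused across many rows. The sketch below illustrates that idea in plain Python under simplifying assumptions (target lags only, no covariate columns); `make_lagged_samples` is a hypothetical helper, not a DARTS function.

```python
def make_lagged_samples(values, n_lags, output_chunk_length):
    """Pair each window of `n_lags` past values (features) with the
    `output_chunk_length` values that follow it (labels)."""
    samples = []
    last_start = len(values) - n_lags - output_chunk_length
    for start in range(last_start + 1):
        features = values[start : start + n_lags]
        labels = values[start + n_lags : start + n_lags + output_chunk_length]
        samples.append((features, labels))
    return samples


series_values = list(range(100))  # stand-in for 100 pressure readings
samples = make_lagged_samples(series_values, n_lags=12, output_chunk_length=6)

print(len(samples))  # 83 training rows built from 100 points
print(samples[0])    # features are values 0..11, labels are values 12..17
print(samples[-1])   # features are values 82..93, labels are values 94..99
```

At prediction time, only the most recent 12 values would be needed as input to produce the next 6. With covariates, the lagged covariate values would presumably be appended to each row as extra feature columns.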
Thank you.
I also have a question about how to build a pipeline in which I deseasonalize and detrend my data, make the desired forecasts, and then invert these transformations on the forecasted data. Is there an easy way to do that?
Hi @guimalo,
When you assign a value to add_encoders, the model will create the corresponding covariates "on the fly" during training/inference. In your case, since you are trying to encode information about the time axis, these encodings can be treated as future covariates (we know in advance which day of the week/month of the year each timestamp will fall on, for an arbitrary number of steps). You can see them as "implicit" covariates, handled for you under the hood. If you prefer, you can of course create the encoders manually and explicitly pass the returned TimeSeries as covariates:
from darts.dataprocessing.encoders.encoders import FutureCyclicEncoder
from darts.models import CatBoostModel
from darts.utils.timeseries_generation import sine_timeseries
from pandas import Timestamp

model = CatBoostModel(
    lags=[-5, -3, -1],
    output_chunk_length=2,
    lags_future_covariates=[-2, 0, 2],
)

# cyclic (sine/cosine) encoding of the month, sized to match the model's lags
# note: _get_lags is a private helper (leading underscore), so it may change
encoder = FutureCyclicEncoder(
    attribute="month",
    input_chunk_length=abs(min(model._get_lags("target"))),
    output_chunk_length=model.output_chunk_length,
    lags_covariates=model._get_lags("future"),
)

ts_target = sine_timeseries(length=100, start=Timestamp("01-01-2000"))

# encode the time axis for training plus the n=5 steps to forecast
axis_encoding = encoder.encode_train_inference(n=5, target=ts_target)

model.fit(ts_target, future_covariates=axis_encoding)
model.predict(5)
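The "implicit" route through add_encoders could look roughly like the configuration below. This is a sketch, not a tested recipe: the attribute choices and lag values are assumptions, and the model calls are left as comments so the snippet runs without DARTS installed. As far as I know, a regression model also needs lags_future_covariates set for future encoders to be used.

```python
# Illustrative encoder configuration; the attribute choices are assumptions.
add_encoders = {
    # sine/cosine pair for the month of the year
    "cyclic": {"future": ["month"]},
    # raw calendar attributes taken from the time index
    "datetime_attribute": {"future": ["year", "dayofyear"]},
}

model_kwargs = {
    "lags": 12,
    "lags_future_covariates": [0],  # use the encoding at the forecasted step
    "output_chunk_length": 1,
    "add_encoders": add_encoders,
}

# With DARTS installed, this would be used roughly as follows:
# from darts.models import CatBoostModel
# model = CatBoostModel(**model_kwargs)
# model.fit(ts_target)   # no explicit covariates are passed
# pred = model.predict(5)
print(model_kwargs)
```

Compared with the manual encoder above, this keeps the lag bookkeeping inside the model, at the cost of less visibility into the generated columns.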
You can create a Pipeline, and if your transforms are invertible, you can transform your forecast back to the original range: see the example in the documentation.
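To make the "invertible transform" idea concrete, here is a minimal plain-Python sketch of the round trip: fit a transform on the history, forecast in the transformed space, then invert the forecast. `LinearDetrender` is a hypothetical class written for this illustration and is not part of DARTS; the "forecast" is a placeholder for a real model.

```python
class LinearDetrender:
    """Hypothetical invertible transform: removes a least-squares linear trend."""

    def fit(self, values):
        n = len(values)
        mean_t = (n - 1) / 2
        mean_y = sum(values) / n
        cov = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(values))
        var = sum((t - mean_t) ** 2 for t in range(n))
        self.slope = cov / var
        self.intercept = mean_y - self.slope * mean_t
        return self

    def _trend(self, t):
        return self.intercept + self.slope * t

    def transform(self, values, start=0):
        return [y - self._trend(start + i) for i, y in enumerate(values)]

    def inverse_transform(self, values, start=0):
        return [y + self._trend(start + i) for i, y in enumerate(values)]


# Sales values from the table in the question.
sales = [100, 150, 200, 180, 220, 250, 280, 300, 320, 350, 380, 400]
detrender = LinearDetrender().fit(sales)
residuals = detrender.transform(sales)

# Placeholder forecast in the detrended space: repeat the mean residual.
horizon = 3
residual_forecast = [sum(residuals) / len(residuals)] * horizon

# Add the trend back, continuing the time index after the history.
forecast = detrender.inverse_transform(residual_forecast, start=len(sales))
print(forecast)
```

The important detail is that the inverse step has to be evaluated at the forecast's own time steps (`start=len(sales)`), not at the training ones. A Pipeline of invertible transformers takes care of this kind of bookkeeping for you.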