Improved facilities for handling categorical features + MultiEmbedding for PyTorch models
Is your feature request related to a current problem? Please describe.
It's not clear, from the Darts API and documentation, what options I have for handling categorical features (covariates) in my data. While I'm aware that I can apply transformations like one-hot encoding or ordinal encoding using the sklearn.preprocessing API from scikit-learn before passing my data to a TimeSeries, it would be nice to have this handled by Darts since it's such a high-level API. The PyTorch-Forecasting package, in contrast, directly supports category (string) columns and handles encoding for the user (although it only supports PyTorch DNN models).
Relatedly, PyTorch-Forecasting's TemporalFusionTransformer model includes a MultiEmbedding module that embeds ordinal-encoded categorical features into a (float) vector space, with a hyperparameter embedding_sizes. This is an important part of the TFT model that appears to be missing from the Darts version. I see that there's a private-looking _MultiEmbedding class here: https://github.com/unit8co/darts/blob/2b071e655c8516f98e8a787c9b843bad38aa0a58/darts/models/forecasting/tft_submodels.py#L59 but it doesn't appear to be used in the actual TFTModel. This embedding includes learnable weights, so it can't be done as a preprocessing step; it has to be part of the network.
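To make the idea concrete, here is a minimal PyTorch sketch of such an embedding layer. It is not the PyTorch-Forecasting MultiEmbedding nor the Darts _MultiEmbedding; the class name `SimpleMultiEmbedding` and the layout of `embedding_sizes` (feature name mapped to a pair of number of categories and embedding dimension) are assumptions made for illustration.

```python
from typing import Dict, Tuple

import torch
from torch import nn


class SimpleMultiEmbedding(nn.Module):
    """Embed several ordinal-encoded categorical features and concatenate the results.

    Illustrative sketch only; the name and argument layout are hypothetical.
    """

    def __init__(self, embedding_sizes: Dict[str, Tuple[int, int]]):
        super().__init__()
        # Feature order defines which slice of the input belongs to which table.
        self.names = list(embedding_sizes)
        self.embeddings = nn.ModuleDict(
            {
                name: nn.Embedding(num_categories, dim)
                for name, (num_categories, dim) in embedding_sizes.items()
            }
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: integer tensor of shape (batch, time, n_features) holding category codes.
        parts = [self.embeddings[name](x[..., i]) for i, name in enumerate(self.names)]
        # Result: float tensor of shape (batch, time, sum of embedding dims).
        return torch.cat(parts, dim=-1)


# Two hypothetical categorical features: 5 stores -> 3 dims, 7 weekdays -> 2 dims.
emb = SimpleMultiEmbedding({"store": (5, 3), "weekday": (7, 2)})
x = torch.zeros(4, 10, 2, dtype=torch.long)
out = emb(x)
print(out.shape)
```

Because the embedding tables are `nn.Embedding` parameters, they receive gradients during training, which is why this step belongs inside the network rather than in preprocessing.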
Describe proposed solution
- Add explicit support, documentation and examples for categorical covariates and encoding schemes.
- Add the _MultiEmbedding module and appropriate parameters to the PyTorch models.
Hi @alexkyllo and thanks for the suggestion. We definitely want to improve our treatment of categorical features, both in time series and for static covariates. I also agree we could use some specific documentation on this (although I'd wait until we have better functionality for it). If you feel like it, we definitely welcome contributions! Tackling the categorical features for TFT seems like a good start.
Just wanted to "second" this suggestion. Since the TimeSeries object was abstracting away so many other things nicely, I was hoping that the categorical variables would be managed similarly. It would be nice to not have to manage encoding on the user side. I was using the RegressionModel and ran into this.
https://github.com/unit8co/darts/issues/597 came up here again
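Hello, I'd like to ask: does Darts now support DataFrames that contain strings? My DataFrame contains strings and I get an error saying this is not allowed.
Code:
df = pd.read_csv("all_data.csv")
df.sort_index(inplace=True)
print(df.head(2))
series = TimeSeries.from_dataframe(df)
Error: ValueError: could not convert string to float: '0.035-1'
For string-typed data that are static features, how should I convert them into a series? Thanks.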
There's now support for static covariates in TimeSeries, with a StaticCovariatesTransformer that can both scale numeric static covariates and encode categorical ones.
See here for an example: https://unit8co.github.io/darts/examples/15-static-covariates.html
Hi @hrzn, am I right that there is no support for categorical dynamic features (categorical future covariates/categorical past covariates) as of right now? In case of TFT this would be interesting as the categorical embedding for those past and future inputs can then be part of the network. I could not find this functionality yet. Or is it implemented somehow already? Thank you!
@floriangue you are correct: dynamic categorical variables are not supported yet. This is now tracked here: https://github.com/unit8co/darts/issues/1514