FinRL-Meta
[Suggestion] Normalization.
Adding normalization to the data preprocessor might be a great feature:
- Min-Max Normalization
- Decimal Scaling Normalization
- Z-Score Normalization
- Median Normalization
- Sigmoid Normalization
- Tanh estimators

Bhanja, Samit, and Abhishek Das. Impact of Data Normalization on Deep Neural Network for Time Series Forecasting. 2018. ResearchGate
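For reference, here is a minimal NumPy sketch of the six methods listed above. The formulas are the common textbook definitions and may differ in detail from the paper. In particular, median normalization is taken here as division by the median, and the tanh estimator uses the plain mean and standard deviation in place of Hampel's robust estimates. The function names are hypothetical, not a FinRL-Meta API. Note that each function computes its statistics over the whole array it receives, so it leaks future information if applied to a full time series at once.

```python
import numpy as np


def min_max(x):
    # Rescale to [0, 1] using the observed minimum and maximum.
    return (x - x.min()) / (x.max() - x.min())


def decimal_scaling(x):
    # Divide by 10**j, where j is the smallest integer with max(|x / 10**j|) < 1.
    m = np.abs(x).max()
    if m == 0:
        return x.astype(float)
    j = int(np.floor(np.log10(m))) + 1
    return x / 10.0 ** j


def z_score(x):
    # Center on the mean and scale by the population standard deviation.
    return (x - x.mean()) / x.std()


def median_norm(x):
    # Divide every value by the median of the inputs (assumed definition).
    return x / np.median(x)


def sigmoid_norm(x):
    # Logistic squashing of the raw value into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))


def tanh_estimator(x):
    # Tanh estimator: 0.5 * (tanh(0.01 * z) + 1), output in (0, 1).
    # Mean/std stand in for Hampel's robust location and scale estimates.
    return 0.5 * (np.tanh(0.01 * (x - x.mean()) / x.std()) + 1.0)
```

Sigmoid normalization saturates quickly on raw prices, so in practice it is often applied to already standardized values.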
These are more advanced, adaptive approaches:
- Passalis, Nikolaos, et al. Deep Adaptive Input Normalization for Time Series Forecasting. 2019. GitHub repo, arXiv:1902.07892
- Nalmpantis, Angelos, et al. "Deep Adaptive Group-Based Input Normalization for Financial Trading." Pattern Recognition Letters, vol. 152, December 2021, pp. 413–19. https://doi.org/10.1016/j.patrec.2021.11.004
- Tran, Dat Thanh, et al. Bilinear Input Normalization for Neural Networks in Financial Forecasting. 2021. arXiv:2109.00983
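To give an idea of what "adaptive" means in the first paper, here is a rough NumPy sketch of the forward pass of DAIN's three sublayers (adaptive shifting, adaptive scaling, adaptive gating) for a single lookback window, based on my reading of Passalis et al. In the paper, the weight matrices are learned jointly with the network by backpropagation. This sketch keeps them fixed and omits training entirely. The function name and argument layout are hypothetical.

```python
import numpy as np


def dain_forward(x, w_a, w_b, w_c, d, eps=1e-8):
    """Forward pass of the three DAIN sublayers for one sample (sketch).

    x: array of shape (n_features, n_steps), one lookback window.
    w_a, w_b, w_c: (n_features, n_features) weights (learned in the paper,
        fixed here for illustration).
    d: (n_features,) bias of the gating sublayer.
    """
    # 1) Adaptive shifting: subtract a linear map of the per-feature mean.
    a = x.mean(axis=1)
    x_shift = x - (w_a @ a)[:, None]
    # 2) Adaptive scaling: divide by a linear map of the per-feature RMS.
    b = np.sqrt((x_shift ** 2).mean(axis=1))
    x_scale = x_shift / ((w_b @ b)[:, None] + eps)
    # 3) Adaptive gating: multiply each feature by a sigmoid gate.
    c = x_scale.mean(axis=1)
    gamma = 1.0 / (1.0 + np.exp(-(w_c @ c + d)))
    return x_scale * gamma[:, None]
```

With identity matrices for `w_a` and `w_b`, the first two sublayers reduce to a per-window z-score; the learned weights let the network move away from that fixed scheme.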
Just a note: normalizing with statistics computed over the whole dataset risks lookahead bias (data leakage). It therefore needs to be done carefully inside the environment at each step, using only a fixed lookback window. I saw that some environments already use normalization: https://github.com/AI4Finance-Foundation/FinRL-Meta/blob/203bb7d3f890220bb3e82bc5e34b65051a0b61dc/finrl_meta/env_crypto_trading/env_multiple_crypto.py#L94
According to the ResearchGate paper (Bhanja & Das), the tanh estimator is the most promising.
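Combining both points, a per-step tanh-estimator normalization that only looks backward could look roughly like this. It is a sketch, not how any FinRL-Meta environment currently does it. `rolling_tanh_observation` and its `lookback` parameter are hypothetical, and the plain mean and standard deviation of the window again stand in for robust estimates.

```python
import numpy as np


def rolling_tanh_observation(prices, t, lookback):
    """Tanh-estimator-normalize prices[t] using only prices[t - lookback + 1 : t + 1].

    prices: 1-D array of the full price history available to the env.
    t: current step index.
    lookback: window length (hypothetical parameter).
    """
    start = max(0, t - lookback + 1)
    window = prices[start : t + 1]  # never reads an index beyond t
    mu, sigma = window.mean(), window.std()
    if sigma == 0:
        return 0.5  # flat window: map to the midpoint of (0, 1)
    return 0.5 * (np.tanh(0.01 * (prices[t] - mu) / sigma) + 1.0)
```

Because the window ends at `t`, changing any later price leaves the observation at step `t` unchanged. The 0.01 factor keeps outputs close to 0.5 unless a value is far from the window mean.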
Here is a good explanation of the lookahead problem, which suggests using an expanding or rolling window: https://stats.stackexchange.com/questions/442739/look-ahead-bias-induced-by-standardization-of-a-time-series/462976#462976
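The two window types from that answer can be sketched with pandas as follows. The function names are hypothetical. At each time t, both versions use only observations up to and including t. Statistics use the sample standard deviation (pandas' default `ddof=1`), and the first rows are NaN until `min_periods` observations exist.

```python
import pandas as pd


def rolling_zscore(s: pd.Series, window: int) -> pd.Series:
    # Mean/std over the last `window` observations ending at t.
    mean = s.rolling(window, min_periods=window).mean()
    std = s.rolling(window, min_periods=window).std()
    return (s - mean) / std


def expanding_zscore(s: pd.Series, min_periods: int) -> pd.Series:
    # Mean/std over every observation from the start up to t.
    mean = s.expanding(min_periods=min_periods).mean()
    std = s.expanding(min_periods=min_periods).std()
    return (s - mean) / std
```

A rolling window adapts to regime changes but uses less data per estimate. An expanding window uses all history so far but reacts more slowly to shifts in the price level.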
Yes. Lookahead will greatly influence the normalization. What about adding a column 'if_leaking' to flag such data? During normalization, we would ignore rows where 'if_leaking' == true. Do you have any other ideas to solve it?
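A minimal pandas sketch of the proposed 'if_leaking' idea, assuming the column is boolean and marks rows (for example, the test period) that must not contribute to the statistics. `normalize_ignoring_leaking` is a hypothetical helper, and 'if_leaking' is the column proposed here, not an existing FinRL-Meta field. The mean and standard deviation are fit on non-flagged rows only, then applied to every row.

```python
import pandas as pd


def normalize_ignoring_leaking(df, columns, flag="if_leaking"):
    """Z-score `columns` using statistics from rows where `flag` is False.

    `flag` is assumed to be a boolean column (hypothetical 'if_leaking').
    """
    clean = df.loc[~df[flag], columns]      # rows allowed to shape the statistics
    mean, std = clean.mean(), clean.std()  # per-column, sample std (ddof=1)
    out = df.copy()
    out[columns] = (df[columns] - mean) / std  # applied to all rows, flagged or not
    return out
```

This keeps flagged rows from influencing the statistics, but each non-flagged row is still normalized with information from later non-flagged rows, so it addresses train/test leakage rather than lookahead within the training period.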