FinRL-Meta

[Suggestion] Normalization.

Open cryptocoinserver opened this issue 3 years ago • 3 comments

Adding normalization to the data preprocessor might be a great feature:

  • Min-Max Normalization,
  • Decimal Scaling Normalization,
  • Z-Score Normalization,
  • Median Normalization,
  • Sigmoid Normalization,
  • Tanh estimators
  • Bhanja, Samit & Das, Abhishek. (2018). Impact of Data Normalization on Deep Neural Network for Time Series Forecasting. ResearchGate
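To make the list concrete, here is a minimal sketch of the six methods in their common textbook forms. These forms are assumptions and may differ in detail from the definitions used in the paper: the sigmoid is applied to z-scores, and the tanh estimator uses the simplified mean/standard-deviation variant with the usual 0.01 factor rather than Hampel estimators. All function names are hypothetical and not part of FinRL-Meta.

```python
import math
import statistics


def min_max(xs):
    # Rescale to [0, 1]; assumes max(xs) > min(xs).
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def decimal_scaling(xs):
    # Divide by 10**j, with j the smallest integer such that
    # max(|x|) / 10**j < 1; assumes at least one nonzero value.
    j = math.floor(math.log10(max(abs(x) for x in xs))) + 1
    return [x / 10 ** j for x in xs]


def z_score(xs):
    # Zero mean, unit (population) standard deviation; assumes sigma > 0.
    mu, sigma = statistics.fmean(xs), statistics.pstdev(xs)
    return [(x - mu) / sigma for x in xs]


def median_norm(xs):
    # Divide every value by the median; assumes a nonzero median.
    med = statistics.median(xs)
    return [x / med for x in xs]


def sigmoid_norm(xs):
    # Logistic squashing of z-scores into (0, 1) (assumed variant).
    return [1.0 / (1.0 + math.exp(-z)) for z in z_score(xs)]


def tanh_estimator(xs):
    # Simplified tanh estimator: 0.5 * (tanh(0.01 * z) + 1).
    return [0.5 * (math.tanh(0.01 * z) + 1.0) for z in z_score(xs)]


prices = [10.0, 20.0, 30.0, 40.0, 50.0]
scaled = min_max(prices)
```

Note that every function here computes its statistics over the full input, so applying them to a whole price history is only appropriate for offline inspection, not for features fed to a trading agent.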

These are more advanced / adaptive approaches:

  • Passalis, Nikolaos, et al. Deep Adaptive Input Normalization for Time Series Forecasting. 2019. GitHub repo, arXiv:1902.07892
  • Nalmpantis, Angelos, et al. "Deep Adaptive Group-Based Input Normalization for Financial Trading". Pattern Recognition Letters, vol. 152, December 2021, pp. 413–19. https://doi.org/10.1016/j.patrec.2021.11.004
  • Tran, Dat Thanh, et al. Bilinear Input Normalization for Neural Networks in Financial Forecasting. 2021. arXiv:2109.00983

cryptocoinserver avatar Feb 08 '22 09:02 cryptocoinserver

Just a note: normalizing with statistics computed over the whole dataset risks lookahead bias (data leakage), because each value is then scaled using information from the future. Normalization therefore needs to be done carefully inside the environment at each step, using only a fixed lookback window. Some environments already use normalization: https://github.com/AI4Finance-Foundation/FinRL-Meta/blob/203bb7d3f890220bb3e82bc5e34b65051a0b61dc/finrl_meta/env_crypto_trading/env_multiple_crypto.py#L94

According to the ResearchGate paper, the Tanh estimator is the most promising.
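A minimal sketch of per-step normalization with a lookback, assuming a plain list of prices and a z-score as the normalizer. The function name `normalize_at_step` and its parameters are hypothetical; this is not how the linked environment does it.

```python
import statistics


def normalize_at_step(series, t, lookback):
    """Z-score series[t] using only the `lookback` values ending at t."""
    # The window never extends past index t, so no future data is used.
    window = series[max(0, t - lookback + 1): t + 1]
    mu = statistics.fmean(window)
    sigma = statistics.pstdev(window)
    # A flat (or single-element) window has no spread; return 0.0 then.
    return 0.0 if sigma == 0 else (series[t] - mu) / sigma


prices = [100.0, 102.0, 101.0, 105.0, 110.0, 90.0]
obs = normalize_at_step(prices, t=3, lookback=3)  # uses 102, 101, 105 only
```

Because the window stops at `t`, changing any later price leaves `obs` unchanged, which is the property a whole-dataset normalizer lacks.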

cryptocoinserver avatar Feb 13 '22 10:02 cryptocoinserver

A good explanation of the lookahead problem, which suggests using an expanding or rolling window: https://stats.stackexchange.com/questions/442739/look-ahead-bias-induced-by-standardization-of-a-time-series/462976#462976
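As a rough illustration of the expanding-window idea, the sketch below z-scores each value against everything seen up to and including it, updating the running mean and variance with Welford's algorithm. The function name is hypothetical, and using the population variance is an assumption; a rolling variant would restrict the statistics to the most recent values instead.

```python
import math


def expanding_zscores(series):
    """Z-score each value against all values up to and including it."""
    out = []
    n, mean, m2 = 0, 0.0, 0.0  # running count, mean, sum of squared deviations
    for x in series:
        # Welford's online update of mean and squared deviations.
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        std = math.sqrt(m2 / n)  # population standard deviation so far
        # With no spread yet (e.g. the first value), emit 0.0.
        out.append((x - mean) / std if std > 0 else 0.0)
    return out


scores = expanding_zscores([1.0, 2.0, 3.0])
```

The output at position `k` depends only on the first `k + 1` inputs, so appending new data never changes earlier scores.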

cryptocoinserver avatar Mar 15 '22 13:03 cryptocoinserver

Yes, lookahead leakage will greatly influence the normalization. What about adding an 'if_leaking' column to flag the affected rows? During normalization, we would then ignore the rows where 'if_leaking' == true. Do you have any ideas for solving it?
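One possible reading of this proposal, sketched under assumptions: rows are plain dicts, the price column is called 'close' (a hypothetical name), and "ignore" means that flagged rows are excluded when the mean and standard deviation are estimated. How 'if_leaking' would be populated is not specified here.

```python
import statistics


def fit_ignoring_leaks(rows, column="close", flag="if_leaking"):
    """Estimate mean and std from rows whose leak flag is False."""
    clean = [r[column] for r in rows if not r[flag]]
    return statistics.fmean(clean), statistics.pstdev(clean)


def apply_zscore(rows, mu, sigma, column="close"):
    """Scale every row with statistics fitted on the clean rows only."""
    return [(r[column] - mu) / sigma for r in rows]


rows = [
    {"close": 10.0, "if_leaking": False},
    {"close": 20.0, "if_leaking": False},
    {"close": 30.0, "if_leaking": False},
    {"close": 1000.0, "if_leaking": True},  # flagged row, excluded from the fit
]
mu, sigma = fit_ignoring_leaks(rows)
scaled = apply_zscore(rows, mu, sigma)
```

A flag like this only helps if the flagged rows are exactly the ones that lie in the future relative to each decision step, so in an environment the flag would have to change as the step advances.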

zhumingpassional avatar Apr 01 '22 02:04 zhumingpassional