ctc-executioner [RL] Improve reward function

[RL] Improve reward function

Open mjuchli opened this issue 6 years ago • 0 comments

Instead of (p_0 - vwap_t), compare against p_0 - (max([p_0; p_t]) + min([p_0; p_t])) / 2, normalized to the range [-1, 1]. That way the reward stays stable under any kind of price fluctuation.
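A minimal sketch of the proposed reward, under some assumptions the issue does not confirm: [p_0; p_t] is read as the sequence of prices observed from step 0 through step t, and the normalization divides by half the price range (which keeps the result in [-1, 1] because p_0 always lies between the min and the max). The function name `midpoint_reward` is hypothetical and not taken from the ctc-executioner code base.

```python
from typing import Sequence


def midpoint_reward(prices: Sequence[float]) -> float:
    """Hypothetical helper: p_0 minus the midpoint of the price range, scaled to [-1, 1].

    prices[0] is p_0 and prices[-1] is p_t (assumed reading of [p_0; p_t]).
    """
    if not prices:
        raise ValueError("prices must contain at least p_0")

    p_0 = prices[0]
    high, low = max(prices), min(prices)

    # Assumed normalization: divide by half the observed range.
    half_range = (high - low) / 2
    if half_range == 0:
        return 0.0  # flat price path, nothing to reward or punish

    midpoint = (high + low) / 2
    return (p_0 - midpoint) / half_range


# p_0 = 100, range is [98, 104], midpoint is 101, half-range is 3
print(midpoint_reward([100.0, 104.0, 98.0, 102.0]))
```

The sign convention (whether a positive value is good) depends on whether the agent is buying or selling, which the issue does not specify, so the sketch leaves the value unflipped.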

[Image: Evernote snapshot 20180310 225733]

Mar 10 '18 21:03 mjuchli
