ctc-executioner

[RL] Improve reward function

Open mjuchli opened this issue 6 years ago • 0 comments

Instead of rewarding `p_0 - vwap_t`, compare `p_0` against the midpoint of the price range: `p_0 - (max([p_0; p_t]) + min([p_0; p_t])) / 2`, normalized to lie between -1 and 1. This should give a stable reward for any kind of price fluctuation.
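A minimal sketch of the proposed reward, under two assumptions that the issue does not spell out: `[p_0; p_t]` is read as the series of prices observed from step 0 to step t, and the normalization divides by half the observed price range (which bounds the result in [-1, 1], because `p_0` itself lies inside that range). The function name `midpoint_reward` is hypothetical and not part of ctc-executioner.

```python
from typing import Sequence


def midpoint_reward(prices: Sequence[float]) -> float:
    """Reward of p_0 relative to the midpoint of the observed price range.

    Assumption: `prices` holds the prices from step 0 to step t, so
    prices[0] is p_0. Assumption: normalization divides by half the range.
    """
    if not prices:
        raise ValueError("prices must contain at least p_0")

    p_0 = prices[0]
    high = max(prices)
    low = min(prices)

    half_range = (high - low) / 2.0
    if half_range == 0.0:
        # Flat market: p_0 equals the midpoint, so there is nothing to reward.
        return 0.0

    midpoint = (high + low) / 2.0
    # p_0 lies in [low, high], so the quotient is always within [-1, 1].
    return (p_0 - midpoint) / half_range
```

With this reading, the reward is 1 when `p_0` is the highest price seen, -1 when it is the lowest, and 0 when prices fluctuate symmetrically around `p_0`. Whether the sign should be flipped for buy versus sell orders is left open here.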

[Image: evernote snapshot 20180310 225733]

mjuchli, Mar 10 '18 21:03