ctc-executioner
[RL] Improve reward function
Instead of using (p_0 - vwap_t) as the reward,
compare against p_0 - (max([p_0; p_t]) + min([p_0; p_t])) / 2
(normalized between -1 and 1). That way we get a stable reward for any kind of price fluctuation.
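
A minimal sketch of the proposed reward, under two assumptions that the issue does not spell out. First, [p_0; p_t] is read as the sequence of prices observed from step 0 to step t. Second, the normalization divides by half the price range, which bounds the result in [-1, 1] because p_0 is itself one of the prices. The function name `midrange_reward` is hypothetical and not part of the existing code base.

```python
from typing import Sequence


def midrange_reward(prices: Sequence[float]) -> float:
    """Reward of p_0 relative to the midrange of the prices seen so far.

    `prices` is assumed to hold p_0 ... p_t in chronological order.
    The result lies in [-1, 1]; the scaling by half the range is an
    assumption, the issue only states the target interval.
    """
    if len(prices) == 0:
        raise ValueError("prices must contain at least p_0")

    p_0 = prices[0]
    highest = max(prices)
    lowest = min(prices)

    half_range = (highest - lowest) / 2.0
    if half_range == 0.0:
        # No price movement at all: treat the reward as neutral.
        return 0.0

    midrange = (highest + lowest) / 2.0
    return (p_0 - midrange) / half_range
```

With this scaling, a price that only falls after p_0 yields 1, a price that only rises yields -1, and a symmetric swing around p_0 yields 0.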