Implementation detail in QRDQN
In theory, the estimated quantile values must be non-decreasing in the quantile parameter \tau. In practice, this ordering should be enforced by the loss function rather than by sorting, because the quantile regression for a particular transition fits every quantile against the same collection of targets, with only the quantile parameter \tau differing.
In the code, the sort operation should be removed at https://github.com/NervanaSystems/coach/blob/fc5039854416064b5ef7938b707495d347776885/rl_coach/agents/qr_dqn_agent.py#L121
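
To illustrate the argument, here is a minimal NumPy sketch. It is not the Coach implementation: the helper names `pinball_loss` and `fit_quantiles` are hypothetical, the targets are synthetic, and a grid search stands in for gradient descent. It suggests that when every \tau is fitted against the same targets, the minimisers of the quantile regression loss already come out in non-decreasing order, with no sort applied to them.

```python
import numpy as np


def pinball_loss(theta, targets, tau):
    """Mean quantile-regression (pinball) loss of a scalar estimate theta."""
    u = targets - theta  # positive where theta underestimates the target
    return np.mean(np.where(u >= 0, tau * u, (tau - 1.0) * u))


def fit_quantiles(targets, taus, candidates):
    """For each tau, pick the candidate value with the lowest pinball loss.

    Grid search is used here only to keep the sketch deterministic;
    a real agent would minimise the same loss by gradient descent.
    """
    estimates = []
    for tau in taus:
        losses = [pinball_loss(c, targets, tau) for c in candidates]
        estimates.append(candidates[int(np.argmin(losses))])
    return np.array(estimates)


# Synthetic stand-in for the target values of one transition (assumption).
rng = np.random.default_rng(0)
targets = rng.normal(size=200)

# Quantile midpoints 0.1, 0.3, ..., 0.9, the usual QR-DQN choice.
n_quantiles = 5
taus = (np.arange(n_quantiles) + 0.5) / n_quantiles

# Candidate values for each estimate; an evenly spaced grid over the target range.
candidates = np.linspace(targets.min(), targets.max(), 401)

estimates = fit_quantiles(targets, taus, candidates)
print(estimates)
print("non-decreasing:", bool(np.all(np.diff(estimates) >= 0)))
```

Sorting the network outputs before computing the loss would instead pair each output with a \tau chosen by its current rank rather than by its position, which is the pairing the loss is supposed to learn.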