Implementation detail in QRDQN

Open Officium opened this issue 4 years ago • 0 comments

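In theory, the estimated quantile values must be non-decreasing in \tau. In practice, this ordering should be enforced by the loss function rather than by sorting, because the quantile regression for a particular transition uses the same collection of targets for every output and only the quantile parameter \tau differs. Each output is tied to its own \tau by its index, and sorting breaks that correspondence.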

In the code, we should remove the sort operation at https://github.com/NervanaSystems/coach/blob/fc5039854416064b5ef7938b707495d347776885/rl_coach/agents/qr_dqn_agent.py#L121
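To illustrate the point, here is a minimal NumPy sketch of a quantile Huber loss in which output `i` is paired with a fixed `tau_i` by index. It is not Coach's implementation: the function name `quantile_huber_loss`, the midpoint choice of `tau`, and the toy numbers are assumptions made for this example. With the same targets, the loss is lower when the outputs are ordered consistently with their `tau` values than when they are reversed, so the loss itself pushes the outputs toward the correct order.

```python
import numpy as np


def quantile_huber_loss(pred_quantiles, targets, kappa=1.0):
    """Quantile Huber loss for one transition (hypothetical helper).

    pred_quantiles: shape (N,), output i is tied to tau_i by its index.
    targets:        shape (M,), the same targets are used for every output.
    """
    pred = np.asarray(pred_quantiles, dtype=float)
    tgt = np.asarray(targets, dtype=float)
    n = pred.shape[0]

    # Assumption: quantile midpoints tau_i = (i + 0.5) / N.
    taus = (np.arange(n) + 0.5) / n

    # u[i, j] = target_j - pred_i
    u = tgt[None, :] - pred[:, None]
    abs_u = np.abs(u)

    # Huber loss with threshold kappa.
    huber = np.where(abs_u <= kappa, 0.5 * u ** 2, kappa * (abs_u - 0.5 * kappa))

    # Asymmetric weight |tau_i - 1{u < 0}|; only this term depends on tau.
    weight = np.abs(taus[:, None] - (u < 0).astype(float))

    # Average over targets, sum over quantile outputs.
    return float((weight * huber / kappa).mean(axis=1).sum())


targets = [0.0, 1.0, 2.0, 3.0]

# Same values, two index orders. No sorting is applied anywhere.
loss_ordered = quantile_huber_loss([0.5, 1.5, 2.5], targets)
loss_reversed = quantile_huber_loss([2.5, 1.5, 0.5], targets)

print(loss_ordered, loss_reversed)
```

If the outputs were sorted before computing the loss, both calls would return the same value and the gradient would no longer tell a mis-ordered output which quantile it is supposed to represent.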

Sep 04 '19 15:09 Officium
