dopamine icon indicating copy to clipboard operation
dopamine copied to clipboard

a bug-fix of QR-DQN network definition.

Open ddlau opened this issue 4 years ago • 6 comments

ddlau avatar Nov 24 '20 10:11 ddlau

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

:memo: Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.

google-cla[bot] avatar Nov 24 '20 10:11 google-cla[bot]

@googlebot I signed it!

ddlau avatar Nov 24 '20 10:11 ddlau

This is a Categorical-DQN legacy, which was just "transposed" in QR-DQN. In contrast to Categorical-DQN, QR-DQN set fixed probabilites and learnable locations (a.k.a diracs).

ddlau avatar Nov 24 '20 10:11 ddlau

Thanks for pointing this out! This is indeed semantically incorrect and is mostly a consequence of QR-DQN inheriting from Rainbow (which does expect probabilities). The probabilities field you rightly point out as not being probabilities is in fact not used at all in QR-DQN.

psc-g avatar Nov 24 '20 17:11 psc-g

rightly

Yeah, I noticed it too, after "uptraced" to its caller, and found the strange thing is not used. To be a little strict, isn't this sort of impreciseness and a misleading to new comings, more or less, I guess? After all, it is not proper, so I suggest the modification, at least some comment there.

ddlau avatar Nov 25 '20 02:11 ddlau

yup, we'll be adding some clarifications there. thanks for pointing this out!

On Tue, Nov 24, 2020 at 9:43 PM ddlau [email protected] wrote:

rightly

Yeah, I noticed it too, after "uptraced" to its caller, and found the strange thing is not used. To be a little strict, isn't this sort of impreciseness and a misleading to new comings, more or less, I guess? After all, it is not proper, so I suggest the modification, at least some comment there.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/google/dopamine/pull/157#issuecomment-733426860, or unsubscribe https://github.com/notifications/unsubscribe-auth/AE3CCMN3PUXFERWCQHZ6UKLSRRVN7ANCNFSM4UAWCLAA .

psc-g avatar Nov 25 '20 02:11 psc-g