dopamine icon indicating copy to clipboard operation
dopamine copied to clipboard

distribution in softmax_cross_entropy_with_logits

Open xlnwel opened this issue 6 years ago • 6 comments

The following code uses target distribution in the softmax_cross_entropy_with_logits. Is this correct?

https://github.com/google/dopamine/blob/6463bfa8660daf17823825ab884b118d3a57ea4e/dopamine/agents/rainbow/rainbow_agent.py#L259

BTW, I found the code for the projection in c51 is somewhat complicated. I use the following code to compute the projection in Eq.7 in the paper

y = tf.clip_by_value(supports, v_min, v_max)[:, None, :]                     # [B, 1, N]
target_support = target_support[None, :, None]                                # [1, N, 1]

y = tf.clip_by_value(1. - tf.abs(y - target_support) / delta_z, 0, 1)     # [B, N, N]
y = tf.reduce_sum(y * weights, axis=2)                                               # [B, N]
y = tf.stop_gradient(y)

All names except y stand for the same meaning as those defined in function project_distribution. Do you think they are doing the same thing?

xlnwel avatar Oct 17 '19 08:10 xlnwel

hi, regarding your first question, is there something that makes you believe it's not correct?

regarding your second question, one way you can verify this is by running the unit test.

psc-g avatar Oct 17 '19 21:10 psc-g

Hi, Sorry for the first misunderstanding, my bad. I replace the code starts from here with the following code.

  with tf.control_dependencies(validate_deps):
    # Ex: `v_min, v_max = 4, 8`.
    v_min, v_max = target_support[0], target_support[-1]
    y = tf.clip_by_value(supports, v_min, v_max)[:, None, :]                     # [B, 1, N]
    target_support = target_support[None, :, None]                                # [1, N, 1]

    y = tf.clip_by_value(1. - tf.abs(y - target_support) / delta_z, 0, 1)     # [B, N, N]
    y = tf.reduce_sum(y * weights[:, None, :], axis=2)                                               # [B, N]
    projection = tf.stop_gradient(y)
    return projection

The test seems suggesting it's okay. Does it mean I'm correct in the second one? image Last, thanks for your help!

xlnwel avatar Oct 18 '19 00:10 xlnwel

yup, seems like the second one is a clean refactoring! :)

On Thu, Oct 17, 2019 at 8:45 PM The Raven Chaser [email protected] wrote:

Hi, Sorry for the first misunderstanding, my bad. I replace the code starts from here https://github.com/google/dopamine/blob/6463bfa8660daf17823825ab884b118d3a57ea4e/dopamine/agents/rainbow/rainbow_agent.py#L402 with the following code.

with tf.control_dependencies(validate_deps): # Ex: v_min, v_max = 4, 8. v_min, v_max = target_support[0], target_support[-1] y = tf.clip_by_value(supports, v_min, v_max)[:, None, :] # [B, 1, N] target_support = target_support[None, :, None] # [1, N, 1]

y = tf.clip_by_value(1. - tf.abs(y - target_support) / delta_z, 0, 1)     # [B, N, N]
y = tf.reduce_sum(y * weights[:, None, :], axis=2)                                               # [B, N]
projection = tf.stop_gradient(y)

The test seems suggesting it's okay. Does it mean I'm correct in the second one? [image: image] https://user-images.githubusercontent.com/10184153/67057518-63ddc980-f183-11e9-95d5-d8613024ee43.png Last, thanks for your help!

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/google/dopamine/issues/120?email_source=notifications&email_token=AE3CCMM2NORWK2PR4RY5WDTQPEBKVA5CNFSM4JBWBTUKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEBSBEXI#issuecomment-543429213, or unsubscribe https://github.com/notifications/unsubscribe-auth/AE3CCMNY5QB4FYRUHK7HUFDQPEBKVANCNFSM4JBWBTUA .

psc-g avatar Oct 18 '19 09:10 psc-g

May I pull this solution to simplify the code?

xlnwel avatar Oct 18 '19 11:10 xlnwel

if you don't mind, let's hold off on this for a bit. we're still sorting out some issues with external pull requests. but thank you!

On Fri, Oct 18, 2019 at 7:07 AM The Raven Chaser [email protected] wrote:

May I pull this solution to simplify the code?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/google/dopamine/issues/120?email_source=notifications&email_token=AE3CCMOJWR6OZFEEAQK3MN3QPGKIXA5CNFSM4JBWBTUKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEBT5ZTI#issuecomment-543677645, or unsubscribe https://github.com/notifications/unsubscribe-auth/AE3CCMLA3TUQ6ZWBV4A7DYTQPGKIXANCNFSM4JBWBTUA .

psc-g avatar Oct 18 '19 11:10 psc-g

Sure, no problem. But if you find this implementation is not right, please let me know.

By the way, thanks for your open-source code. That's awesome.

xlnwel avatar Oct 18 '19 11:10 xlnwel