dopamine distribution in softmax_cross_entropy_with

The following code uses target distribution in the softmax_cross_entropy_with_logits. Is this correct?

https://github.com/google/dopamine/blob/6463bfa8660daf17823825ab884b118d3a57ea4e/dopamine/agents/rainbow/rainbow_agent.py#L259

BTW, I found the code for the projection in c51 is somewhat complicated. I use the following code to compute the projection in Eq.7 in the paper

y = tf.clip_by_value(supports, v_min, v_max)[:, None, :]                     # [B, 1, N]
target_support = target_support[None, :, None]                                # [1, N, 1]

y = tf.clip_by_value(1. - tf.abs(y - target_support) / delta_z, 0, 1)     # [B, N, N]
y = tf.reduce_sum(y * weights, axis=2)                                               # [B, N]
y = tf.stop_gradient(y)

All names except y stand for the same meaning as those defined in function project_distribution. Do you think they are doing the same thing?

Oct 17 '19 08:10 xlnwel

hi, regarding your first question, is there something that makes you believe it's not correct?

regarding your second question, one way you can verify this is by running the unit test.

Oct 17 '19 21:10 psc-g

Hi, Sorry for the first misunderstanding, my bad. I replace the code starts from here with the following code.

  with tf.control_dependencies(validate_deps):
    # Ex: `v_min, v_max = 4, 8`.
    v_min, v_max = target_support[0], target_support[-1]
    y = tf.clip_by_value(supports, v_min, v_max)[:, None, :]                     # [B, 1, N]
    target_support = target_support[None, :, None]                                # [1, N, 1]

    y = tf.clip_by_value(1. - tf.abs(y - target_support) / delta_z, 0, 1)     # [B, N, N]
    y = tf.reduce_sum(y * weights[:, None, :], axis=2)                                               # [B, N]
    projection = tf.stop_gradient(y)
    return projection

The test seems suggesting it's okay. Does it mean I'm correct in the second one? Last, thanks for your help!

Oct 18 '19 00:10 xlnwel

yup, seems like the second one is a clean refactoring! :)

On Thu, Oct 17, 2019 at 8:45 PM The Raven Chaser [email protected] wrote:

Hi, Sorry for the first misunderstanding, my bad. I replace the code starts from here https://github.com/google/dopamine/blob/6463bfa8660daf17823825ab884b118d3a57ea4e/dopamine/agents/rainbow/rainbow_agent.py#L402 with the following code.

with tf.control_dependencies(validate_deps): # Ex: v_min, v_max = 4, 8. v_min, v_max = target_support[0], target_support[-1] y = tf.clip_by_value(supports, v_min, v_max)[:, None, :] # [B, 1, N] target_support = target_support[None, :, None] # [1, N, 1]
y = tf.clip_by_value(1. - tf.abs(y - target_support) / delta_z, 0, 1)     # [B, N, N]
y = tf.reduce_sum(y * weights[:, None, :], axis=2)                                               # [B, N]
projection = tf.stop_gradient(y)
The test seems suggesting it's okay. Does it mean I'm correct in the second one? [image: image] https://user-images.githubusercontent.com/10184153/67057518-63ddc980-f183-11e9-95d5-d8613024ee43.png Last, thanks for your help!

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/google/dopamine/issues/120?email_source=notifications&email_token=AE3CCMM2NORWK2PR4RY5WDTQPEBKVA5CNFSM4JBWBTUKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEBSBEXI#issuecomment-543429213, or unsubscribe https://github.com/notifications/unsubscribe-auth/AE3CCMNY5QB4FYRUHK7HUFDQPEBKVANCNFSM4JBWBTUA .

Oct 18 '19 09:10 psc-g

May I pull this solution to simplify the code?

Oct 18 '19 11:10 xlnwel

if you don't mind, let's hold off on this for a bit. we're still sorting out some issues with external pull requests. but thank you!

On Fri, Oct 18, 2019 at 7:07 AM The Raven Chaser [email protected] wrote:

May I pull this solution to simplify the code?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/google/dopamine/issues/120?email_source=notifications&email_token=AE3CCMOJWR6OZFEEAQK3MN3QPGKIXA5CNFSM4JBWBTUKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEBT5ZTI#issuecomment-543677645, or unsubscribe https://github.com/notifications/unsubscribe-auth/AE3CCMLA3TUQ6ZWBV4A7DYTQPGKIXANCNFSM4JBWBTUA .

Oct 18 '19 11:10 psc-g

Sure, no problem. But if you find this implementation is not right, please let me know.

By the way, thanks for your open-source code. That's awesome.

Oct 18 '19 11:10 xlnwel

distribution in softmax_cross_entropy_with_logits