curiosity-driven-exploration-pytorch

Are you actually using the learned intrinsic reward for the agent?

Open ferreirafabio opened this issue 4 years ago • 6 comments

Hi,

I can only see that you optimize the intrinsic loss in your code. Can you point me to the line where you add the intrinsic rewards to the actual environment/extrinsic rewards?

In some places in your code I see comments like `# total reward = int reward`, which, according to the original paper, would be wrong, no?

Thank you.

ferreirafabio avatar Feb 20 '21 16:02 ferreirafabio

Also new to the repo, but here the loss is composed of both intrinsic and extrinsic reward: https://github.com/jcwleo/curiosity-driven-exploration-pytorch/blob/bacbefdfbdbc4c4382ab67147c9c8410305a4978/agents.py#L144

ruoshiliu avatar Mar 03 '21 18:03 ruoshiliu

Thanks @ruoshiliu. Yes, I saw the loss. But in addition to optimizing that loss, you also need to feed the intrinsic reward itself to the agent as part of its reward signal, as stated in the paper. Only minimizing the curiosity loss is not the same as actually using the intrinsic reward to train the policy.
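For concreteness, a minimal sketch of that distinction (variable names are illustrative, not this repo's code): the ICM loss updates the curiosity module's parameters, while the intrinsic reward value itself, detached from the graph, is added to the environment reward the policy is trained on.

```python
import torch

def icm_step(ext_reward, feat_next, pred_feat_next, inverse_loss, eta=0.01):
    # (1) Intrinsic reward = forward-model prediction error, detached so the
    #     policy update does not backprop into the curiosity module.
    int_reward = 0.5 * eta * (pred_feat_next - feat_next).pow(2).sum(-1).detach()
    # (2) The reward the agent should actually be trained on, per the paper:
    #     r_t = r^i_t + r^e_t
    total_reward = ext_reward + int_reward
    # (3) The ICM loss that updates the inverse/forward models (the part already
    #     discussed above for agents.py).
    forward_loss = 0.5 * (pred_feat_next - feat_next).pow(2).sum(-1).mean()
    icm_loss = inverse_loss + forward_loss
    return total_reward, icm_loss
```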

ferreirafabio avatar Mar 03 '21 18:03 ferreirafabio

@ferreirafabio What do you mean by "use the intrinsic rewards"? Can you point out which section of the paper states that?

ruoshiliu avatar Mar 04 '21 01:03 ruoshiliu

By that I mean reward = extrinsic reward + intrinsic reward. From the paper:

[screenshot of the paper's total reward equation: r_t = r_t^i + r_t^e, with the extrinsic reward r_t^e mostly (if not always) zero]

I now realize the paper says the extrinsic reward can be optional. I'm wondering what is "usually" used (with or without the extrinsic reward) when peers use ICM as a baseline.
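For what it's worth, the two setups usually compared reduce to a one-line difference; a minimal sketch (names are illustrative, not from this repo):

```python
def total_reward(int_reward, ext_reward, use_extrinsic=True):
    # r_t = r^i_t + r^e_t when the environment reward is used;
    # r_t = r^i_t in the curiosity-only ("no extrinsic reward") setting.
    return int_reward + (ext_reward if use_extrinsic else 0.0)
```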

ferreirafabio avatar Mar 04 '21 06:03 ferreirafabio

Thank you for the clarification. Let me make sure I understand your question: you are saying that the code (referenced above) tries to minimize the loss function by maximizing the extrinsic reward and minimizing the intrinsic reward, whereas the correct implementation should reflect equation (7) below. In other words, it should find the policy π that maximizes both the intrinsic and the extrinsic reward, and parameters for the inverse and forward models that minimize L_I and L_F.

Did I interpret your question correctly?

[screenshot of equation (7) from the paper: min over θ_P, θ_I, θ_F of −λ E_π(s_t; θ_P)[Σ_t r_t] + (1 − β) L_I + β L_F]
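A minimal sketch of how equation (7) can be combined into a single scalar loss, assuming policy_loss is already a negated return estimate (e.g. a PPO/A2C surrogate); names are illustrative, not this repo's code:

```python
def joint_loss(policy_loss, inverse_loss, forward_loss, lam=0.1, beta=0.2):
    # Eq. (7): minimize over policy/model parameters
    #   -lambda * E_pi[sum_t r_t] + (1 - beta) * L_I + beta * L_F
    # Since policy_loss ~= -E_pi[sum_t r_t], minimizing lam * policy_loss
    # maximizes the expected (intrinsic + extrinsic) return.
    return lam * policy_loss + (1.0 - beta) * inverse_loss + beta * forward_loss
```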

ruoshiliu avatar Mar 04 '21 22:03 ruoshiliu

Yes

ferreirafabio avatar Mar 04 '21 22:03 ferreirafabio