
How negative numbers affect gradient descent

yxiao54 opened this issue 4 years ago • 3 comments

The loss may be a negative number in this model. The reason is that the REINFORCE loss is often negative, since the reward is a larger-is-better quantity. But I am confused about how negative numbers affect gradient descent.

I also notice that the hybrid loss tends toward zero over the course of training. How can the loss increase under gradient descent?

yxiao54 avatar Mar 25 '20 00:03 yxiao54

Using negative loss values to turn gradient descent into gradient ascent is a standard approach in reinforcement learning. Minimizing the negated objective is the same as maximizing the objective without the minus sign. To my knowledge there are no issues with this in PyTorch.
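For illustration, here is a minimal sketch of that idea in PyTorch (not this repository's exact code; `log_probs` and `rewards` are placeholder names):

```python
import torch

# Hypothetical values for illustration: log-probabilities of the actions
# the agent actually took, and the rewards it received for them.
log_probs = torch.tensor([-1.2, -0.4, -2.1], requires_grad=True)
rewards = torch.tensor([1.0, 1.0, 0.0])  # larger is better

# REINFORCE objective: maximize E[log pi(a|s) * R]. Negating it turns
# the maximization into a minimization, so a standard optimizer doing
# gradient descent effectively performs gradient ascent on the reward.
loss = -(log_probs * rewards).mean()  # often negative when rewards are positive
loss.backward()  # gradients point toward higher expected reward
```

A negative loss value itself is harmless: the optimizer only uses the gradient, not the sign or magnitude of the loss.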

In the A3C algorithm (used in this project), the loss can increase during training. The reason is that the reinforcement loss is measured as the advantage over a baseline prediction. The baseline is a network that is learned during training; at the start of training its predictions are poor, so it is very easy to have an advantage over it. As the baseline improves, the advantage shrinks, which is why the hybrid loss drifts toward zero. At least this is how I see what is going on here.
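As a rough sketch of the baseline/advantage mechanism (assumed names and toy tensors, not the project's code):

```python
import torch
import torch.nn.functional as F

# Hypothetical tensors for illustration.
log_probs = torch.tensor([-1.0, -0.7], requires_grad=True)
rewards = torch.tensor([1.0, 0.0])
baseline = torch.tensor([0.1, 0.1], requires_grad=True)  # baseline net's predictions

# Advantage: how much better the agent did than the baseline expected.
# detach() so the REINFORCE term does not backpropagate into the baseline.
advantage = rewards - baseline.detach()

reinforce_loss = -(log_probs * advantage).mean()
baseline_loss = F.mse_loss(baseline, rewards)  # baseline learns to predict the reward

# As the baseline improves, the advantage shrinks, so the hybrid loss
# drifts toward zero even while training is working as intended.
loss = reinforce_loss + baseline_loss
loss.backward()
```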

malashinroman avatar May 19 '20 13:05 malashinroman

@malashinroman Hi, may I ask why this is an A3C algorithm?

To me, all the images in a batch share the same agent, and the update is not asynchronous, whereas in A3C the agents live in different processes and update the central network asynchronously. Please let me know if I'm wrong; I'm new to RL. Thanks!

litingfeng avatar Mar 17 '21 19:03 litingfeng

I think you're right. I was thinking of the different environments, but there are no asynchronous agents here.


malashinroman avatar Mar 17 '21 21:03 malashinroman