PRUDEX-Compass: Logging losses and reward with TensorBoard

Open · DeepAnonymous opened this issue 2 years ago · 2 comments

Hi,

I tried to log the policy and critic losses, as well as the reward, using TensorBoard. I ran training with the default settings on sz50.
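A minimal sketch of how this kind of logging could look, assuming a PyTorch-style `SummaryWriter` from `torch.utils.tensorboard`, whose `add_scalar(tag, scalar_value, global_step)` method records one scalar per step. The helper `log_training_step`, the `ListWriter` stand-in, and the tag names are hypothetical and not part of PRUDEX-Compass; the stand-in keeps the sketch runnable without TensorBoard installed.

```python
from typing import Dict, List, Protocol, Tuple


class ScalarWriter(Protocol):
    """Anything exposing TensorBoard's add_scalar(tag, scalar_value, global_step)."""

    def add_scalar(self, tag: str, scalar_value: float, global_step: int) -> None: ...


def log_training_step(
    writer: ScalarWriter,
    step: int,
    policy_loss: float,
    critic_loss: float,
    reward: float,
) -> None:
    # Hypothetical tag names; the shared "loss/" prefix groups both losses
    # into one section of the TensorBoard UI.
    writer.add_scalar("loss/policy", policy_loss, step)
    writer.add_scalar("loss/critic", critic_loss, step)
    writer.add_scalar("reward/step", reward, step)


class ListWriter:
    """Stand-in writer that just records calls, for use without TensorBoard."""

    def __init__(self) -> None:
        self.records: Dict[str, List[Tuple[int, float]]] = {}

    def add_scalar(self, tag: str, scalar_value: float, global_step: int) -> None:
        self.records.setdefault(tag, []).append((global_step, float(scalar_value)))


if __name__ == "__main__":
    writer = ListWriter()
    for step in range(3):
        log_training_step(writer, step, policy_loss=0.5, critic_loss=1.0 + step, reward=0.1)
    print(writer.records["loss/critic"])
```

With TensorBoard installed, a `SummaryWriter(log_dir="runs/sz50")` (the directory name is only an example) can be passed in place of `ListWriter()`; call `writer.close()` when training ends and view the curves with `tensorboard --logdir runs`.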

I noticed that the critic loss keeps increasing. Does this even make sense?

[TensorBoard screenshot]

I wonder whether there is an issue in the code that computes the critic loss. Could you please check and comment on this?

Thank you.

Dec 22 '22 06:12 DeepAnonymous
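One common reason a critic loss can rise without any bug, assuming the critic is trained with a mean-squared TD error against targets of the form r + γQ(s', a') (as in DDPG/TD3-style actor-critic methods; this thread does not confirm what PRUDEX-Compass uses): the loss is measured in squared return units, so it grows as rewards and Q-values grow during training. The synthetic sketch below is not PRUDEX-Compass code; the batch values and the 10% error are made up. It shows that a critic with the same relative error gives a loss four times larger when rewards and values double. It does not prove that the curve in the screenshot is bug-free.

```python
from typing import List


def td_targets(rewards: List[float], next_q: List[float], gamma: float = 0.99) -> List[float]:
    # One-step TD target r + gamma * Q(s', a'); terminal handling omitted for brevity.
    return [r + gamma * q for r, q in zip(rewards, next_q)]


def mse(pred: List[float], target: List[float]) -> float:
    # Mean-squared error, the usual critic loss in DDPG/TD3-style methods.
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def critic_loss_with_relative_error(scale: float, rel_error: float = 0.1) -> float:
    # Synthetic batch whose rewards and next-state values grow with `scale`,
    # mimicking larger returns later in training.
    rewards = [1.0 * scale, 2.0 * scale, 3.0 * scale]
    next_q = [10.0 * scale, 12.0 * scale, 14.0 * scale]
    targets = td_targets(rewards, next_q)
    # A critic that is always rel_error (10% by default) below the target,
    # i.e. equally accurate in relative terms at every scale.
    preds = [t * (1.0 - rel_error) for t in targets]
    return mse(preds, targets)


if __name__ == "__main__":
    for scale in (1.0, 2.0, 4.0):
        print(scale, round(critic_loss_with_relative_error(scale), 4))
```

Because every error term scales linearly with `scale`, the MSE scales with its square, which is why a raw critic-loss curve is hard to interpret on its own.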

Also, could you please briefly explain feedback_type? I am confused by the 0, 1, and 2 options.

To me, only feedback_type = 0 makes sense, because it uses the ensemble-wise index std_Q_critic_list[en_index], whereas feedback_type = 1 uses std_Q_critic_list[0].

Dec 22 '22 07:12 DeepAnonymous
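To make the indexing difference described above concrete, a minimal sketch under stated assumptions: `select_feedback_std` is a hypothetical helper, not the repository's code, and it assumes `std_Q_critic_list` holds one standard deviation per ensemble member. Only the two options described in this thread are modeled; feedback_type = 2 is not described here, so the sketch raises for it rather than guessing.

```python
from typing import Sequence


def select_feedback_std(
    std_Q_critic_list: Sequence[float], en_index: int, feedback_type: int
) -> float:
    """Hypothetical helper mirroring the two options described in this thread."""
    if feedback_type == 0:
        # Ensemble-wise: use the std belonging to the current ensemble member.
        return std_Q_critic_list[en_index]
    if feedback_type == 1:
        # Always use the first entry, regardless of which member is active.
        return std_Q_critic_list[0]
    # feedback_type == 2 is not described in this thread, so it is not modeled.
    raise NotImplementedError(f"feedback_type={feedback_type} is not covered by this sketch")


if __name__ == "__main__":
    stds = [0.1, 0.5, 0.9]
    print(select_feedback_std(stds, en_index=2, feedback_type=0))
    print(select_feedback_std(stds, en_index=2, feedback_type=1))
```

Under this reading, the two options only differ when `en_index` is not 0, which is consistent with the suggestion that they are variants from an ablation study.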

In RL, I don't think we should pay too much attention to the critic or actor loss, or to any other loss. The key indicator is the reward sum. As for feedback_type, I think it comes from the ablation study.

May 17 '23 06:05 qinmoelei
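Following the suggestion to track the reward sum, a minimal sketch of turning logged step rewards into per-episode totals, plus a trailing moving average to smooth them. The function names, the `dones` episode-end flags, and the window size are illustrative assumptions, not PRUDEX-Compass APIs.

```python
from typing import List, Sequence


def episode_reward_sums(rewards: Sequence[float], dones: Sequence[bool]) -> List[float]:
    """Split a stream of step rewards into per-episode totals.

    An episode ends at each step where dones[i] is True; a trailing partial
    episode (no final done flag) is not included.
    """
    sums: List[float] = []
    running = 0.0
    for r, done in zip(rewards, dones):
        running += r
        if done:
            sums.append(running)
            running = 0.0
    return sums


def moving_average(values: Sequence[float], window: int) -> List[float]:
    # Trailing mean over up to `window` most recent values, to smooth noisy returns.
    out: List[float] = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start : i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


if __name__ == "__main__":
    totals = episode_reward_sums([1.0, 2.0, -1.0, 3.0, 0.5], [False, True, False, True, False])
    print(totals, moving_average(totals, window=2))
```

Each episode total can then be passed to `add_scalar` with the episode number as the step, giving one reward-sum curve per run that is directly comparable across runs.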
