pomdp-baselines

Q Overestimation

Open • smorad opened this issue 1 year ago • 1 comment

I'm rerunning the velocity baselines in the POMDP directory, and I fairly often see the Q-values explode. Is this something you experienced during training? TD3 seems to avoid the overestimation bias, but its returns seem low. Any tips for getting more stable returns across trials without resorting to massive batch sizes?
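For context on why TD3 behaves differently here: TD3 bootstraps from the *minimum* of two target critics (clipped double-Q), which pushes the target down and counteracts overestimation, at the cost of a mild underestimation bias. The sketch below shows only that target computation on plain Python lists. It is not taken from the pomdp-baselines code; the function name `clipped_double_q_target` and its arguments are hypothetical, and target-policy smoothing (noise on the next action) is assumed to happen upstream.

```python
from typing import List, Sequence


def clipped_double_q_target(
    rewards: Sequence[float],
    dones: Sequence[float],
    next_q1: Sequence[float],
    next_q2: Sequence[float],
    gamma: float = 0.99,
) -> List[float]:
    """Hypothetical helper: TD3-style clipped double-Q targets for a batch.

    next_q1 / next_q2 are the two target critics' estimates at (s', a'),
    where a' is assumed to already include target-policy smoothing noise.
    """
    targets = []
    for r, d, q1, q2 in zip(rewards, dones, next_q1, next_q2):
        # Taking the smaller critic estimate biases the target downward,
        # offsetting the upward bias that comes from maximizing a noisy Q.
        min_q = min(q1, q2)
        # (1 - d) drops the bootstrap term on terminal transitions.
        targets.append(r + gamma * (1.0 - d) * min_q)
    return targets


# Usage: the second transition is terminal, so only its reward remains.
print(clipped_double_q_target([1.0, 0.0], [0.0, 1.0], [10.0, 5.0], [8.0, 7.0], gamma=0.5))
# → [5.0, 0.0]
```

A single-critic target would use `q1` alone (10.0 for the first transition above), so any critic that drifts upward feeds its own error back into the target; the minimum breaks that loop, which is also one plausible reason TD3's returns can come out lower.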

smorad, Apr 02 '23 23:04