
unstable training curve for default SQL

Open LuhuanWu opened this issue 6 years ago • 1 comments

Hi,

First of all, thanks for the brilliant papers and making the codes open-source.

I was running SQL on HalfCheetah with the default settings, using the command: --universe=gym --domain=HalfCheetah --task=v2 --algorithm=SQL --exp-name=my-sql-experiment-2 --checkpoint-frequency=1000.

It uses a Gaussian policy and a reward scale of 30, which I think implies very low entropy regularization.
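
For reference, the link between reward scale and entropy regularization can be made explicit under the maximum-entropy objective. This is a sketch under assumptions not stated in this thread: the entropy weight is fixed at 1, the reward scale is a constant c multiplying every reward, and the symbols r, \mathcal{H}, \pi, and c are notation chosen here.

```latex
\begin{aligned}
J(\pi) &= \sum_t \mathbb{E}\big[\, c\, r(s_t, a_t) + \mathcal{H}(\pi(\cdot \mid s_t)) \,\big] \\
       &= c \sum_t \mathbb{E}\big[\, r(s_t, a_t) + \tfrac{1}{c}\, \mathcal{H}(\pi(\cdot \mid s_t)) \,\big]
\end{aligned}
```

Under these assumptions, scaling rewards by c is equivalent to an entropy weight of 1/c on the unscaled rewards, so c = 30 corresponds to a weight of about 0.033.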

However, I obtained very unstable training and evaluation return curves, as shown below:

[Images: training return curve and evaluation return curve]

I was wondering whether there is anything wrong with the default SQL settings, and how you test SQL. I tried lowering the reward scale, which leads to a lower but somewhat more stable return curve.

Thanks!

LuhuanWu, Apr 10 '19 14:04

Hi,

Your reward scale might indeed be too large, which would explain why the final performance is quite poor. By instability, do you mean the spikes in the blue curve? That looks quite typical to me if the curve shows individual evaluation episodes. The curves in the paper show a running average over 10 or so evaluation episodes.

Cheers

haarnoja, Apr 14 '19 11:04
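
The running average over roughly 10 evaluation episodes mentioned in the reply can be sketched as below. This is a minimal illustration, not code from softlearning: the function name `running_average`, the `window` parameter, and the sample returns are all hypothetical, and a trailing window is assumed since the reply does not say how the average is aligned.

```python
from collections import deque


def running_average(returns, window=10):
    """Trailing mean of per-episode returns (hypothetical helper).

    Each output value is the mean of the current return and up to
    `window - 1` returns before it, so early points average over
    fewer episodes.
    """
    recent = deque(maxlen=window)  # drops the oldest value automatically
    smoothed = []
    for value in returns:
        recent.append(float(value))
        smoothed.append(sum(recent) / len(recent))
    return smoothed


if __name__ == "__main__":
    # Made-up evaluation returns with one spike, for illustration only.
    raw = [100, 120, 90, 400, 110, 105]
    print(running_average(raw, window=3))
```

Plotting the smoothed series next to the raw per-episode returns makes it easier to tell isolated spikes apart from a real change in the trend.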