Cal-QL

Online value function divergence for CQL

Open • zhonghai1995 opened this issue 6 months ago • 1 comment

Hi, thanks for your work!

When I run CQL in the pen binary environment, I find that CQL's value function consistently tends to diverge (I tried mixing ratios of 0.0 and 0.5, each with 5 random seeds). The critics produce very large estimates, so the agent cannot make progress during online fine-tuning. Do you have any ideas or suggestions on how to fix this overestimation issue? I see that a double critic is already being used. Thanks so much!
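The two settings the report mentions (the mixing ratio and the double critic) can be illustrated with a minimal sketch. This is not Cal-QL's code. It assumes the common clipped double-Q convention, in which the TD target bootstraps from the smaller of two target critics, and it assumes "mixing ratio" means the fraction of each fine-tuning batch drawn from the offline dataset. The names `td_target` and `sample_mixed_batch` are hypothetical.

```python
import random


def td_target(reward, done, next_q1, next_q2, discount=0.99):
    """Clipped double-Q target (assumed convention): bootstrap from the
    smaller of the two target-critic estimates for the next state."""
    next_q = min(next_q1, next_q2)
    # Terminal transitions (done == 1.0) do not bootstrap.
    return reward + discount * (1.0 - done) * next_q


def sample_mixed_batch(offline_data, online_data, batch_size, mixing_ratio, rng):
    """Hypothetical batch mixing: round(batch_size * mixing_ratio) transitions
    come from the offline buffer, the rest from the online buffer.
    Sampling is uniform with replacement."""
    n_offline = int(round(batch_size * mixing_ratio))
    n_online = batch_size - n_offline
    batch = [rng.choice(offline_data) for _ in range(n_offline)]
    batch += [rng.choice(online_data) for _ in range(n_online)]
    return batch


if __name__ == "__main__":
    rng = random.Random(0)
    # mixing_ratio=0.5: half offline, half online; 0.0: purely online.
    batch = sample_mixed_batch(["offline"] * 10, ["online"] * 10, 256, 0.5, rng)
    print(batch.count("offline"), batch.count("online"))
    print(td_target(reward=1.0, done=0.0, next_q1=10.0, next_q2=12.0, discount=0.9))
```

Taking the minimum only limits overestimation to the extent that the two critics disagree: if both estimates grow together, the bootstrapped target grows with them.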

Best, Hai

zhonghai1995 • Aug 26 '24 15:08