D4RL

Question about Maximum Score and Expert Dataset

Open · liziniu opened this issue 2 years ago · 1 comment

Hi,

How is the maximum score obtained for the MuJoCo tasks? According to the wiki (https://github.com/rail-berkeley/d4rl/wiki/Dataset-Reproducibility-Guide#gym-mujocogym-bullet), the expert dataset appears to be collected with the stochastic SAC policy. In rlkit, however, SAC is evaluated with its deterministic policy, and evaluating with the stochastic policy typically performs worse. I am therefore not sure whether the reported maximum score is based on the deterministic policy or the stochastic one.
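To make the distinction concrete, here is a minimal sketch of the two action-selection modes for a tanh-Gaussian policy of the kind SAC uses. The helper `sac_action` and its parameters are hypothetical names for illustration only; this is not rlkit's or D4RL's actual API, and it assumes the policy outputs a mean and a log standard deviation per action dimension.

```python
import numpy as np

def sac_action(mean, log_std, deterministic, rng=None):
    """Toy tanh-Gaussian action selection (hypothetical helper, not rlkit's API)."""
    mean = np.asarray(mean, dtype=float)
    if deterministic:
        # Deterministic evaluation: squash the mean, no sampling noise.
        return np.tanh(mean)
    if rng is None:
        rng = np.random.default_rng()
    std = np.exp(np.asarray(log_std, dtype=float))
    # Stochastic rollout: sample from the Gaussian, then squash with tanh.
    return np.tanh(mean + std * rng.standard_normal(mean.shape))

# Same policy outputs, two different actions depending on the mode.
mean, log_std = [0.5, -1.0], [0.0, 0.0]
det_action = sac_action(mean, log_std, deterministic=True)
sto_action = sac_action(mean, log_std, deterministic=False,
                        rng=np.random.default_rng(0))
```

Because the stochastic mode adds sampling noise at every step, returns measured with it can differ from returns measured with the deterministic mode, which is why it matters which one the reference score used.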

If the reported score is based on the deterministic policy, should the expert dataset also be collected with the deterministic policy?

I would highly appreciate any help.

liziniu, Apr 26 '22 01:04

I am interested to hear from the team on this as well.

AsadJeewa, Oct 09 '22 21:10