D4RL problem of qlearning

problem of qlearning_dataset

Open im-Kitsch opened this issue 3 years ago • 0 comments

Hi,

thanks for the contribution.

I checked qlearning_dataset, and it looks like there are two small errors.

1. Next observations: https://github.com/rail-berkeley/d4rl/blob/4aff6f8c46f62f9a57f79caa9287efefa45b6688/d4rl/init.py#L105 In most cases qlearning_dataset drops the affected transition, because terminate_on_end defaults to False, so building next_obs from dataset['observations'][i+1].astype(np.float32) doesn't matter. But if terminate_on_end is set to True, or in case terminals == True, the next_obs of an episode's last timestep will be the first observation of the next trajectory. That is not reasonable. Since e.g. the MuJoCo datasets now store the next observations explicitly, I recommend changing this line to dataset['next_observations'][i].astype(np.float32)
2. For loop: https://github.com/rail-berkeley/d4rl/blob/4aff6f8c46f62f9a57f79caa9287efefa45b6688/d4rl/init.py#L103 It should not be range(N-1), because then the last timestep of the dataset is always dropped. It should be changed to range(N), with the next_obs assignment adjusted accordingly.
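To make the two proposed changes concrete, here is a minimal sketch of what I have in mind. It is not the actual D4RL code: the function name qlearning_dataset_fixed and the toy dataset are made up for illustration, and it assumes the dataset dict contains both 'next_observations' and 'timeouts'.

```python
import numpy as np


def qlearning_dataset_fixed(dataset, terminate_on_end=False):
    """Hypothetical variant of qlearning_dataset; not the D4RL implementation.

    Assumes `dataset` is a dict of arrays that includes 'next_observations'
    and 'timeouts' in addition to the usual keys.
    """
    N = dataset['rewards'].shape[0]
    obs_, next_obs_, action_, reward_, done_ = [], [], [], [], []

    # range(N) rather than range(N-1): the last stored step is visited too.
    for i in range(N):
        final_timestep = bool(dataset['timeouts'][i])
        if (not terminate_on_end) and final_timestep:
            # Same policy as the original loop: skip timeout steps by default.
            continue

        obs_.append(dataset['observations'][i].astype(np.float32))
        # Use the stored successor instead of observations[i+1], so an
        # episode's last step cannot pick up the next trajectory's first
        # observation.
        next_obs_.append(dataset['next_observations'][i].astype(np.float32))
        action_.append(dataset['actions'][i].astype(np.float32))
        reward_.append(dataset['rewards'][i].astype(np.float32))
        done_.append(bool(dataset['terminals'][i]))

    return {
        'observations': np.array(obs_),
        'actions': np.array(action_),
        'next_observations': np.array(next_obs_),
        'rewards': np.array(reward_),
        'terminals': np.array(done_),
    }


# Toy data (made up): two episodes of two steps each, 1-D observations.
# Episode 1 ends with a terminal at index 1, episode 2 with a timeout at index 3.
toy = {
    'observations':      np.array([[0.], [1.], [10.], [11.]]),
    'next_observations': np.array([[1.], [2.], [11.], [12.]]),
    'actions':           np.zeros((4, 1)),
    'rewards':           np.zeros(4),
    'terminals':         np.array([False, True, False, False]),
    'timeouts':          np.array([False, False, False, True]),
}
out = qlearning_dataset_fixed(toy, terminate_on_end=True)
```

On the toy data with terminate_on_end=True, all four steps are kept, and the next observation of the terminal step at index 1 is 2 (its stored successor) rather than 10 (the first observation of the second episode). With the default terminate_on_end=False, only the timeout step at index 3 is skipped.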

Thanks for the code.

Feb 21 '22 12:02 im-Kitsch