D4RL
Problems in qlearning_dataset
Hi,
Thanks for the contribution.
I checked `qlearning_dataset`, and it looks like there are two small errors.
- Next observations (https://github.com/rail-berkeley/d4rl/blob/4aff6f8c46f62f9a57f79caa9287efefa45b6688/d4rl/init.py#L105): In most uses of `qlearning_dataset` this does not matter. `terminate_on_end` defaults to False, so the transition at the end of each episode is dropped, and it makes no difference that next_obs is taken from `dataset['observations'][i+1].astype(np.float32)`. But if `terminate_on_end` is set to True, or when `terminals == True`, the next_obs of an episode's last timestep is the first observation of the next trajectory, which is not reasonable. Since the datasets (e.g. the MuJoCo ones) now include `next_observations` explicitly, I recommend changing this line to `dataset['next_observations'][i].astype(np.float32)`.
- The for loop (https://github.com/rail-berkeley/d4rl/blob/4aff6f8c46f62f9a57f79caa9287efefa45b6688/d4rl/init.py#L103): It should not be `range(N-1)`, because then the last timestep of the dataset is always dropped. It should be changed to `range(N)`, with the next_obs assignment adjusted accordingly.
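
To make both points concrete, here is a minimal sketch of what I have in mind. It is not the library's code: `qlearning_dataset_fixed` and the `toy` dataset are hypothetical names for illustration, and I assume the dataset dict contains both `next_observations` and `timeouts` keys. It iterates over all N timesteps and reads next_obs from `next_observations`, so an episode's last transition never picks up the next trajectory's first observation.

```python
import numpy as np


def qlearning_dataset_fixed(dataset, terminate_on_end=False):
    """Hypothetical variant of qlearning_dataset (illustration only).

    Assumes `dataset` has 'observations', 'actions', 'next_observations',
    'rewards', 'terminals' and 'timeouts', all with the same leading length.
    """
    n = dataset['rewards'].shape[0]
    keep = []
    for i in range(n):  # range(N), so the final timestep is not skipped
        timed_out = bool(dataset['timeouts'][i])
        if timed_out and not terminate_on_end:
            # Same idea as the current behaviour: drop timeout transitions
            # unless terminate_on_end is requested.
            continue
        keep.append(i)
    idx = np.asarray(keep, dtype=np.int64)
    return {
        'observations': dataset['observations'][idx].astype(np.float32),
        'actions': dataset['actions'][idx].astype(np.float32),
        # Taken from the stored next observations, not observations[i + 1].
        'next_observations': dataset['next_observations'][idx].astype(np.float32),
        'rewards': dataset['rewards'][idx].astype(np.float32),
        'terminals': dataset['terminals'][idx].astype(bool),
    }


# Toy data: two episodes of three steps each. The first ends with a
# terminal at index 2, the second with a timeout at index 5.
obs = np.array([0, 1, 2, 10, 11, 12], dtype=np.float64).reshape(-1, 1)
toy = {
    'observations': obs,
    'actions': np.zeros((6, 1)),
    'next_observations': obs + 1.0,
    'rewards': np.ones(6),
    'terminals': np.array([0, 0, 1, 0, 0, 0], dtype=bool),
    'timeouts': np.array([0, 0, 0, 0, 0, 1], dtype=bool),
}

default_out = qlearning_dataset_fixed(toy)
full_out = qlearning_dataset_fixed(toy, terminate_on_end=True)
```

On this toy data, the terminal transition at index 2 keeps its own next observation (3.0) instead of the next episode's first observation (10.0), which is what `observations[i + 1]` would give. With `terminate_on_end=True`, the very last timestep is kept as well.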
Thanks for the code.