D4RL
Recovering trajectories for the Adroit environments
Hi,
Thanks for releasing these environments and this data!
I'm having some issues recovering the original trajectories from the Adroit environments in a way that makes sense, mainly because the 'timeout' and 'terminal' flags appear to be misplaced (although I might be mistaken).
For example, in the 'door' environment, when I use the built-in function here to iterate through the trajectories, the returned trajectories seem to alternate between length-200 trajectories that fail to complete the task and length-100 trajectories that are successful. Yet when I run the environment itself, it terminates after 200 timesteps. Perhaps I'm missing something, but this all seems strange.
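For reference, this is roughly how I'm recovering trajectories: splitting the flat dataset wherever either the 'terminals' or 'timeouts' flag is set. This is a self-contained sketch with a synthetic dataset standing in for the real one (`split_trajectories` is my own helper, not part of D4RL, though I believe the built-in iterator does essentially the same thing):

```python
import numpy as np

def split_trajectories(dataset):
    """Split a flat D4RL-style dataset dict into per-trajectory chunks.

    A new trajectory starts after any step where either the 'terminals'
    or 'timeouts' flag is set.
    """
    n = len(dataset["rewards"])
    trajectories, start = [], 0
    for t in range(n):
        if dataset["terminals"][t] or dataset["timeouts"][t] or t == n - 1:
            trajectories.append({k: v[start:t + 1] for k, v in dataset.items()})
            start = t + 1
    return trajectories

# Synthetic stand-in for a real dataset: two episodes, lengths 200 and 100,
# both ended by a timeout rather than a terminal.
timeouts = np.zeros(300, dtype=bool)
timeouts[199] = timeouts[299] = True
dataset = {
    "rewards": np.random.rand(300),
    "terminals": np.zeros(300, dtype=bool),
    "timeouts": timeouts,
}
lengths = [len(traj["rewards"]) for traj in split_trajectories(dataset)]
print(lengths)  # -> [200, 100]
```

With the real 'door' data, this is where the alternating 200/100 pattern shows up.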
The pen environment makes even less sense to me. The trajectories are all the same length, but early in the dataset the high-reward states appear at seemingly random places within each trajectory (i.e., a handful of states in the middle), and as you iterate further through the dataset they become more and more frequent. For many of the trajectories after the first handful, every timestep has a high enough reward to indicate membership in the goal set.
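To illustrate how I'm checking this, here is a sketch of the inspection I'm doing on each trajectory. The threshold value is an arbitrary stand-in of my own, not the actual goal-set criterion from the pen task's reward spec, and the two synthetic trajectories just mimic the two patterns described above:

```python
import numpy as np

def high_reward_positions(trajectories, threshold):
    """Return, for each trajectory, the indices of timesteps whose
    reward exceeds the given threshold."""
    return [np.flatnonzero(traj["rewards"] > threshold) for traj in trajectories]

rng = np.random.default_rng(0)
# One trajectory with scattered high-reward steps, one where nearly
# every step is high reward (synthetic stand-ins, not real pen data).
scattered = {"rewards": np.where(rng.random(100) > 0.95, 10.0, 0.1)}
saturated = {"rewards": np.full(100, 10.0)}

positions = high_reward_positions([scattered, saturated], threshold=5.0)
print([len(p) for p in positions])
```

On the real pen data, the first kind of trajectory dominates early in the dataset and the second kind dominates later, which is the pattern I find confusing.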
Any guidance would be much appreciated! Thanks!
Update: the issue I had with door seems to be resolved by switching from 'door-human-v0' to 'door-human-v1', but that did not fix pen.