[bug?] sim data collection pairs actions with `next_observation` instead of the observation on which the action is based
In the data collection script for sim envs, the action (a_t) is determined based on the current observation (o_t).
However, when the `step` method is called, the observation is overwritten with the new observation o_{t+1}, and the pair (a_t, o_{t+1}) is recorded as a demonstration step. I believe this is a mistake: a policy trained on such data would learn to map o_{t+1} to a_t instead of o_t to a_t. A simplified and corrected data collection loop is given below:
    for _ in range(n_episodes):
        obs, info = env.reset()
        done = False
        dataset_recorder.start_episode()
        while not done:
            action = agent_callable(env)
            new_obs, reward, termination, truncation, info = env.step(action)
            done = termination or truncation
            dataset_recorder.record(obs, action, reward, done, info)
            obs = new_obs
        dataset_recorder.save_episode()
Note that I also call `reset` explicitly at the start of each episode. This avoids storing the last observation together with an action that is never executed (with autoreset, the action is ignored if `step` is called on an environment that needs to reset).
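To make the off-by-one concrete, here is a minimal sketch with a toy environment. `CounterEnv`, `policy`, and `collect` are hypothetical names made up for illustration and are not part of the actual collection script or of any library. The observation is simply the step index and the policy returns the observation it saw, so a correctly aligned recording should contain pairs where observation and action are equal.

```python
class CounterEnv:
    """Hypothetical toy env: the observation is just the current step index."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t, {}

    def step(self, action):
        self.t += 1
        terminated = self.t >= self.horizon
        # Same 5-tuple layout as in the loop above: obs, reward, terminated, truncated, info
        return self.t, 0.0, terminated, False, {}


def policy(obs):
    # The action encodes the observation it was computed from.
    return obs


def collect(env, overwrite_obs):
    pairs = []
    obs, _ = env.reset()
    done = False
    while not done:
        action = policy(obs)
        new_obs, _, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        # overwrite_obs=True mimics the suspected bug: o_{t+1} is stored with a_t.
        recorded_obs = new_obs if overwrite_obs else obs
        pairs.append((recorded_obs, action))
        obs = new_obs
    return pairs


buggy = collect(CounterEnv(), overwrite_obs=True)
fixed = collect(CounterEnv(), overwrite_obs=False)
print(buggy)  # [(1, 0), (2, 1), (3, 2)]
print(fixed)  # [(0, 0), (1, 1), (2, 2)]
```

In the `buggy` recording every action is paired with the observation that follows it, and the initial observation 0 never appears in the data at all.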
I have not run the script; I was only looking for code that would let me collect demonstrations for my own gym Env and store them in the LeRobot dataset format.