[bug?] sim data collection combines actions with `next_observation` instead of observation on which the action is based

Open tlpss opened this issue 11 months ago • 0 comments

In the data collection script for sim envs, the action (a_t) is determined based on the current observation (o_t). However, when the step method is called, the observation is overwritten with the new observation o_{t+1}, and this pair (a_t, o_{t+1}} is recorded as a demonstration step. I believe this is a mistake. A simplified and corrected data collection loop is given below:

 for _ in range(n_episodes):
        obs, info = env.reset()
        done = False
        dataset_recorder.start_episode()
        while not done:
            action = agent_callable(env)
            new_obs, reward, termination, truncation, info = env.step(action)
            done = termination or truncation
            dataset_recorder.record(obs, action, reward, done, info)
            obs = new_obs
        dataset_recorder.save_episode()

Note that I also explicitly call the reset, to avoid storing the last observation with an action that is never executed (the autoreset ignores the action if step is called on an environment that needs to reset).

I have not run the script, but was merely looking for code that allowed me to collect demonstrations for my own gym Env and store them in the Lerobot dataset format.

Jan 14 '25 16:01 tlpss