D4RL
[Question] Stochasticity of environment dynamics in Gym control tasks
Question
When I download the offline dataset and try to reproduce a trajectory by replaying the same action sequence from the same initial state, the resulting state sequence (observations and rewards) gradually drifts away from the original offline trajectory over time.
https://github.com/Farama-Foundation/D4RL/assets/74049270/50de73ed-22eb-4534-9f56-16b46bb56d2e
The following code is used to replay the trajectory:

```python
dataset = env.get_dataset()
states = dataset['observations'][0:H]  # first H steps of the first trajectory
actions = dataset['actions'][0:H]      # fixed: was dataset['state'] for both keys
init_state = states[0]
# set_state is a method call, not a subscript; note that for gym MuJoCo envs
# set_state expects the full simulator state (qpos, qvel), which the
# observation alone may not fully determine
env.set_state(init_state)
for t in range(H):
    obs, reward, done, info = env.step(actions[t])
```
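To make the drift concrete, one way is to compare the replayed observations against the logged ones step by step. Below is a self-contained sketch (using synthetic arrays in place of real rollouts; `per_step_drift` and the example data are illustrative, not part of D4RL):

```python
import numpy as np

def per_step_drift(logged_obs, replayed_obs):
    """L2 distance between logged and replayed observations at each timestep."""
    logged_obs = np.asarray(logged_obs, dtype=float)
    replayed_obs = np.asarray(replayed_obs, dtype=float)
    return np.linalg.norm(logged_obs - replayed_obs, axis=-1)

# Synthetic stand-ins for dataset['observations'][0:H] and the replayed rollout.
logged = np.zeros((5, 3))
replayed = np.cumsum(np.full((5, 3), 0.1), axis=0) - 0.1  # error accumulates
drift = per_step_drift(logged, replayed)
print(drift)
```

Plotting this per-step distance over the horizon shows whether the offset grows gradually (accumulating simulator error) or jumps at a specific step.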
I look forward to your reply!
I've also noticed this! Any update?