D4RL
[Question] Stochasticity of environment dynamics in Gym control tasks
Question
When I download the offline dataset and try to reproduce a trajectory by replaying the same action sequence from the same initial state, the resulting state sequence (observations and rewards) gradually drifts away from the original offline trajectory over time.
https://github.com/Farama-Foundation/D4RL/assets/74049270/50de73ed-22eb-4534-9f56-16b46bb56d2e
The following code is used to replay the trajectory:

```python
dataset = env.get_dataset()
states = dataset['observations'][0:H]  # first H steps of the first trajectory
actions = dataset['actions'][0:H]      # fixed: was dataset['state'] for both keys
init_state = states[0]
# set_state is a method call, not a subscript; note that for gym MuJoCo envs
# set_state expects the full simulator state (qpos, qvel), which the
# observation alone may not fully determine
env.set_state(init_state)
for t in range(H):
    obs, reward, done, info = env.step(actions[t])
```
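To make the drift concrete, one way is to compare the replayed observations against the logged ones step by step. Below is a self-contained sketch (using synthetic arrays in place of real rollouts; `per_step_drift` and the example data are illustrative, not part of D4RL):

```python
import numpy as np

def per_step_drift(logged_obs, replayed_obs):
    """L2 distance between logged and replayed observations at each timestep."""
    logged_obs = np.asarray(logged_obs, dtype=float)
    replayed_obs = np.asarray(replayed_obs, dtype=float)
    return np.linalg.norm(logged_obs - replayed_obs, axis=-1)

# Synthetic stand-ins for dataset['observations'][0:H] and the replayed rollout.
logged = np.zeros((5, 3))
replayed = np.cumsum(np.full((5, 3), 0.1), axis=0) - 0.1  # error accumulates
drift = per_step_drift(logged, replayed)
print(drift)
```

Plotting this per-step distance over the horizon shows whether the offset grows gradually (accumulating simulator error) or jumps at a specific step.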
I look forward to your reply!
I've also noticed this! Any update?