Trajectory Replay in DMControl environment doesn't match dataset
Hi,
I've been trying to replay the trajectories in the TDMPC2 dataset for the Cartpole Swingup task from the DM Control Suite. When I reset the environment to the initial observation given in a trajectory and step through the corresponding actions, I don't reach the same end state as the trajectory in the dataset. To reset the environment to a given physics state for this task, I recover the cart position/velocity and the pole angle/velocity from the observation at that timestep.
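A minimal sketch of the reset described above, not the code actually used for this issue. It assumes the flattened observation is laid out as [cart_x, cos(pole_angle), sin(pole_angle), cart_vel, pole_vel]; that layout, and the helper names `obs_to_state` and `reset_to_obs`, are illustrative assumptions and should be checked against the dataset.

```python
import math


def obs_to_state(obs):
    """Map a flat cartpole observation to (qpos, qvel).

    Assumed layout (hypothetical, verify against the dataset):
    [cart_x, cos(pole_angle), sin(pole_angle), cart_vel, pole_vel]
    """
    cart_x, cos_a, sin_a, cart_vel, pole_vel = (float(v) for v in obs)
    # atan2 recovers the angle only modulo 2*pi, in the range [-pi, pi].
    pole_angle = math.atan2(sin_a, cos_a)
    return [cart_x, pole_angle], [cart_vel, pole_vel]


def reset_to_obs(env, obs):
    """Write the recovered state into a dm_control environment.

    `env` is assumed to be a dm_control environment exposing `env.physics`.
    """
    qpos, qvel = obs_to_state(obs)
    # reset_context() resets the simulation to its default state on entry
    # and recomputes derived quantities when the block exits.
    with env.physics.reset_context():
        env.physics.data.qpos[:] = qpos
        env.physics.data.qvel[:] = qvel
```

Note that only qpos and qvel are restored this way; any simulator state that is not part of the observation is left at its default value.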
I find that the trajectory I get from my instantiation of the environment diverges from the trajectory given in the dataset after ~200 timesteps. I'm using the same seed, action repeat, and conda environment given in the repo.
Similarly, I ran a checkpoint of the TDMPC2 agent on the Cartpole Swingup task and recorded its observations/actions in float64 (instead of the default float32). With this I was able to match the trajectory in my DMControl environment until ~300 timesteps. However, after that point I still reach a different end state in my environment than the one given in the dataset trajectory.
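The divergence points quoted above (~200 and ~300 timesteps) depend on how "diverged" is defined. One way to pin it down is sketched below, assuming both trajectories are sequences of equal-length observation vectors; the function name `first_divergence` and the tolerance `atol` are hypothetical choices, not values from the repo.

```python
def first_divergence(replayed, recorded, atol=1e-6):
    """Return the first timestep whose largest per-dimension absolute
    difference exceeds `atol`, or None if the trajectories never diverge."""
    for t, (a, b) in enumerate(zip(replayed, recorded)):
        if max(abs(x - y) for x, y in zip(a, b)) > atol:
            return t
    return None


# Toy data: the trajectories agree for two steps and differ at the third.
recorded = [[0.0, 1.0], [0.1, 0.9], [0.2, 0.8]]
replayed = [[0.0, 1.0], [0.1, 0.9], [0.2, 0.8001]]
print(first_divergence(replayed, recorded))
```

Running the same check with several tolerances shows whether the mismatch grows gradually from the first step or appears abruptly.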
I'm currently training my own model on the TDMPC2 dataset, and because I can't reproduce the trajectories in my instantiation of the DM Control Suite environment, I can't debug my training runs on this data.
I'm wondering whether you faced this issue when using replayed trajectories to train your model, and whether you did anything to mitigate it. Any help would be much appreciated!
Thank you!