dreamerv2

Questions on Imagination MDP and imagination horizon H = 15

Open GoingMyWay opened this issue 1 year ago • 0 comments

Dear author,

After reading the code and the paper, I am confused about why the imagination MDP is introduced and why an imagination horizon is needed. For example, with a trained world model and a given trajectory $\tau$, we can sample an initial state and simulate a trajectory with the world model. In DreamerV2, each state in the sampled trajectory is used as the starting point of a simulated sub-trajectory of length 15, and these sub-trajectories are then used to update the policy. Why is this approach feasible for training model-based RL? It looks like magic. Could you help me understand it?
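
A minimal sketch of the procedure described above, to make the question concrete. It is not taken from the dreamerv2 code base: `toy_dynamics`, `toy_reward`, `toy_policy`, and `imagine` are hypothetical stand-ins, and a one-dimensional state replaces the real latent state. It only shows the shape of the loop, in which every state of a replayed trajectory starts its own imagined rollout of length H = 15.

```python
import random

H = 15  # imagination horizon from the issue title


def toy_dynamics(state, action):
    """Hypothetical stand-in for the learned latent transition model."""
    return 0.9 * state + action


def toy_reward(state):
    """Hypothetical stand-in for the learned reward predictor."""
    return -abs(state)


def toy_policy(state, rng):
    """Hypothetical stochastic actor: push the state toward zero, plus noise."""
    return -0.1 * state + rng.gauss(0.0, 0.01)


def imagine(start_states, horizon, rng):
    """Roll the model forward `horizon` steps from every start state."""
    rollouts = []
    for s in start_states:
        states, rewards = [s], []
        for _ in range(horizon):
            a = toy_policy(s, rng)
            s = toy_dynamics(s, a)
            states.append(s)
            rewards.append(toy_reward(s))
        rollouts.append((states, rewards))
    return rollouts


rng = random.Random(0)
# Pretend these are the states inferred from one replayed trajectory tau.
replay_states = [rng.uniform(-1.0, 1.0) for _ in range(50)]
rollouts = imagine(replay_states, H, rng)
print(len(rollouts), len(rollouts[0][0]), len(rollouts[0][1]))  # 50 16 15
```

In this sketch, 50 start states each produce one 15-step imagined rollout, so a single policy update would see 50 × 15 = 750 imagined transitions, none of which require further interaction with the real environment.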

GoingMyWay, Aug 12 '22 11:08