[Feature Request] Using MPPIPlanner to Reproduce TD-MPC or Solve Simple Environments
Motivation
Hello,
Thanks for writing this great library and providing so many great open source tools!
I am looking into using the MPPIPlanner, and the implementation seems to be motivated by TD-MPC. As a starting point for applying the planner to other tasks, I am trying to understand how the MPPIPlanner could be used to reproduce TD-MPC. Looking at the TD-MPC implementation, MPPIPlanner appears to implement the planning step, but I'm unsure whether it is an exact replication. For example, the advantage module passed to the MPPIPlanner seems to estimate value as in TD-MPC, but it is the only module I can see used in the planning step. TD-MPC itself uses several MLP networks: one for encoding observations into a latent space, one for modeling the dynamics, one for estimating the expected reward, and so on. Does the torchrl implementation rely on the environment to provide the encoder, dynamics, and reward MLPs?
I'm also curious how training would work in comparison to TD-MPC. Adding to the replay buffer after the planning step seems straightforward, but updating from replay-buffer samples looks trickier, since TD-MPC uses the encoder, dynamics, and reward MLPs to compute the loss.
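For reference, the TD-MPC planning step described above scores a candidate action sequence by rolling it out entirely in latent space: encode the observation, step the latent dynamics per action, accumulate predicted rewards, and bootstrap with a value estimate at the horizon. The following is only a toy numpy sketch of that scoring function, not torchrl's implementation; the single-layer "networks" with random weights are placeholders for TD-MPC's trained MLPs:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(in_dim, out_dim):
    """Toy stand-in for a trained MLP: one random linear layer plus tanh."""
    W = rng.normal(0.0, 0.1, size=(in_dim, out_dim))
    return lambda x: np.tanh(x @ W)

obs_dim, act_dim, latent_dim = 3, 1, 8
encoder = mlp(obs_dim, latent_dim)                 # h: obs -> z
dynamics = mlp(latent_dim + act_dim, latent_dim)   # d: (z, a) -> z'
reward = mlp(latent_dim + act_dim, 1)              # R: (z, a) -> r
value = mlp(latent_dim, 1)                         # Q: z -> terminal value

def score_sequence(obs, actions, gamma=0.99):
    """TD-MPC-style score of one action sequence: discounted sum of
    predicted rewards along the latent rollout, plus a discounted
    terminal value estimate (never decoding back to observations)."""
    z = encoder(obs)
    total, discount = 0.0, 1.0
    for a in actions:  # actions: (horizon, act_dim)
        za = np.concatenate([z, a])
        total += discount * reward(za)[0]
        z = dynamics(za)
        discount *= gamma
    return total + discount * value(z)[0]
```

The point of the sketch is the data flow: the planner only ever consumes a scalar score per candidate sequence, which is why a single value/advantage module can be the only network visible at planning time while the encoder, dynamics, and reward networks sit behind it.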
Solution
- A clarification of any misunderstandings above would be helpful.
- An example of how to use the MPPIPlanner as in TD-MPC would be helpful.
Alternatives
- A simple example of how to use the MPPIPlanner to control a pendulum or similar simple environment independent of TD-MPC would be useful.
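In the meantime, the MPPI update itself is easy to sketch independently of torchrl. Below is a minimal numpy version on a torque-limited inverted pendulum: sample noisy perturbations of a nominal action sequence, roll each candidate out through the known dynamics, then exponentially weight the candidates by cost to get the new nominal. The dynamics, cost function, and all parameter values here are illustrative assumptions, not anything from torchrl or TD-MPC:

```python
import numpy as np

DT, G, U_MAX = 0.05, 9.81, 5.0  # step size, gravity, torque limit (assumed)

def rollout_cost(theta0, omega0, actions):
    """Cost of each candidate action sequence. actions: (K, H).
    theta is the angle from upright; quadratic state/control cost."""
    K, H = actions.shape
    theta = np.full(K, theta0)
    omega = np.full(K, omega0)
    cost = np.zeros(K)
    for t in range(H):
        u = np.clip(actions[:, t], -U_MAX, U_MAX)
        omega = omega + DT * (G * np.sin(theta) + u)  # Euler-integrated pendulum
        theta = theta + DT * omega
        cost += theta**2 + 0.1 * omega**2 + 0.001 * u**2
    return cost

def mppi_step(theta, omega, nominal, K=200, sigma=1.0, lam=1.0, rng=None):
    """One MPPI update: perturb the nominal sequence, roll out, reweight."""
    rng = np.random.default_rng(0) if rng is None else rng
    noise = rng.normal(0.0, sigma, size=(K, nominal.shape[0]))
    candidates = nominal[None, :] + noise
    costs = rollout_cost(theta, omega, candidates)
    weights = np.exp(-(costs - costs.min()) / lam)  # softmin over costs
    weights /= weights.sum()
    return (weights[:, None] * candidates).sum(axis=0)

# Receding-horizon usage (illustrative): replan each step, apply first action.
theta, omega, plan = 0.3, 0.0, np.zeros(15)
for _ in range(20):
    plan = mppi_step(theta, omega, plan)
    u = np.clip(plan[0], -U_MAX, U_MAX)
    omega += DT * (G * np.sin(theta) + u)
    theta += DT * omega
    plan = np.roll(plan, -1)  # warm-start the next replan
    plan[-1] = 0.0
```

Swapping `rollout_cost` for a learned-model score (as in the TD-MPC question above) is exactly the gap the requested example would need to close.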
Additional context
- TD-MPC paper: https://arxiv.org/pdf/2203.04955.pdf
- TD-MPC repo: https://github.com/nicklashansen/tdmpc
Checklist
- [x] I have checked that there is no similar issue in the repo (required)
Hey, thanks for this issue! We'll be working on a better integration of the MPPI planner soon. I'll try to come up with a simple example script in the coming weeks. Stay tuned!
Cool, I'm glad to hear that. Thanks so much!
Hi @vmoens, wondering if there have been any updates on this? I couldn't find a tutorial in the torchRL documentation, but if I'm missing it, please let me know.