stable-baselines3
Training a model based on the prediction of another model (PPO)
❓ Question
Let's assume that I have already trained model A to predict A(x) given observation x. I would now like to train a model B using PPO to minimize A(x) + B(x). This means that, when stepping during training, I would need to take the action predicted by the model under training, i.e. B, plus the prediction from A (already trained model) for that same observation. Is there a proper way to do so using SB3?
Checklist
- [X] I have checked that there is no similar issue in the repo
- [X] I have read the documentation
- [X] If code there is, it is minimal and working
- [X] If code there is, it is formatted using the markdown code blocks for both code and stack traces.
> Is there a proper way to do so using SB3?
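You should have a look at Gym wrappers or `VecEnv` wrappers (we have tutorials and examples in our documentation). A wrapper can query model A on the current observation and add its prediction to the action chosen by B before the environment is stepped.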
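As a rough illustration of the wrapper idea, here is a minimal, dependency-free sketch. `ResidualActionWrapper`, `ToyEnv`, and `predict_a` are hypothetical names made up for this example and are not part of SB3. The sketch assumes a Gymnasium-style `reset`/`step` API and that A's prediction and B's action can simply be added together.

```python
class ResidualActionWrapper:
    """Hypothetical helper (not part of SB3): adds a frozen model's
    prediction A(x) to the action B(x) chosen by the agent being trained."""

    def __init__(self, env, predict_a):
        self.env = env
        self.predict_a = predict_a  # callable: observation -> A(observation)
        self._last_obs = None

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self._last_obs = obs  # remember x so A sees the same observation as B
        return obs, info

    def step(self, action_b):
        # Combine B's action with A's prediction for the same observation.
        combined = action_b + self.predict_a(self._last_obs)
        obs, reward, terminated, truncated, info = self.env.step(combined)
        self._last_obs = obs
        return obs, reward, terminated, truncated, info


class ToyEnv:
    """Toy stand-in environment: the observation is a float and the reward
    is minus the magnitude of the action that was actually applied."""

    def reset(self, **kwargs):
        self.x = 1.0
        return self.x, {}

    def step(self, action):
        reward = -abs(action)
        self.x += 1.0
        return self.x, reward, False, False, {"applied_action": action}


# Stand-in for the already-trained model A: here A(x) = 2 * x.
env = ResidualActionWrapper(ToyEnv(), predict_a=lambda x: 2.0 * x)
obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step(0.5)  # B proposes 0.5
print(info["applied_action"], reward)
```

For actual training with SB3, the same logic would probably go into a subclass of `gymnasium.Wrapper` so that `observation_space` and `action_space` are forwarded, with something like `lambda obs: model_a.predict(obs, deterministic=True)[0]` passed as `predict_a`. The wrapped environment is then given to `PPO` as usual. Depending on the action space, the combined action may need to be clipped to the environment's bounds.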