
Training a model based on the prediction of another model (PPO)

Open • SamySam0 opened this issue 1 year ago • 1 comment

❓ Question

Suppose I have already trained a model A that predicts A(x) for an observation x. I would now like to train a second model B with PPO to minimize A(x) + B(x). During training, each environment step would therefore need to apply the action predicted by B (the model being trained) plus the prediction of A (the already trained model) for the same observation. Is there a proper way to do this with SB3?


SamySam0, Jan 31 '24 16:01

Is there a proper way to do so using SB3?

You should have a look at Gym wrappers or VecEnv wrappers (we have tutorials and examples in our documentation).

araffin, Feb 02 '24 09:02