stable-baselines3 Training a model based on the prediction of another model (PPO)

Open • SamySam0 opened this issue 1 year ago • 1 comment

❓ Question

Let's assume that I have already trained a model A to predict A(x) given an observation x. I would now like to train a model B using PPO to minimize A(x) + B(x). This means that, at each step during training, I would need to take the action predicted by the model under training (B) plus the prediction from the already trained model (A) for that same observation. Is there a proper way to do so using SB3?

Checklist

[X] I have checked that there is no similar issue in the repo
[X] I have read the documentation
[X] If code there is, it is minimal and working
[X] If code there is, it is formatted using the markdown code blocks for both code and stack traces.

Jan 31 '24 16:01 SamySam0

> Is there a proper way to do so using SB3?

You should have a look at Gym wrappers or VecEnv wrappers (we have tutorials and examples in our documentation).

Feb 02 '24 09:02 araffin
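
For readers landing on this thread, here is a minimal sketch of what the wrapper approach suggested above could look like. It is not code from the SB3 maintainers. The class name `ResidualActionWrapper` and the stand-ins `ToyEnv` and `ConstantModel` are hypothetical, and the sketch is duck-typed against the Gymnasium `reset`/`step` API so that it runs without any dependencies. It assumes the frozen model A exposes `predict(obs, deterministic=True)` returning `(action, state)`, as SB3 models do, and that actions are flat sequences of floats.

```python
class ResidualActionWrapper:
    """Adds the frozen model A's action to the action chosen by model B.

    Dependency-free sketch: for real SB3 training this class would
    subclass gymnasium.Wrapper so that spaces and metadata are forwarded.
    """

    def __init__(self, env, frozen_model):
        self.env = env
        self.frozen_model = frozen_model  # model A, already trained
        self._last_obs = None

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self._last_obs = obs  # remember the observation B will act on
        return obs, info

    def step(self, action_b):
        # Query A on the same observation that B used to pick action_b.
        action_a, _ = self.frozen_model.predict(self._last_obs, deterministic=True)
        combined = [a + b for a, b in zip(action_a, action_b)]
        obs, reward, terminated, truncated, info = self.env.step(combined)
        self._last_obs = obs
        return obs, reward, terminated, truncated, info


class ConstantModel:
    """Hypothetical stand-in for the trained model A: always predicts [0.5]."""

    def predict(self, obs, deterministic=True):
        return [0.5], None


class ToyEnv:
    """Hypothetical one-dimensional environment that penalises large actions."""

    def reset(self, **kwargs):
        self.last_action = None
        return [0.0], {}

    def step(self, action):
        self.last_action = list(action)
        reward = -abs(sum(action))  # smaller A(x) + B(x) gives higher reward
        return [0.0], reward, False, False, {}


env = ResidualActionWrapper(ToyEnv(), ConstantModel())
obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step([0.25])  # B's action
```

With a real Gymnasium environment and the class rewritten as a `gymnasium.Wrapper` subclass, the wrapped environment could then be passed to `PPO("MlpPolicy", wrapped_env)` so that only B is updated while A stays fixed. Whether the summed action needs clipping to the action space bounds depends on the environment.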
