[REQUEST] Wrapping d3rlpy algorithms into the SB3 training loop
Is your feature request related to a problem? Please describe.
Not related to a problem; this is more of a feature request.
Describe the solution you'd like
After training an agent on offline data, there is often still a need to fine-tune it with online training. Unfortunately, d3rlpy does not support parallel environments, so I was thinking we could fine-tune an agent trained offline in d3rlpy by somehow wrapping it into the SB3 training loop and taking advantage of vectorized environments running in parallel (a rough sketch follows the next question).
Is there an easy way to achieve that, or would it require considerable development?
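To illustrate what already works today without any new wrapper: d3rlpy's `predict()` accepts a batch of observations and returns a batch of actions, so it plugs directly into SB3's `VecEnv` stepping interface for parallel rollout collection. The sketch below only covers collecting experience, not driving SB3's gradient updates; the `SAC()` constructor and `build_with_env()` call follow the v1-style d3rlpy API, and it assumes compatible gym/gymnasium versions between the two libraries.

```python
import numpy as np
import d3rlpy
from stable_baselines3.common.env_util import make_vec_env

# Hypothetical starting point: a SAC agent trained offline with d3rlpy.
# v1-style API assumed here; adjust the constructor for your d3rlpy version.
algo = d3rlpy.algos.SAC()

vec_env = make_vec_env("Pendulum-v1", n_envs=4)
# Build (or load) the networks against a single underlying env's spaces.
algo.build_with_env(vec_env.envs[0])

obs = vec_env.reset()
for _ in range(1_000):
    # predict() takes a batch of observations and returns a batch of
    # actions, so it maps one-to-one onto a VecEnv step.
    actions = algo.predict(np.asarray(obs))
    obs, rewards, dones, infos = vec_env.step(actions)
vec_env.close()
```

Feeding these transitions back into SB3's replay buffer and `train()` calls is where the real wrapper work would be.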
If I remember correctly, previous versions shipped a wrapper that allowed d3rlpy-SB3 conversion, but it is no longer maintained.
It would be interesting to have it back to combine both libraries (for example, pre-train a SAC agent offline with d3rlpy, then deploy and fine-tune it online with SB3).
EDIT: In fact, it would be nice to have some guidance on how to transfer weights between SB3 and d3rlpy models ;-)
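A minimal weight-transfer sketch, since both libraries use PyTorch underneath: copy the actor parameters from the d3rlpy SAC into the SB3 SAC via their `state_dict`s, then continue with `model.learn()`. The `algo.impl.policy` attribute is an assumption about d3rlpy's v1-style internals (inspect `algo.impl` in your version), and the by-order mapping only works if the two actors have identical architectures (same hidden sizes, same layer order), so print both state dicts first.

```python
import gym
import d3rlpy
from stable_baselines3 import SAC

env = gym.make("Pendulum-v1")

d3_algo = d3rlpy.algos.SAC()
d3_algo.build_with_env(env)  # or load your offline-trained model instead
sb3_model = SAC("MlpPolicy", env)

d3_actor_sd = d3_algo.impl.policy.state_dict()  # assumed v1-style attribute
sb3_actor_sd = sb3_model.actor.state_dict()

# Match parameters by order and shape; the assert guards against
# mismatched architectures or differently ordered state dicts.
for (d3_key, d3_param), (sb3_key, sb3_param) in zip(
    d3_actor_sd.items(), sb3_actor_sd.items()
):
    assert d3_param.shape == sb3_param.shape, (d3_key, sb3_key)
    sb3_actor_sd[sb3_key] = d3_param.clone()

sb3_model.actor.load_state_dict(sb3_actor_sd)

# Fine-tune online with SB3; for a faithful warm start, the critics and
# entropy coefficient would need the same treatment.
sb3_model.learn(total_timesteps=10_000)
```

Treat this as a starting point rather than a supported conversion path: nothing checks that observation normalization, action squashing, or network defaults agree between the two libraries.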