rl-baselines3-zoo
[DRAFT] C++ Export
Description
This is a draft, I suggest we keep the conversation in the associated issue: https://github.com/DLR-RM/stable-baselines3/issues/836
Motivation and Context
- [X] I have raised an issue to propose this change (required for new features and bug fixes)
https://github.com/DLR-RM/stable-baselines3/issues/836
Types of changes
- [ ] Bug fix (non-breaking change which fixes an issue)
- [X] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Documentation (update in the documentation)
Checklist:
- [ ] I've read the CONTRIBUTION guide (required)
- [ ] I have updated the changelog accordingly (required).
- [ ] My change requires a change to the documentation.
- [ ] I have updated the tests accordingly (required for a bug fix or a new feature).
- [ ] I have updated the documentation accordingly.
- [ ] I have reformatted the code using `make format` (required)
- [ ] I have checked the codestyle using `make check-codestyle` and `make lint` (required)
- [ ] I have ensured `make pytest` and `make type` both pass. (required)
For some applications, I would also be interested in having the value predictions available, so I am adding them as well. Some questions:
- Are the Q/V networks decent estimates of the true functions or not? (the answer may depend on the algorithm involved)
- `ContinuousCritic` has several `q_networks` - Is that only for the twin trick?
- Is it ok in production to only look at the value of the first network, or should we compute both and take the min?
Also, is there a consistent pattern across all algorithms to access value functions, just like predict for the action?
Another question/note:
If we use the `normalize` hyperparameter while training, I guess we will need to use the normalization during inference, right? That would imply getting some data from the env wrapper and exporting it as well.
> I guess we will need to use the normalization during inference, right?

Yes, we already save the mean and std of the observations in a separate file (vecnormalize.pkl), but it should be easy to export.
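The C++ side would then just have to reproduce VecNormalize's observation transform. A minimal sketch, assuming the mean/var arrays have been dumped from vecnormalize.pkl into plain float buffers (the epsilon and clip_obs defaults below are SB3's):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Rough sketch (not part of this PR): apply exported VecNormalize statistics
// to an observation before feeding it to the network.
// Assumes `mean` and `var` come from vecnormalize.pkl; epsilon = 1e-8 and
// clip_obs = 10.0 are SB3's defaults.
std::vector<float> normalize_obs(const std::vector<float>& obs,
                                 const std::vector<float>& mean,
                                 const std::vector<float>& var,
                                 float epsilon = 1e-8f,
                                 float clip_obs = 10.0f)
{
    std::vector<float> out(obs.size());
    for (std::size_t i = 0; i < obs.size(); ++i) {
        const float normalized = (obs[i] - mean[i]) / std::sqrt(var[i] + epsilon);
        out[i] = std::clamp(normalized, -clip_obs, clip_obs);
    }
    return out;
}
```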
> consistent pattern across all algorithms to access value functions just like predict for the action?

Not really... only between algorithms of the same family...
> For some applications, I would also be interested in having the value predictions available,

I agree, but I would not focus on that for now (value functions are not needed for inference, unless it is DQN). I would do it in a follow-up PR, once we have the basic feature ready and working reliably.
> Are the Q/V networks decent estimates of the true functions or not?

That's hard to answer in general: we surely hope they are, but that doesn't mean it is always the case.
> Is that only for the twin trick?

Yes.
> Is it ok in production to only look at the value of the first network

For a rough estimate, only one is needed.
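To make that concrete, here is a hypothetical sketch of the C++ side, assuming both critic heads were exported as traced modules taking (observation, action): the conservative estimate used during training is the element-wise minimum of the two, while a rough production estimate can just query the first head.

```cpp
#include <torch/script.h>
#include <vector>

// Hypothetical sketch: q1 and q2 are traced critic heads taking (obs, action).
torch::Tensor q_value(torch::jit::script::Module& q1,
                      torch::jit::script::Module& q2,
                      const torch::Tensor& obs,
                      const torch::Tensor& action,
                      bool conservative = false)
{
    std::vector<torch::jit::IValue> inputs{obs, action};
    torch::Tensor v1 = q1.forward(inputs).toTensor();
    if (!conservative) {
        return v1;  // first network only: enough for a rough estimate
    }
    torch::Tensor v2 = q2.forward(inputs).toTensor();
    return torch::min(v1, v2);  // clipped double-Q style lower bound
}
```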
@Gregwar I could successfully test it =) I had to do some tweaks to use it with a conda env:
export CMAKE_PREFIX_PATH="${HOME}/.local/lib/libtorch:/home/user/miniconda3"
where the conda prefix can be retrieved with ${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
I also had to comment out the hardcoded `set(Python_EXECUTABLE "/usr/bin/python3.8")` (I don't think that's needed)
> @Gregwar I could successfully test it =)

That is nice!

> I had to do some tweaks to use it with a conda env:

Yes, you are right, this should not be there.
I think I will have to set up some (automated) tests to check for consistency between the Python and C++ predictions. It will be hard to be sure to cover all the cases otherwise: I am digging into the code of all the possible algorithms and I might miss some information (there are many possible options as well, like using images as input that should get normalized, using SDE, the "normalize" wrapping that is not handled at all currently, etc.).
> It will be hard to be sure to cover all the cases otherwise
Let's do a first working version that covers only some algorithms (let's say PPO, DQN and SAC) and only the basic case (MLP, no images, no additional features like normalization or SDE).
Once that's working and merged, we can work on adding additional features; I would start with normalization and then image support ;)
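For reference, here is roughly what the consumer side of that first version could look like: a minimal libtorch sketch, assuming the policy is traced on the Python side into a TorchScript file (here "policy.pt") with a plain observation-in/action-out signature; the file name and observation size are placeholders.

```cpp
#include <torch/script.h>
#include <iostream>

// Minimal sketch of the basic case above (MLP policy, no images, no
// normalization, no SDE). Assumes "policy.pt" maps an observation tensor
// directly to an action tensor.
int main()
{
    torch::jit::script::Module policy = torch::jit::load("policy.pt");
    policy.eval();

    // Dummy observation: batch of 1, 4 features (e.g. a CartPole-like env).
    torch::Tensor obs = torch::zeros({1, 4});

    torch::NoGradGuard no_grad;
    torch::Tensor action = policy.forward({obs}).toTensor();
    std::cout << action << std::endl;
    return 0;
}
```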
Hello,
Sorry for the delay, we are currently working on our humanoid robots for RoboCup; this is the first year we integrate DRL algorithms in the robots.
We spent some time investigating and finally used the OpenVino runtime because of our robots' architecture (we use ONNX as an intermediate representation for OpenVino's model exporter).
At first I thought of implementing the pre/post-processing in C++, but it is actually a better idea to include it in the PyTorch module that is being traced or exported. We can't provide a lot of runtime-specific implementations, so we could focus on libtorch as first intended and provide an ONNX option for people who want to use something else.
@Gregwar could you give me access to your repo so I can push changes? (mainly merging master with this branch)
@araffin Is there any update on this PR? I would be interested in exporting SB3 models into C++ executables, but I am not sure how to approach this problem.
> I would be interested in exporting SB3 models into C++ executables, but I am not sure how to approach this problem
For inference, you can have a look at https://stable-baselines3.readthedocs.io/en/master/guide/export.html and https://github.com/DLR-RM/stable-baselines3/issues/1349#issuecomment-1446161768