rl-baselines3-zoo
[DRAFT] C++ Export
Description
This is a draft, I suggest we keep the conversation in the associated issue: https://github.com/DLR-RM/stable-baselines3/issues/836
Motivation and Context
- [X] I have raised an issue to propose this change (required for new features and bug fixes)
https://github.com/DLR-RM/stable-baselines3/issues/836
Types of changes
- [ ] Bug fix (non-breaking change which fixes an issue)
- [X] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Documentation (update in the documentation)
Checklist:
- [ ] I've read the CONTRIBUTION guide (required)
- [ ] I have updated the changelog accordingly (required).
- [ ] My change requires a change to the documentation.
- [ ] I have updated the tests accordingly (required for a bug fix or a new feature).
- [ ] I have updated the documentation accordingly.
- [ ] I have reformatted the code using `make format` (required)
- [ ] I have checked the codestyle using `make check-codestyle` and `make lint` (required)
- [ ] I have ensured `make pytest` and `make type` both pass. (required)
For some applications, I would also be interested in having the value predictions available, so I am adding them as well. Some questions:
- Are the Q/V networks decent estimates of the true functions or not? (the answer may depend on the algorithm involved)
- `ContinuousCritic` has several `q_networks` - Is that only for the twin trick?
- Is it ok in production to only look at the value of the first network, or should we compute both and take the min?
Also, is there a consistent pattern across all algorithms to access value functions, just like predict for the action?
Another question/note:
If we use the `normalize` hyperparameter while training, I guess we will need to use the normalization during inference, right? That would imply getting some data from the env wrapper and exporting it as well.
> I guess we will need to use the normalization during inference, right?

Yes, we already save the mean and std of the observations in a separate file (vecnormalize.pkl), but it should be easy to export.
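The C++ side would then just have to reproduce VecNormalize's observation transform. A minimal sketch, assuming the mean/var arrays have been dumped from vecnormalize.pkl into plain float buffers (the epsilon and clip_obs defaults below are SB3's):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Rough sketch (not part of this PR): apply exported VecNormalize statistics
// to an observation before feeding it to the network.
// Assumes `mean` and `var` come from vecnormalize.pkl; epsilon = 1e-8 and
// clip_obs = 10.0 are SB3's defaults.
std::vector<float> normalize_obs(const std::vector<float>& obs,
                                 const std::vector<float>& mean,
                                 const std::vector<float>& var,
                                 float epsilon = 1e-8f,
                                 float clip_obs = 10.0f)
{
    std::vector<float> out(obs.size());
    for (std::size_t i = 0; i < obs.size(); ++i) {
        const float normalized = (obs[i] - mean[i]) / std::sqrt(var[i] + epsilon);
        out[i] = std::clamp(normalized, -clip_obs, clip_obs);
    }
    return out;
}
```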
> consistent pattern across all algorithms to access value functions just like predict for the action?

Not really... only between algorithms of the same family...
> For some applications, I would also be interested in having the value predictions available,

I agree, but I would not focus on that for now (value functions are not needed for inference, unless it is DQN). I would do it in a follow-up PR, once we have the basic feature ready and working reliably.
> Are the Q/V networks decent estimates of the true functions or not?

That's hard to answer in general: we surely hope they are, but that doesn't mean it is always the case.
> Is that only for the twin trick?

Yes.
> Is it ok in production to only look at the value of the first network

For a rough estimate, only one is needed.
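To make that concrete, here is a hypothetical sketch of the C++ side, assuming both critic heads were exported as traced modules taking (observation, action): the conservative estimate used during training is the element-wise minimum of the two, while a rough production estimate can just query the first head.

```cpp
#include <torch/script.h>
#include <vector>

// Hypothetical sketch: q1 and q2 are traced critic heads taking (obs, action).
torch::Tensor q_value(torch::jit::script::Module& q1,
                      torch::jit::script::Module& q2,
                      const torch::Tensor& obs,
                      const torch::Tensor& action,
                      bool conservative = false)
{
    std::vector<torch::jit::IValue> inputs{obs, action};
    torch::Tensor v1 = q1.forward(inputs).toTensor();
    if (!conservative) {
        return v1;  // first network only: enough for a rough estimate
    }
    torch::Tensor v2 = q2.forward(inputs).toTensor();
    return torch::min(v1, v2);  // clipped double-Q style lower bound
}
```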
@Gregwar I could successfully test it =) I had to do some tweaks to use it with a conda env:
export CMAKE_PREFIX_PATH="${HOME}/.local/lib/libtorch:/home/user/miniconda3"
where the conda prefix can be retrieved with ${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
I also had to comment out the hardcoded `set(Python_EXECUTABLE "/usr/bin/python3.8")` (I don't think that's needed)
> @Gregwar I could successfully test it =)

That is nice!

> I had to do some tweaks to use it with a conda env:

Yes, you are right, this should not be there.
I think I will have to set up some (automated) tests to check for consistency between the Python and C++ predictions. It will be hard to be sure to cover all the cases otherwise: I am digging into the code of all the possible algorithms and I might miss some information (there are many possible options as well, like using images as input that should get normalized, using SDE, the "normalize" wrapping that is not handled at all currently, etc.).
> It will be hard to be sure to cover all the cases otherwise
Let's do a first working version that covers only some algorithms (let's say PPO, DQN and SAC) and only the basic case (MLP, no images, no additional features like normalization or SDE).
Once that's working and merged, we can work on adding additional features; I would start with normalization and then image support ;)
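For reference, here is roughly what the consumer side of that first version could look like: a minimal libtorch sketch, assuming the policy is traced on the Python side into a TorchScript file (here "policy.pt") with a plain observation-in/action-out signature; the file name and observation size are placeholders.

```cpp
#include <torch/script.h>
#include <iostream>

// Minimal sketch of the basic case above (MLP policy, no images, no
// normalization, no SDE). Assumes "policy.pt" maps an observation tensor
// directly to an action tensor.
int main()
{
    torch::jit::script::Module policy = torch::jit::load("policy.pt");
    policy.eval();

    // Dummy observation: batch of 1, 4 features (e.g. a CartPole-like env).
    torch::Tensor obs = torch::zeros({1, 4});

    torch::NoGradGuard no_grad;
    torch::Tensor action = policy.forward({obs}).toTensor();
    std::cout << action << std::endl;
    return 0;
}
```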
Hello,
Sorry for the delay, we are currently working on our humanoid robots for RoboCup; this is the first year we integrate DRL algorithms in the robots.
We spent some time investigating and finally used the OpenVino runtime because of our robots' architecture (we use ONNX as an intermediate representation for OpenVino's model exporter).
At first I thought of implementing the pre/post-processing in C++, but it is actually a better idea to include it in the PyTorch module that is being traced or exported. We can't provide a lot of runtime-specific implementations, so we could focus on libtorch as first intended and provide an ONNX option for people who want to use something else.
@Gregwar could you give me access to your repo so I can push changes? (mainly merging master with this branch)
@araffin Is there any update on this PR? I would be interested in exporting SB3 models into C++ executables, but I am not sure how to approach this problem.
> I would be interested in exporting SB3 models into C++ executables, but I am not sure how to approach this problem
For inference, you can have a look at https://stable-baselines3.readthedocs.io/en/master/guide/export.html and https://github.com/DLR-RM/stable-baselines3/issues/1349#issuecomment-1446161768