stable-baselines3
PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
## Changes #### OffPolicyAlgorithm - Now accepts buffer class as argument type - Rationale: allows for other types of replay buffers besides the current implementations - Now accepts arguments for...
### Bug In the rollout collection of PPO, the callback's reference to the local variable `values` is not updated after the value for the last timestep has been predicted. https://github.com/DLR-RM/stable-baselines3/blob/ed308a71be24036744b5ad4af61b083e4fbdf83c/stable_baselines3/common/on_policy_algorithm.py#L210...
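A minimal sketch of the stale-reference pattern this report describes, in plain Python. `RecordingCallback` and `collect` are hypothetical stand-ins written for illustration, not SB3's actual classes or rollout code, and the constant `99.0` merely stands in for the value predicted for the last timestep.

```python
class RecordingCallback:
    """Hypothetical callback that keeps a copy of the caller's local variables."""

    def __init__(self):
        self.locals = {}

    def update_locals(self, locals_):
        # Copy the caller's current local variables into the callback.
        self.locals.update(locals_)


def collect(callback, refresh_after_last_value):
    """Toy rollout loop; the numbers are placeholders, not real value estimates."""
    values = None
    for step in range(3):
        values = float(step)
        callback.update_locals(locals())
    # Stand-in for predicting the value of the last timestep after the loop.
    values = 99.0
    if refresh_after_last_value:
        # Without this refresh the callback still sees the value from the loop.
        callback.update_locals(locals())
    return values


stale = RecordingCallback()
collect(stale, refresh_after_last_value=False)

fresh = RecordingCallback()
collect(fresh, refresh_after_last_value=True)
```

In this sketch, `stale.locals["values"]` keeps the last value assigned inside the loop, while `fresh.locals["values"]` reflects the value assigned after it.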
### Bug I am having issues in SB3 with a CustomFeatureExtractor for a Dict observation space that is making my GPU memory explode. The observation space is composed of...
…the user is warned rather than silently ignored. ## Description Right now logged values can be silently ignored or logged as two different types (unlikely). ## Motivation and Context I...
### Question Hello, I am using SB3 to train some model where I want the inference to run on embedded robots using C++. I had a look at PyTorch documentation...
### Bug When you try to extend the VecEnvWrapper and put a breakpoint in the constructor of the new class, PyCharm crashes due to some recursive call in...
### Documentation I have spent an entire afternoon staring into the dark voids of SB3's verbosity code (long story). Currently, the verbosity modes and what precisely they do is...
### Bug When cloudpickle fails to deserialize an object, [`json_to_data`](https://github.com/DLR-RM/stable-baselines3/blob/54bcfa4544315fc920be0944fc380fd75e2f7c4a/stable_baselines3/common/save_util.py#L130) prints a warning (fine) but then replaces that object with whichever object was parsed just before it....
### Feature Native support for using FP16 GPU computations, via a flag to .learn or something like that. ### Motivation / Pitch Using half precision instead of single is common...
### Feature NatureCNN is hard-coded for CombinedExtractor. https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/common/torch_layers.py#L258 ### Motivation Implementing different networks requires rewriting or patching CombinedExtractor, which is a lot of code. ### Pitch I would be...
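The failure mode reported here can be sketched in a few lines. This is an illustration under stated assumptions, not the library's code: it uses the standard `pickle` module in place of cloudpickle, the helper names `load_entries_buggy`, `load_entries_fixed`, and `encode` are hypothetical, and the cause (a loop variable that keeps its previous value when deserialization fails) is assumed from the description.

```python
import base64
import pickle
import warnings


def load_entries_buggy(entries):
    """Reuses a stale variable when an entry fails to deserialize."""
    result = {}
    for key, blob in entries.items():
        try:
            obj = pickle.loads(base64.b64decode(blob))
        except Exception:
            warnings.warn(f"Could not deserialize {key!r}")
        # On failure, `obj` still holds the previously parsed entry.
        result[key] = obj
    return result


def load_entries_fixed(entries):
    """Stores an explicit placeholder when an entry fails to deserialize."""
    result = {}
    for key, blob in entries.items():
        try:
            result[key] = pickle.loads(base64.b64decode(blob))
        except Exception:
            warnings.warn(f"Could not deserialize {key!r}")
            result[key] = None
    return result


def encode(obj):
    """Pickle an object and base64-encode it as ASCII text."""
    return base64.b64encode(pickle.dumps(obj)).decode("ascii")


entries = {
    "learning_rate": encode(0.001),
    # Deliberately corrupt payload that cannot be unpickled.
    "policy_class": base64.b64encode(b"\x00corrupt").decode("ascii"),
}

buggy = load_entries_buggy(entries)
fixed = load_entries_fixed(entries)
```

With these inputs, the buggy loader silently assigns the learning rate to `policy_class`, whereas the fixed loader leaves `None` there so the caller can tell that the entry was lost.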