Antonin RAFFIN
Hello, this seems to be an issue very specific to your problem. I would advise deriving a `CustomDQN` from `DQN` to fit your needs (you could also...
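A minimal sketch of the subclassing pattern suggested above. The base class and the `_train_step` method name here are stand-ins, not the real stable-baselines `DQN` API: override only the method whose behaviour you need to change and inherit the rest.

```python
class DQN:
    """Stand-in for stable_baselines.DQN (the real class is far richer)."""

    def _train_step(self):
        # placeholder for the base training update
        return "base update"


class CustomDQN(DQN):
    """Override a single method; everything else is inherited unchanged."""

    def _train_step(self):
        result = super()._train_step()
        # append custom logic on top of the base behaviour
        return result + " + custom logic"
```

The same pattern applies to whichever method actually needs changing in the real class (training step, exploration schedule, etc.).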
>the callback doesn't have access to

For DQN, it does, through `self.locals`; for other algorithms, you need to wrap it.
Hello, it sounds like a callback is the right solution (once #787 is merged): since you have access to `self.model`, you can call `self.model.replay_buffer.add()` inside the callback. In fact,...
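A sketch of the callback pattern described above. `ReplayBuffer` and `BaseCallback` here are simplified stand-ins for the real stable-baselines classes (the real `add()` signature and callback hooks differ); the point is the flow: the callback receives the model, then pushes extra transitions into `self.model.replay_buffer`.

```python
class ReplayBuffer:
    """Toy replay buffer; the real one stores arrays, not tuples."""

    def __init__(self):
        self.storage = []

    def add(self, obs, action, reward, next_obs, done):
        self.storage.append((obs, action, reward, next_obs, done))


class BaseCallback:
    """Stand-in for the stable-baselines BaseCallback."""

    def __init__(self):
        self.model = None

    def init_callback(self, model):
        # the training loop gives the callback a handle on the model
        self.model = model

    def on_step(self):
        raise NotImplementedError


class ExtraTransitionsCallback(BaseCallback):
    """Inject extra transitions into the model's replay buffer."""

    def on_step(self):
        # self.model gives access to the replay buffer, as described above
        self.model.replay_buffer.add(obs=0, action=1, reward=1.0,
                                     next_obs=1, done=False)
        return True  # returning False would stop training
```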
Hello,

>Is it planned to add MultiDiscrete obs spaces since DQN supports them?

This is not planned, as we are focusing on v3.0 for now (and avoiding adding...
Hello, did you try taking a look at the doc on [exporting models](https://stable-baselines.readthedocs.io/en/master/guide/export.html)? Btw, if you succeed, we would appreciate a PR that documents how to do it ;)
Looking at the policy (in the `common/` folder), this should be: `'action': model.act_model._policy_proba` (cf. https://github.com/hill-a/stable-baselines/issues/474), which corresponds to the output of the policy. `action_ph` is used for training (it is a...
Hello, in fact I encountered the same issue with Atari games...

>Can I make the proposal to add a max_timesteps safety valve?

Sounds like a simple and acceptable solution ;)...
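A sketch of what such a max_timesteps safety valve could look like, written as a Gym-style wrapper (`gym.wrappers.TimeLimit` does essentially this job already); the wrapper and environment names below are illustrative, not part of stable-baselines.

```python
class NeverEndingEnv:
    """Toy environment whose episodes never terminate on their own."""

    def reset(self):
        return 0

    def step(self, action):
        # obs, reward, done, info — done is always False here
        return 0, 0.0, False, {}


class MaxTimestepsWrapper:
    """End the episode after max_timesteps steps, whatever the env says."""

    def __init__(self, env, max_timesteps):
        self.env = env
        self.max_timesteps = max_timesteps
        self._elapsed = 0

    def reset(self):
        self._elapsed = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._elapsed += 1
        if self._elapsed >= self.max_timesteps:
            done = True
            info["truncated"] = True  # mark the artificial cut-off
        return obs, reward, done, info
```

Marking the cut-off in `info` matters for value bootstrapping: a time-limit termination is not a real end of the task.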
>Is this expected behavior, and/or should LSTMs be tested in a different environment?

I don't have much time to invest in that issue, but if I get it right, there...
> Question was on how to test whether the current LSTM implementation works right, and so far there was trouble solving a simple recall environment.

@Miffyli We have...
The only maintainer who worked a bit with LSTMs is @erniejunior; see https://github.com/hill-a/stable-baselines/issues/278 and https://github.com/hill-a/stable-baselines/issues/158