Antonin RAFFIN
> Maybe right now modifying the reset function is the only method, but it would be better for Stable-Baselines to have a parameter so that I can specify how to sample...
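For context, a minimal sketch of what "modifying the reset function" can look like: a hypothetical Gymnasium environment that takes an `initial_state_sampler` callable, so the initial-state distribution is controlled on the environment side rather than through an SB3 parameter. The env and its names are illustrative only, not part of SB3.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class PointEnv(gym.Env):
    """Toy 2D point env whose initial state is drawn by a user-provided sampler."""

    def __init__(self, initial_state_sampler=None):
        super().__init__()
        self.observation_space = spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)
        self.action_space = spaces.Box(low=-0.1, high=0.1, shape=(2,), dtype=np.float32)
        # Default: uniform sampling over the observation space
        self.initial_state_sampler = initial_state_sampler or self.observation_space.sample

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        # Custom initial-state sampling happens here, inside the env
        self.state = np.asarray(self.initial_state_sampler(), dtype=np.float32)
        return self.state, {}

    def step(self, action):
        self.state = np.clip(self.state + action, -1.0, 1.0)
        terminated = bool(np.linalg.norm(self.state) < 0.05)
        reward = float(-np.linalg.norm(self.state))
        return self.state, reward, terminated, False, {}
```

Usage would then be, for instance, `PointEnv(initial_state_sampler=lambda: np.random.normal(0.0, 0.1, size=2))`, and the env is passed to the SB3 algorithm as usual.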
hello, please don't forget about alternatives too (you ticked the box).
If you want the correct hyperparameters for Atari, you should use the RL Zoo. The example in the doc is there to show the API; we kept it concise to...
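For reference, a minimal sketch of the kind of API-only example meant here, using the standard SB3 Atari helpers (`make_atari_env`, `VecFrameStack`). It relies on default hyperparameters; the tuned ones (learning-rate schedule, clip range, number of steps, etc.) live in the RL Zoo configs, so this is not a recipe for good Atari scores.

```python
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack

# Standard Atari preprocessing + frame stacking (requires the Atari ROMs/ale-py)
env = make_atari_env("PongNoFrameskip-v4", n_envs=8, seed=0)
env = VecFrameStack(env, n_stack=4)

# Default hyperparameters, API demonstration only
model = PPO("CnnPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)
```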
Hello, you should use the RL Zoo and save/load the replay buffer too. Probably a duplicate of https://github.com/DLR-RM/stable-baselines3/issues/435 and others.
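A minimal sketch of the saving side, using SB3's `save_replay_buffer()` (the replay buffer is not included in the regular `.zip` archive; filenames are just examples):

```python
import gymnasium as gym
from stable_baselines3 import SAC

env = gym.make("Pendulum-v1")
model = SAC("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=5_000)

# Save the agent and, separately, its replay buffer
model.save("sac_pendulum")
model.save_replay_buffer("sac_pendulum_buffer")
```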
> ...training, the mean reward starts from a low initial value. You should probably set the `learning_starts` (warmup) parameter to zero after loading. > What do you mean with...
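And the corresponding loading side, as a hedged sketch: restore the model and the buffer, then set `learning_starts` to zero so training does not wait for a new warmup phase before doing gradient updates.

```python
import gymnasium as gym
from stable_baselines3 import SAC

# Resume training: reload the agent and its replay buffer
env = gym.make("Pendulum-v1")
model = SAC.load("sac_pendulum", env=env)
model.load_replay_buffer("sac_pendulum_buffer")

# Skip the warmup so the restored buffer is used immediately
model.learning_starts = 0
model.learn(total_timesteps=5_000, reset_num_timesteps=False)
```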
Hello, in your case, the best option is to fork SB3 and adapt the rollout buffer/PPO. This is too custom to be solved by callbacks or subclassing.
> you're suggesting to augment the replay buffer to collect time-varying gammas with each rollout, then, in the PPO loss function, use the gammas from the replay buffer? Correct, that...
> Meaning I don't need to modify the input for the training update functions? You need to modify the named tuple that represents a transition and modify the GAE computation...
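A hedged sketch of the buffer side only: a `RolloutBuffer` subclass that stores a per-transition discount factor and uses it in the GAE recursion instead of the constant `gamma`. In a fork, `collect_rollouts()` would also need to pass `gamma` to `add()`, and the samples named tuple would need extending if the PPO loss itself uses the gammas. The class and parameter names below are illustrative, not SB3 API.

```python
import numpy as np
from stable_baselines3.common.buffers import RolloutBuffer


class TimeVaryingGammaRolloutBuffer(RolloutBuffer):
    def reset(self) -> None:
        super().reset()
        # One discount factor per stored transition
        self.gammas = np.ones((self.buffer_size, self.n_envs), dtype=np.float32)

    def add(self, *args, gamma: float = 0.99, **kwargs) -> None:
        # self.pos is incremented by the parent add(), so record gamma first
        self.gammas[self.pos] = gamma
        super().add(*args, **kwargs)

    def compute_returns_and_advantage(self, last_values, dones) -> None:
        # Same GAE recursion as the parent, but with self.gammas[step]
        # in place of the constant self.gamma
        last_values = last_values.clone().cpu().numpy().flatten()
        last_gae_lam = 0
        for step in reversed(range(self.buffer_size)):
            if step == self.buffer_size - 1:
                next_non_terminal = 1.0 - dones.astype(np.float32)
                next_values = last_values
            else:
                next_non_terminal = 1.0 - self.episode_starts[step + 1]
                next_values = self.values[step + 1]
            gamma = self.gammas[step]
            delta = self.rewards[step] + gamma * next_values * next_non_terminal - self.values[step]
            last_gae_lam = delta + gamma * self.gae_lambda * next_non_terminal * last_gae_lam
            self.advantages[step] = last_gae_lam
        self.returns = self.advantages + self.values
```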
Hello, what have you tried so far, and what errors did you encounter? Please provide a minimal and working code example (see the link in the issue template for what that means).
I gave it a try, but this one seems to be a bit hard; you probably need to use the experimental ONNX export from PyTorch (using dynamo). The thing that...
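A rough sketch of that direction, assuming a recent PyTorch where `torch.onnx.dynamo_export` is available (the API is experimental and may change, and the export may still fail on some policies). The `OnnxablePolicy` wrapper is illustrative, not part of SB3.

```python
import torch
from stable_baselines3 import PPO


class OnnxablePolicy(torch.nn.Module):
    """Expose only the deterministic action prediction of an SB3 actor-critic policy."""

    def __init__(self, policy: torch.nn.Module):
        super().__init__()
        self.policy = policy

    def forward(self, observation: torch.Tensor) -> torch.Tensor:
        # ActorCriticPolicy.forward returns (actions, values, log_prob)
        return self.policy(observation, deterministic=True)[0]


model = PPO("MlpPolicy", "CartPole-v1")
onnxable = OnnxablePolicy(model.policy).eval()
dummy_obs = torch.zeros(1, *model.observation_space.shape)

# Experimental dynamo-based exporter (PyTorch >= 2.1)
onnx_program = torch.onnx.dynamo_export(onnxable, dummy_obs)
onnx_program.save("ppo_policy.onnx")
```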