Antonin RAFFIN


Never mind, I did some more systematic tests and couldn't see any significant difference; the implementation looks good =) Report: https://wandb.ai/openrlbenchmark/sb3-contrib/reports/SB3-Contrib-CrossQ--Vmlldzo4NTE2MTEx

> let me know if you need anything else :)

Sure, I need to find some time to go over it and maybe polish things here and there. I will...

I simplified the network creation (this needs https://github.com/DLR-RM/stable-baselines3/pull/1975 to be merged into master), added the updated beta for Adam (it had an impact in my small experiments with Pendulum), and fixed...
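
For reference, a minimal sketch of passing custom Adam betas, assuming the CrossQ implementation exposes the standard SB3 `optimizer_kwargs` through `policy_kwargs` (the exact values below are illustrative):

```python
from sb3_contrib import CrossQ

# Sketch only: route custom Adam betas through the usual SB3
# optimizer_kwargs mechanism (the values shown are illustrative).
model = CrossQ(
    "MlpPolicy",
    "Pendulum-v1",
    policy_kwargs=dict(optimizer_kwargs=dict(betas=(0.5, 0.999))),
    verbose=1,
)
model.learn(total_timesteps=20_000)
```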

Hello, I don't have much to say: the discrete solution looks like a good option. Another one would be to modify the distribution used by SB3. Otherwise, maybe @vwxyzjn has...
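
If the "discrete solution" means discretizing the continuous action space (an assumption here), a minimal sketch with a gymnasium `ActionWrapper` could look like the following; `DiscretizeAction` and `n_bins` are hypothetical names:

```python
import gymnasium as gym
import numpy as np


class DiscretizeAction(gym.ActionWrapper):
    """Expose a Discrete(n_bins) action space on top of a 1D Box space."""

    def __init__(self, env: gym.Env, n_bins: int = 11):
        super().__init__(env)
        assert isinstance(env.action_space, gym.spaces.Box)
        # Precompute the continuous action each discrete bin maps to
        # (assumes a 1D action space, e.g. Pendulum-v1)
        self.continuous_actions = np.linspace(
            env.action_space.low, env.action_space.high, n_bins
        )
        self.action_space = gym.spaces.Discrete(n_bins)

    def action(self, action: int) -> np.ndarray:
        return self.continuous_actions[action]
```

With such a wrapper, discrete-only algorithms (e.g. DQN) can run on continuous-control tasks; the alternative mentioned above would be to change the action distribution on the SB3 side instead.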

Related: https://github.com/hill-a/stable-baselines/issues/1012#issuecomment-704534519 "If you (or others) want to support and maintain a TF2 version, we would also be happy about it. But I have to warn you that it is...

Good news =)

> Loading / running the trained model is not working well

What do you mean by "not working well"? Training the RL zoo (https://github.com/araffin/rl-baselines-zoo, so 70+ agents), I...
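
For reference, training and evaluating zoo agents looks roughly like this, assuming the `train.py`/`enjoy.py` entry points and flags from the rl-baselines-zoo README (check the repo for the exact interface):

```bash
# Train SAC on Pendulum with the tuned hyperparameters from the zoo
python train.py --algo sac --env Pendulum-v0
# Run a pretrained agent shipped with the repo
python enjoy.py --algo sac --env Pendulum-v0 --folder trained_agents/
```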

Perfect, I think I will link to your repo once the new results are published ;)

@Sohojoe thanks for the clarification =)

> It looks like you are further ahead with discrete control vs continuous controls.

That's true; most of the algorithms were implemented for Atari only...

> I've been working on folding this and other experimental code back into Marathon Environments

Cool! Btw, we [recently released](https://github.com/hill-a/stable-baselines/releases) v2.4.0, which ships with Soft Actor-Critic (SAC) and policy...
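
A minimal SAC quickstart against that release, as a sketch (default hyperparameters, shown for illustration):

```python
from stable_baselines import SAC

# Sketch: train SAC with default hyperparameters on Pendulum
model = SAC("MlpPolicy", "Pendulum-v0", verbose=1)
model.learn(total_timesteps=10000)
model.save("sac_pendulum")
```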

PS: here is the result using the trained model and deterministic actions:

![ppo_deterministic](https://user-images.githubusercontent.com/1973948/75608037-53a80480-5afc-11ea-978b-e0ee636fa1a3.png)

```
>>> print(np.sqrt(s))
9.9
```