Antonin RAFFIN
Never mind, I did some more systematic tests and couldn't see any significant difference; the implementation looks good =) Report: https://wandb.ai/openrlbenchmark/sb3-contrib/reports/SB3-Contrib-CrossQ--Vmlldzo4NTE2MTEx
> let me know if you need anything else :)

Sure, I need to find some time to go over it and maybe polish things here and there. I will...
I simplified the network creation (this needs https://github.com/DLR-RM/stable-baselines3/pull/1975 to be merged into master), added the updated beta for Adam (it had an impact on my small experiments with Pendulum), and fixed...
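For readers wondering what "updated beta for Adam" looks like in practice, here is a minimal sketch, assuming the `CrossQ` class from `sb3_contrib` (the algorithm benchmarked in the report above) and SB3's standard `optimizer_class`/`optimizer_kwargs` mechanism; the `betas=(0.5, 0.999)` value is illustrative, not quoted from the comment:

```
import torch as th
from sb3_contrib import CrossQ

# Illustrative sketch: override the Adam betas through policy_kwargs.
# betas=(0.5, 0.999) is an assumed value, not taken from the comment above.
model = CrossQ(
    "MlpPolicy",
    "Pendulum-v1",
    policy_kwargs=dict(
        optimizer_class=th.optim.Adam,
        optimizer_kwargs=dict(betas=(0.5, 0.999)),
    ),
    verbose=1,
)
model.learn(total_timesteps=10_000)
```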
Hello, I don't have much to say: the discrete solution looks like a good option (a sketch of the idea is below). Another one would be to modify the distribution used by SB3. Otherwise, maybe @vwxyzjn has...
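A minimal sketch of the "discrete solution" mentioned above: wrap a 1-D continuous action space so the agent picks from a fixed set of bins. The wrapper name, bin count, and env id are illustrative (the env id depends on your gym version):

```
import gym
import numpy as np


class DiscretizeActionWrapper(gym.ActionWrapper):
    """Expose a 1-D continuous Box action space as Discrete(n_bins)."""

    def __init__(self, env, n_bins=11):
        super().__init__(env)
        assert isinstance(env.action_space, gym.spaces.Box)
        low, high = env.action_space.low[0], env.action_space.high[0]
        # Evenly spaced candidate actions covering the original range
        self._actions = np.linspace(low, high, n_bins)
        self.action_space = gym.spaces.Discrete(n_bins)

    def action(self, act):
        # Map the discrete index back to a continuous action
        return np.array([self._actions[act]], dtype=np.float32)


# Any discrete-action algorithm can now be trained on Pendulum
env = DiscretizeActionWrapper(gym.make("Pendulum-v1"), n_bins=11)
```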
Related: https://github.com/hill-a/stable-baselines/issues/1012#issuecomment-704534519

> If you (or others) want to support and maintain a TF2 version, we would also be happy about it. But I have to warn you that it is...
Good news =)

> Loading / running the trained model is not working well

What do you mean by "not working well"? Training the RL zoo (https://github.com/araffin/rl-baselines-zoo, so 70+ agents), I...
Perfect, I think I will link your repo once the new results are published ;)
@Sohojoe thanks for the clarification =)

> It looks like you are further ahead with discrete control vs continuous controls.

That's true, most of the algorithms were implemented for Atari only...
> I've been working on folding this and other experimental code back into Marathon Environments

Cool! Btw, we [recently released](https://github.com/hill-a/stable-baselines/releases) v2.4.0, which ships with Soft Actor-Critic (SAC) and policy...
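For anyone trying the release, a minimal SAC example with stable-baselines (the TF1-based library discussed here); env id, timesteps, and save path are placeholders:

```
import gym

from stable_baselines import SAC
from stable_baselines.sac.policies import MlpPolicy

env = gym.make("Pendulum-v0")

# Train a SAC agent for a short while and save it
model = SAC(MlpPolicy, env, verbose=1)
model.learn(total_timesteps=50000)
model.save("sac_pendulum")
```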
PS: here is the result using the trained model and deterministic actions:

```
>>> print(np.sqrt(s))
9.9
```
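For completeness, deterministic actions in stable-baselines are obtained via the `deterministic` flag of `predict()`; a short sketch (model and env names are placeholders carried over from the SAC example above):

```
import gym

from stable_baselines import SAC

env = gym.make("Pendulum-v0")
model = SAC.load("sac_pendulum")

obs = env.reset()
for _ in range(1000):
    # deterministic=True uses the mean of the policy instead of sampling
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
```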