Antonin RAFFIN
Never mind, I did some more systematic tests and couldn't see any significant difference; the implementation looks good =) Report: https://wandb.ai/openrlbenchmark/sb3-contrib/reports/SB3-Contrib-CrossQ--Vmlldzo4NTE2MTEx
> let me know if you need anything else :)

Sure, I need to find some time to go over it and maybe polish things here and there. I will...
I simplified the network creation (this needs https://github.com/DLR-RM/stable-baselines3/pull/1975 to be merged into master), added the updated beta for Adam (it had an impact on my small experiments with Pendulum), and fixed...
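For readers wondering what "updated beta for Adam" looks like in practice, here is a minimal sketch, assuming the `CrossQ` class from `sb3_contrib` (the algorithm benchmarked in the report above) and SB3's standard `optimizer_class`/`optimizer_kwargs` mechanism; the `betas=(0.5, 0.999)` value is illustrative, not quoted from the comment:

```
import torch as th
from sb3_contrib import CrossQ

# Illustrative sketch: override the Adam betas through policy_kwargs.
# betas=(0.5, 0.999) is an assumed value, not taken from the comment above.
model = CrossQ(
    "MlpPolicy",
    "Pendulum-v1",
    policy_kwargs=dict(
        optimizer_class=th.optim.Adam,
        optimizer_kwargs=dict(betas=(0.5, 0.999)),
    ),
    verbose=1,
)
model.learn(total_timesteps=10_000)
```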
Hello, I don't have much to say: the discrete solution looks like a good option (a sketch of the idea is below). Another one would be to modify the distribution used by SB3. Otherwise, maybe @vwxyzjn has...
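A minimal sketch of the "discrete solution" mentioned above: wrap a 1-D continuous action space so the agent picks from a fixed set of bins. The wrapper name, bin count, and env id are illustrative (the env id depends on your gym version):

```
import gym
import numpy as np


class DiscretizeActionWrapper(gym.ActionWrapper):
    """Expose a 1-D continuous Box action space as Discrete(n_bins)."""

    def __init__(self, env, n_bins=11):
        super().__init__(env)
        assert isinstance(env.action_space, gym.spaces.Box)
        low, high = env.action_space.low[0], env.action_space.high[0]
        # Evenly spaced candidate actions covering the original range
        self._actions = np.linspace(low, high, n_bins)
        self.action_space = gym.spaces.Discrete(n_bins)

    def action(self, act):
        # Map the discrete index back to a continuous action
        return np.array([self._actions[act]], dtype=np.float32)


# Any discrete-action algorithm can now be trained on Pendulum
env = DiscretizeActionWrapper(gym.make("Pendulum-v1"), n_bins=11)
```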
Related: https://github.com/hill-a/stable-baselines/issues/1012#issuecomment-704534519

> If you (or others) want to support and maintain a TF2 version, we would also be happy about it. But I have to warn you that it is...
Good news =)

> Loading / running the trained model is not working well

What do you mean by "not working well"? Training the RL zoo (https://github.com/araffin/rl-baselines-zoo, so 70+ agents), I...
Perfect, I think I will link your repo once the new results are published ;)
@Sohojoe thanks for the clarification =)

> It looks like you are further ahead with discrete control vs continuous controls.

That's true, most of the algorithms were implemented for Atari only...
> I've been working on folding this and other experimental code back into Marathon Environments

Cool! Btw, we [recently released](https://github.com/hill-a/stable-baselines/releases) v2.4.0, which ships with Soft Actor-Critic (SAC) and policy...
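For anyone trying the release, a minimal SAC example with stable-baselines (the TF1-based library discussed here); env id, timesteps, and save path are placeholders:

```
import gym

from stable_baselines import SAC
from stable_baselines.sac.policies import MlpPolicy

env = gym.make("Pendulum-v0")

# Train a SAC agent for a short while and save it
model = SAC(MlpPolicy, env, verbose=1)
model.learn(total_timesteps=50000)
model.save("sac_pendulum")
```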
PS: here is the result using the trained model and deterministic actions:

```
>>> print(np.sqrt(s))
9.9
```
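For completeness, deterministic actions in stable-baselines are obtained via the `deterministic` flag of `predict()`; a short sketch (model and env names are placeholders carried over from the SAC example above):

```
import gym

from stable_baselines import SAC

env = gym.make("Pendulum-v0")
model = SAC.load("sac_pendulum")

obs = env.reset()
for _ in range(1000):
    # deterministic=True uses the mean of the policy instead of sampling
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
```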