
20 comments of Rousslan F.J. Dossa

@timoklein Here is a snippet that would address @araffin's comments: https://github.com/vwxyzjn/cleanrl/pull/270#discussion_r1031332675 > why do you keep dim here as you flatten it in the next line? https://github.com/vwxyzjn/cleanrl/pull/270#discussion_r1031337608 > it's a...

Indeed. Unlike continuous SAC, the output of the policy is not fed through the Q functions for the actor loss, so joint loss optimization will not be required.
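The point above can be sketched numerically. This is a minimal NumPy illustration (not CleanRL's actual PyTorch code): with a discrete action space the policy outputs probabilities over all actions, so the actor loss is a closed-form expectation over the Q-values and no sampled action needs to be passed back through the Q networks. The function names and shapes here are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the action dimension.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def discrete_actor_loss(logits, q1, q2, alpha):
    """Sketch of a SAC-discrete actor loss.

    logits, q1, q2: arrays of shape (batch, n_actions).
    The expectation E_{a~pi}[alpha * log pi(a|s) - Q(s, a)] is computed
    analytically over all actions instead of via a sampled action.
    """
    probs = softmax(logits)
    log_probs = np.log(probs + 1e-8)
    min_q = np.minimum(q1, q2)  # clipped double-Q
    return (probs * (alpha * log_probs - min_q)).sum(axis=-1).mean()
```

Because the expectation is exact, no reparameterization trick (and hence no joint actor/critic graph through the Q networks) is needed in the discrete case.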

> eps=1e-4 for Adam is required. Without this, there are seeds where SAC-d doesn't learn at all on Pong. This setting is also used in the [author's codebase](https://github.com/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/blob/master/agents/actor_critic_agents/SAC.py). Has cost...
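A plausible intuition for why `eps` matters, sketched numerically (this is an illustration of Adam's update rule in general, not a claim about the linked codebase): when the second-moment estimate `v` is tiny, the denominator `sqrt(v) + eps` is dominated by `eps`, so a larger `eps` bounds the effective step size instead of letting noisy gradients produce near-maximal updates.

```python
import numpy as np

def adam_step(grad_avg, v, lr=3e-4, eps=1e-8):
    # Effective Adam update magnitude (bias correction omitted for brevity):
    # step = lr * m_hat / (sqrt(v_hat) + eps)
    return lr * grad_avg / (np.sqrt(v) + eps)

# With a tiny second-moment estimate, the default eps barely damps the step,
# while eps=1e-4 shrinks it by roughly two orders of magnitude.
noisy = adam_step(1e-6, v=1e-12, eps=1e-8)
damped = adam_step(1e-6, v=1e-12, eps=1e-4)
```

The absolute numbers here are arbitrary; the point is only the relative damping a larger `eps` provides in low-gradient-variance regimes.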

> It's a bit hand-wavy, but my best explanation is that not using up-to-date values will result in steps that are slightly off. Over time, these errors accumulate and throw...

@Chulabhaya The original SAC implementation was developed for the continuous-action case, while this one is for discrete actions, hence the difference in computing the target entropy. Furthermore, in both cases the target entropy...
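The two heuristics can be written side by side. This is a sketch of the commonly used conventions (continuous SAC targets the negative action dimensionality; SAC-discrete targets a fraction of the maximum entropy of a uniform distribution over the actions); the `scale` value is the one reported in the SAC-discrete paper's setup, labeled here as an assumption rather than CleanRL's exact code.

```python
import numpy as np

def target_entropy_continuous(action_dim):
    # Continuous SAC heuristic: -|A|, the action dimensionality.
    return -float(action_dim)

def target_entropy_discrete(n_actions, scale=0.98):
    # SAC-discrete heuristic: a fraction of the entropy of the uniform
    # distribution over n_actions, i.e. scale * log(n_actions).
    return -scale * np.log(1.0 / n_actions)
```

For a 6-dimensional continuous action space this gives -6.0, while for 4 discrete actions it gives roughly 0.98 * log(4) ≈ 1.36, which is why the two codepaths differ.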

@timoklein Thanks a lot for the detailed experiments. Indeed, running experiments with more than one seed is critical. Sometimes a seed can lead to "degenerate" results that are not really...

Greetings. Sorry for the late answer. In the original implementation, the `mean` is used for deterministic evaluation of the agent. Intuitively, using `mean` corresponds to the greediest policy, and would...
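The distinction above can be sketched as follows. This is a minimal NumPy illustration of the usual squashed-Gaussian SAC policy (an assumption about the setup, not the original implementation's code): at training time an action is sampled from the tanh-squashed Gaussian, while at evaluation time `tanh(mean)` is taken deterministically, which is the greediest choice the policy can make.

```python
import numpy as np

def stochastic_action(mean, log_std, rng):
    # Training-time action: sample from the tanh-squashed Gaussian.
    std = np.exp(log_std)
    return np.tanh(mean + std * rng.standard_normal(mean.shape))

def deterministic_action(mean):
    # Evaluation-time action: tanh of the Gaussian mean, i.e. the
    # "greediest" (mode-seeking) action under the squashed policy.
    return np.tanh(mean)
```

Both outputs are bounded in (-1, 1) by the tanh squashing, so they can be rescaled to the environment's action range in the same way.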

Hello there. Not sure if you are still looking for a fix, or if someone else has the same problem: the window popped up after trying to run `python oculus_reader/reader.py`

Hello again. In case someone is looking, a workaround would be to execute `git config --global ...` instead. Might not be as secure, but should get it working as it...

Hello there. Thanks a lot for the interest! It's been a while, but as far as I remember, the Torcs simulator binary that is used along with this wrapper...