Soft Actor-Critic

Results 11 sac issues

Hi, first, thanks for sharing the repo. I am really confused by the performance comparison between SAC and TD3. In TD3's results, TD3 beats SAC in every environment evaluated with...

Hi! I am currently trying to verify my DIAYN implementation and I was wondering if there are any additional results available that are not provided within the [original paper](https://arxiv.org/abs/1802.06070) or...

Hi, thanks for the thorough implementation and making this code available, it really helps to understand the internal mechanisms of the SAC algorithm. I have a question regarding the code...

Hello, I'm not sure whether this is an issue or not but I've been looking at your implementation for half an hour, and I think there might be a maximization...

I derived Equation 12, but the result is not the same as Equation 13 in your paper. In my derivation, I didn't get the first item in Equation 13, I...

Traceback (most recent call last):
  File "/home/xtq/sac/examples/mujoco_all_sac.py", line 15, in <module>
    from sac.algos import SAC
  File "/home/xtq/sac/sac/algos/__init__.py", line 2, in <module>
    from .diayn import DIAYN
  File "/home/xtq/sac/sac/algos/diayn.py", line 10, in <module>
    from sac.policies.hierarchical_policy...

When executing `docker-compose up` on my MacBook Pro, the Dockerfile build fails at `step 26/35` because of unsatisfied libraries (blas, mkl, etc.) for the `numpy=1.13.0` dependency, so I made some modifications to make...

Hi, Can SAC run with the cross-maze variation for ant? With the default parameters, the command: "python ./examples/mujoco_all_sac.py --env=ant --domain=ant --task=cross-maze --policy=gmm --log_dir=data/ant_cross-experiment" does not throw any errors but "Launches...

GaussianPolicy inherits from NNPolicy, and at the end of the GaussianPolicy constructor (`__init__`) there is a call to the parent of NNPolicy... Shouldn't it be: `super(GaussianPolicy, self).__init__(env_spec)`? I observed its repeating...
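
To make the question about the `super` call concrete, here is a minimal sketch of how the first argument to `super` changes which constructors run. The classes below are hypothetical stand-ins that only borrow the names from the issue; they are not the repository's actual classes, and the skipping behavior may well be intentional there.

```python
# Hypothetical stand-ins; not the repository's real classes.
class Base:
    def __init__(self, env_spec):
        self.init_calls = ["Base"]


class NNPolicy(Base):
    def __init__(self, env_spec):
        super(NNPolicy, self).__init__(env_spec)
        self.init_calls.append("NNPolicy")


class GaussianPolicySkipping(NNPolicy):
    def __init__(self, env_spec):
        # super(NNPolicy, self) starts the method lookup *after* NNPolicy
        # in the MRO, so NNPolicy.__init__ is never executed.
        super(NNPolicy, self).__init__(env_spec)
        self.init_calls.append("GaussianPolicy")


class GaussianPolicyDirect(NNPolicy):
    def __init__(self, env_spec):
        # Naming the current class runs the immediate parent's constructor.
        super(GaussianPolicyDirect, self).__init__(env_spec)
        self.init_calls.append("GaussianPolicy")


skipping = GaussianPolicySkipping(env_spec=None).init_calls
direct = GaussianPolicyDirect(env_spec=None).init_calls
print(skipping)
print(direct)
```

In this sketch the first form bypasses `NNPolicy.__init__` entirely, while the second runs the full chain. Which one is appropriate depends on whether the subclass deliberately performs the parent's setup itself.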

The Soft Actor-Critic paper ([arXiv v2](https://arxiv.org/abs/1801.01290)) says, in the last paragraph on page 5: > We then use the minimum of the Q-functions for the value gradient in Equation 6...
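
As a rough illustration of the quoted sentence, the sketch below takes the minimum of two Q-estimates before subtracting the entropy term. The function name, the scalar inputs, and the `alpha` weight (defaulting to 1.0) are assumptions made for illustration; this is not taken from the repository's code.

```python
def soft_value_target(q1, q2, log_pi, alpha=1.0):
    """Hypothetical helper: pessimistic soft value target for one sample.

    q1, q2  -- estimates from two Q-functions for the same (state, action)
    log_pi  -- log-probability of that action under the current policy
    alpha   -- assumed entropy weight (not part of the quoted passage)
    """
    # Using the smaller estimate counteracts overestimation bias.
    return min(q1, q2) - alpha * log_pi


target = soft_value_target(q1=1.0, q2=2.0, log_pi=-0.5)
print(target)
```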