Kristian Hartikainen

You're right, SAC currently works only for single-agent training. Unfortunately, we don't have plans to support distributed training, at least not in the near future. I'm planning to contribute...

Hey @ivan-ji-walmart, thanks for opening this issue! Sorry about this. Some of my recent changes must have broken the SQL implementation, and it has gone unnoticed since I haven't been using...

> Yet I am running this on a GPU cluster (operating on CentOS) and it is non-trivial to upgrade the lib C (e.g. upgrading may affect other users, need to...

Hey @araffin, thanks for opening this issue! We've actually observed very similar reward-related problems with SAC recently. I don't remember ever running `MountainCarContinuous-v0` myself, so I can't say whether I...

Thanks for opening the issue, and sorry for the delayed reply. I think you're right: the way we set the seeds in our code does not guarantee full reproducibility, but...
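
To make the gap concrete, here is a minimal sketch of typical per-library seeding (the `set_seeds` helper is my own illustration, not softlearning's actual code, and it assumes the TF1-style API) together with the nondeterminism it leaves open:

```python
import random

import numpy as np
import tensorflow as tf  # TF1-style API


def set_seeds(seed):
    """Seed every RNG we directly control."""
    random.seed(seed)
    np.random.seed(seed)
    tf.set_random_seed(seed)  # graph-level seed only


# Even with all three seeds fixed, runs can still diverge because
# - some GPU kernels (e.g. certain cuDNN reductions) are non-deterministic,
# - gym environments keep their own RNGs unless env.seed(seed) is called,
# - multi-threaded op scheduling can reorder floating-point additions.
set_seeds(42)
```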

I think @AlphaFrank is right: SQL currently uses the default `GaussianPolicy`. This is my bad. The policy should be changed to the `StochasticNNPolicy` that we used to have in our...
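
For illustration, the swap would look roughly like this in a variant spec (the key names here are hypothetical, not softlearning's actual configuration schema):

```python
# Hypothetical variant spec; key names are illustrative only.
variant = {
    'algorithm_params': {'type': 'SQL'},
    'policy_params': {
        # Was 'GaussianPolicy'. SQL's amortized sampler should instead be
        # the implicit StochasticNNPolicy, which maps (observation, noise)
        # pairs to actions rather than outputting a Gaussian's parameters.
        'type': 'StochasticNNPolicy',
    },
}
```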

@aviralkumar2907, I'd be happy to take a PR for this after I get the tf2 stuff out to master 😉

I had the same issue and tracked it down to the custom joblib.pool: https://github.com/chrodan/joblib/blob/01611a53c2f2783ef80530bbcc146ef804f25e76/joblib/pool.py#L54. I managed to get around this by replacing the `MemmapingPool` with `multiprocessing.pool.Pool` in the custom joblib...
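
In case it helps, a rough sketch of that workaround (assuming an older joblib that still exposes `joblib.pool.MemmapingPool`; the construction site and its kwargs below are illustrative):

```python
from multiprocessing.pool import Pool

# Before, inside the custom joblib (illustrative):
#     pool = MemmapingPool(processes=n_jobs, max_nbytes=..., mmap_mode=...)
# After -- a plain pool; drop the memmapping-specific kwargs, which
# multiprocessing.pool.Pool does not accept:
pool = Pool(processes=4)
print(pool.map(abs, range(-4, 4)))  # sanity check that the pool works
pool.close()
pool.join()
```

The trade-off is losing memory-mapped sharing of large arrays between workers; each process gets its own copy instead.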

Hey @araffin. The iteration is intentionally constant for all the gradient steps in the softlearning code. Basically, the logic currently checks whether we should train on this `iteration`, and if so,...
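
A self-contained sketch of that control flow (the function and variable names are illustrative, not softlearning's exact API):

```python
n_total_steps = 10
n_train_repeat = 4  # gradient steps taken per training iteration


def do_environment_step(iteration):
    pass  # sample one transition and add it to the replay pool


def ready_to_train(iteration):
    return True  # e.g. the replay pool is warm enough


def do_training(iteration):
    print(f"gradient step at iteration {iteration}")


for iteration in range(n_total_steps):
    do_environment_step(iteration)
    if ready_to_train(iteration):
        for _ in range(n_train_repeat):
            # Every repeat reuses the same, constant `iteration`.
            do_training(iteration)
```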

Ah I see, yeah. In my opinion the target update should depend on the number of gradient updates, so it's probably preferable to do what TD3 does.
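
For concreteness, a toy sketch of the TD3-style coupling (the values of `tau` and the update period are TD3's usual defaults; the NumPy arrays stand in for network weights):

```python
import numpy as np

tau = 0.005               # Polyak averaging coefficient
target_update_period = 2  # TD3 updates targets every 2 gradient steps

params = np.zeros(3)         # online network weights (stand-in)
target_params = np.zeros(3)  # target network weights (stand-in)

for gradient_step in range(1, 1001):
    params += 0.01  # stand-in for an actual gradient update
    if gradient_step % target_update_period == 0:
        # Polyak-average toward the online parameters, counted in
        # gradient steps rather than environment steps or iterations.
        target_params = tau * params + (1 - tau) * target_params
```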