Kristian Hartikainen

You're right, SAC currently works only for single-agent training. Unfortunately, we don't have plans to support distributed training, at least not in the near future. I'm planning to contribute...

Hey @ivan-ji-walmart, thanks for opening this issue! Sorry about this. Some of my recent changes must have broken the SQL implementation, and it has gone unnoticed since I haven't been using...

> Yet I am running this on a GPU cluster (operating on CentOS) and it is non-trivial to upgrade the lib C (e.g. upgrading may affect other users, need to...

Hey @araffin, thanks for opening this issue! We've actually observed very similar reward-related problems with SAC recently. I don't remember ever running `MountainCarContinuous-v0` myself, so I can't say whether I...

Thanks for opening the issue, and sorry for the delayed reply. I think you're right: the way we set the seeds in our code does not guarantee full reproducibility, but...
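
To make the gap concrete, here is a minimal sketch of typical per-library seeding (the `set_seeds` helper is my own illustration, not softlearning's actual code, and it assumes the TF1-style API) together with the nondeterminism it leaves open:

```python
import random

import numpy as np
import tensorflow as tf  # TF1-style API


def set_seeds(seed):
    """Seed every RNG we directly control."""
    random.seed(seed)
    np.random.seed(seed)
    tf.set_random_seed(seed)  # graph-level seed only


# Even with all three seeds fixed, runs can still diverge because
# - some GPU kernels (e.g. certain cuDNN reductions) are non-deterministic,
# - gym environments keep their own RNGs unless env.seed(seed) is called,
# - multi-threaded op scheduling can reorder floating-point additions.
set_seeds(42)
```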

I think @AlphaFrank is right: SQL currently uses the default `GaussianPolicy`. This is my bad. The policy should be changed to the `StochasticNNPolicy` that we used to have in our...
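
For illustration, the swap would look roughly like this in a variant spec (the key names here are hypothetical, not softlearning's actual configuration schema):

```python
# Hypothetical variant spec; key names are illustrative only.
variant = {
    'algorithm_params': {'type': 'SQL'},
    'policy_params': {
        # Was 'GaussianPolicy'. SQL's amortized sampler should instead be
        # the implicit StochasticNNPolicy, which maps (observation, noise)
        # pairs to actions rather than outputting a Gaussian's parameters.
        'type': 'StochasticNNPolicy',
    },
}
```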

@aviralkumar2907, I'd be happy to take a PR for this after I get the tf2 stuff out to master 😉

I had the same issue and tracked it down to the custom joblib.pool: https://github.com/chrodan/joblib/blob/01611a53c2f2783ef80530bbcc146ef804f25e76/joblib/pool.py#L54. I managed to get around this by replacing the `MemmapingPool` with `multiprocessing.pool.Pool` in the custom joblib...
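
In case it helps, a rough sketch of that workaround (assuming an older joblib that still exposes `joblib.pool.MemmapingPool`; the construction site and its kwargs below are illustrative):

```python
from multiprocessing.pool import Pool

# Before, inside the custom joblib (illustrative):
#     pool = MemmapingPool(processes=n_jobs, max_nbytes=..., mmap_mode=...)
# After -- a plain pool; drop the memmapping-specific kwargs, which
# multiprocessing.pool.Pool does not accept:
pool = Pool(processes=4)
print(pool.map(abs, range(-4, 4)))  # sanity check that the pool works
pool.close()
pool.join()
```

The trade-off is losing memory-mapped sharing of large arrays between workers; each process gets its own copy instead.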

Hey @araffin. The iteration is intentionally constant for all the gradient steps in the softlearning code. Basically, the logic currently checks whether we should train on this `iteration`, and if so,...
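
A self-contained sketch of that control flow (the function and variable names are illustrative, not softlearning's exact API):

```python
n_total_steps = 10
n_train_repeat = 4  # gradient steps taken per training iteration


def do_environment_step(iteration):
    pass  # sample one transition and add it to the replay pool


def ready_to_train(iteration):
    return True  # e.g. the replay pool is warm enough


def do_training(iteration):
    print(f"gradient step at iteration {iteration}")


for iteration in range(n_total_steps):
    do_environment_step(iteration)
    if ready_to_train(iteration):
        for _ in range(n_train_repeat):
            # Every repeat reuses the same, constant `iteration`.
            do_training(iteration)
```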

Ah I see, yeah. In my opinion the target update should depend on the number of gradient updates, so it's probably preferable to do what TD3 does.
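
For concreteness, a toy sketch of the TD3-style coupling (the values of `tau` and the update period are TD3's usual defaults; the NumPy arrays stand in for network weights):

```python
import numpy as np

tau = 0.005               # Polyak averaging coefficient
target_update_period = 2  # TD3 updates targets every 2 gradient steps

params = np.zeros(3)         # online network weights (stand-in)
target_params = np.zeros(3)  # target network weights (stand-in)

for gradient_step in range(1, 1001):
    params += 0.01  # stand-in for an actual gradient update
    if gradient_step % target_update_period == 0:
        # Polyak-average toward the online parameters, counted in
        # gradient steps rather than environment steps or iterations.
        target_params = tau * params + (1 - tau) * target_params
```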