Tuomas Haarnoja
Hi Termset, it's this one: [https://github.com/rll/rllab/tree/master/sandbox](https://github.com/rll/rllab/tree/master/sandbox). Note that this repo is no longer actively maintained. I recommend using the [softlearning](https://github.com/rail-berkeley/softlearning) repo instead, which includes the most up-to-date version of...
Good catch! We actually tried both versions and did not find much difference between them. We'll fix the code in the next release.
Hi, we indeed use the same data to update both of the Q-functions. I haven't tested splitting the data and using different sets for different Q's, but I'm guessing that...
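To make this concrete, here is a minimal sketch of what "same data for both Q-functions" means in a single update step: both critics see the same minibatch and regress to the same target. This is an illustrative PyTorch toy, not the actual softlearning/TensorFlow code, and all names (`q1`, `q2`, `update_critics`) are made up:

```python
import torch
import torch.nn as nn

# Two independent critics; both are trained on the SAME minibatch below.
q1 = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 1))
q2 = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(list(q1.parameters()) + list(q2.parameters()), lr=3e-4)

def update_critics(batch_sa, q_target):
    # batch_sa: concatenated (state, action) features; q_target: Bellman targets.
    # Same batch, same target, two separate squared-error losses.
    loss = ((q1(batch_sa) - q_target) ** 2).mean() + \
           ((q2(batch_sa) - q_target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# One update on a random toy minibatch:
update_critics(torch.randn(32, 4), torch.randn(32, 1))
```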
Thanks for your question. We use uniform sampling because there is no direct way to evaluate the log-probabilities of actions under SVGD policies, which would be needed for the importance...
I see, that's indeed confusing. You are right that we could compute the log-probs if the sampling network were invertible. My feeling is that, in our case, the...
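For reference, the invertibility point comes from the change-of-variables formula: if a = f(z) with f invertible, then log p(a) = log p(z) - log|det(da/dz)|, which is exactly the quantity importance weighting would need. A toy sketch, assuming a simple elementwise invertible map (not the SVGD sampler from the paper; `w`, `b`, and `sample_with_log_prob` are made up):

```python
import math
import torch

# Toy invertible sampler: a = tanh(w * z + b), elementwise, so the Jacobian
# is diagonal and log|det| is a sum of elementwise log-derivatives.
w = torch.tensor(1.5)
b = torch.tensor(0.2)

def sample_with_log_prob(num_samples, dim):
    z = torch.randn(num_samples, dim)          # base noise, z ~ N(0, I)
    pre = w * z + b
    a = torch.tanh(pre)
    # log p(a) = log p(z) - log|det(da/dz)|  (change of variables)
    log_p_z = (-0.5 * z ** 2).sum(-1) - 0.5 * dim * math.log(2 * math.pi)
    log_det = (torch.log(w.abs()) + torch.log(1 - torch.tanh(pre) ** 2)).sum(-1)
    return a, log_p_z - log_det

actions, log_probs = sample_with_log_prob(16, 2)  # log-probs usable as IS weights
```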
Do you mean the expectation over states and actions in Eq. (11)? It is OK, since the corresponding gradient estimator is unbiased, though it can have high variance.
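A quick way to see the unbiasedness point: replacing the expectation with sampled minibatches gives a gradient estimate whose mean equals the true gradient, at the cost of per-sample noise. A made-up numerical check of that general fact (not Eq. (11) itself):

```python
import torch

# J(theta) = E_x[(theta - x)^2] with x ~ N(0, 1), so dJ/dtheta = 2 * theta.
theta = torch.tensor(1.0, requires_grad=True)
grads = []
for _ in range(10000):
    x = torch.randn(())                 # one sample of the expectation variable
    loss = (theta - x) ** 2             # single-sample objective
    g, = torch.autograd.grad(loss, theta)
    grads.append(g)
print(torch.stack(grads).mean())        # ~= 2.0, the true gradient; each g is noisy
```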