Gaon An

5 comments by Gaon An

Thank you for the response! I'll ignore this dataset for now. Just so you know, below are the training curves for Asterix/1 and Asterix/2. Thank you!

Hello, any updates on this issue? I tried @olliejday's solution, but the results differ in some environments. For example, on hopper-expert-v0 with policy_lr=1e-4, min_q_weight=5.0, and lagrange_thresh=-1.0, the average...

Adding `.detach()` to the outputs of `_get_policy_actions()` and switching the update order of the policy network and the Q-function networks seem to solve the issue (tested on torch==1.7).
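A minimal sketch of why the `.detach()` matters, assuming the repo's setup: the names `policy`, `q_net`, and `_get_policy_actions` below are illustrative stand-ins, not the project's exact API. Detaching the sampled actions stops the Q-loss from backpropagating into the policy parameters, so the Q-function update no longer perturbs the policy mid-step.

```python
import torch
import torch.nn as nn

# Illustrative stand-ins for the actual networks (3-dim obs, 2-dim actions).
policy = nn.Linear(3, 2)
q_net = nn.Linear(5, 1)

def _get_policy_actions(obs):
    # Stand-in for sampling actions from the current policy.
    return policy(obs)

obs = torch.randn(4, 3)

# Without .detach(), q_loss.backward() would also populate gradients on the
# policy parameters, entangling the Q-update with the policy update.
actions = _get_policy_actions(obs).detach()
q_loss = q_net(torch.cat([obs, actions], dim=-1)).mean()
q_loss.backward()

# Policy parameters received no gradient; only the Q-network did.
assert all(p.grad is None for p in policy.parameters())
assert all(p.grad is not None for p in q_net.parameters())
```

Updating the Q-networks before the policy then guarantees the policy loss is computed against the already-updated Q-values, rather than the other way around.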

@sweetice Hi, I did test the modified version against the original version (which was run on torch==1.4), and the two versions performed similarly on the d4rl datasets. I...

@Zhendong-Wang I found policy_lr=1e-4, min_q_weight=10.0, lagrange_thresh=-1.0 to work fairly well on most of the gym environments, though I used the '*-v2' datasets. The exception is 'halfcheetah-random-v2', where policy_lr=1e-4, min_q_weight=1.0, lagrange_thresh=10.0 works well. If...