Johnny He

4 issues by Johnny He

Following readme.md, when I run ``` python -m qmap.train_mario --render ``` the result is: ``` Traceback (most recent call last): File "/home/mirror/anaconda3/envs/qmap/lib/python3.6/runpy.py", line 193, in _run_module_as_main "__main__", mod_spec) File "/home/mirror/anaconda3/envs/qmap/lib/python3.6/runpy.py",...

In the original SAC paper, J_pi is defined as: ![image](https://user-images.githubusercontent.com/17249055/53928588-a4da0f80-40c5-11e9-95b5-8edcce72588c.png) However, your implementation computes: ``` policy_loss = (log_prob * (log_prob - log_prob_target).detach()).mean() mean_loss = mean_lambda * mean.pow(2).mean() std_loss...

I noticed that some environments in the DMControl suite may use more than one CPU thread. For example, take Domain_name=finger, Task_name=spin. I run this task on a server...

Hi, Kumar! In the last [issue](https://github.com/aviralkumar2907/BEAR/issues/6), you mentioned that you had not tested BEAR on the final buffer setting and recommended that I use the d4rl datasets. Following your comments, I use d4rl...