Results 6 comments of Pranjal Tandon

Hmm, I don't know why this would happen, although I have never tested on Humanoid for 10 million steps. The result of 8000 on Humanoid is for a learned temperature (alpha). For fixed...

I just ran Humanoid for 10 million steps, and unfortunately cannot reproduce the problems you're observing. Here are the results I see across 2 seeds: ![Screenshot from 2019-09-22 12-07-24](https://user-images.githubusercontent.com/18737539/65383351-fd0af300-dd31-11e9-8824-b0c6e36073a8.png) Maybe...

> For `--automatic_entropy_tuning = False` 6000 is the expected result. For fixed temperature, the results should be around 6000. You can also check this in the [paper](https://arxiv.org/pdf/1812.05905.pdf)

That shouldn't happen. I will look into it. I might require more detail on how you resume training. (Sorry for the late reply.)

It is working. Clone the repository again and try.

Sure, it should work with custom gym envs. It won't work if the env has discrete actions.
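
A quick way to check a custom env before training is to look at its action space. The snippet below is only a rough sketch, not code from the repository: `has_continuous_actions` is a hypothetical helper, and it assumes the usual gym convention that continuous `Box` spaces expose `low` and `high` bounds while `Discrete` spaces expose only `n`. The two `SimpleNamespace` objects are stand-ins so the example runs without gym installed; in practice you would pass `env.action_space`.

```python
from types import SimpleNamespace


def has_continuous_actions(action_space):
    """Hypothetical helper: True if the space looks like a continuous Box space.

    Assumption: Box-like spaces carry per-dimension bounds `low` and `high`,
    while Discrete-like spaces only carry a count `n`.
    """
    return hasattr(action_space, "low") and hasattr(action_space, "high")


# Stand-ins that mimic the attributes of gym spaces (not real gym objects).
box_like = SimpleNamespace(low=[-1.0, -1.0], high=[1.0, 1.0], shape=(2,))
discrete_like = SimpleNamespace(n=4)

for name, space in [("box_like", box_like), ("discrete_like", discrete_like)]:
    if has_continuous_actions(space):
        print(f"{name}: continuous actions, should be usable")
    else:
        print(f"{name}: discrete actions, not supported")
```

If the check fails for your env, the action space would have to be made continuous before it can be used here.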