astooke

80 comments by astooke

@drozzy @LecJackS If you are making a custom env, it is better to just use the rlpyt base env class (https://github.com/astooke/rlpyt/blob/master/rlpyt/envs/base.py), and follow that interface. No need to go through...
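A minimal sketch of that interface (the real base class is at the URL above; the `EnvStep` namedtuple and the class below are stand-ins so the sketch runs without rlpyt installed — in actual rlpyt you would subclass `rlpyt.envs.base.Env` and also define `action_space`/`observation_space`):

```python
from collections import namedtuple

import numpy as np

# Stand-in for rlpyt's EnvStep namedarraytuple (see rlpyt/envs/base.py).
EnvStep = namedtuple("EnvStep", ["observation", "reward", "done", "env_info"])


class MyCustomEnv:
    """Sketch of the interface the rlpyt base Env class expects."""

    def __init__(self, episode_length=10):
        self._episode_length = episode_length
        self._t = 0

    def reset(self):
        """Reset episode state and return the initial observation."""
        self._t = 0
        return np.zeros(4, dtype=np.float32)

    def step(self, action):
        """Advance one time step and return an EnvStep tuple."""
        self._t += 1
        obs = np.full(4, self._t, dtype=np.float32)
        reward = 1.0
        done = self._t >= self._episode_length
        return EnvStep(obs, reward, done, {})

    @property
    def horizon(self):
        return self._episode_length
```

In real rlpyt you would also give `action_space` and `observation_space` properties built from rlpyt's space classes, so the sampler can allocate buffers.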

@frankie4fingers that's an unexpected problem! Could you provide more details? The environment should be instantiated separately within each child process in the parallel samplers.
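To illustrate the point: rlpyt's parallel samplers take `EnvCls` and `env_kwargs` rather than a built env, so each worker constructs its own instance and only the lightweight spec has to cross the process boundary. A sketch of that pattern (the `ToyEnv` class is hypothetical; pickling stands in for the actual process handoff):

```python
import pickle


class ToyEnv:
    """Hypothetical env; in practice this is whatever EnvCls you pass."""

    def __init__(self, level=1):
        self.level = level


# Parent process: hand over the class + kwargs, not a live env object.
EnvCls, env_kwargs = ToyEnv, {"level": 3}
spec = pickle.dumps((EnvCls, env_kwargs))  # must be picklable

# Child process: build the env locally from the spec.
cls, kwargs = pickle.loads(spec)
env = cls(**kwargs)
```

Because construction happens in the child, any unpicklable state the env creates (file handles, GPU contexts, etc.) never needs to cross processes.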

@im-ant Yes, the main thing to get C51 working with a custom environment is just to write your own model class. Or maybe your environment has the same observation and...
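For reference, the output a C51-style model class has to produce is a per-action softmax over fixed support atoms; expected Q-values are then the probability-weighted support. A numpy sketch of that head's math (all sizes and the random logits are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# C51 represents Q(s, a) as a categorical distribution over a fixed
# support of atoms z in [V_min, V_max]; the model outputs, per action,
# a softmax over those atoms.
n_atoms, n_actions = 51, 4                       # illustrative sizes
z = np.linspace(-10.0, 10.0, n_atoms)            # support atoms
logits = rng.standard_normal((n_actions, n_atoms))
p = np.exp(logits - logits.max(axis=-1, keepdims=True))
p /= p.sum(axis=-1, keepdims=True)               # softmax over atoms
q = (p * z).sum(axis=-1)                         # expected Q per action
greedy_action = int(np.argmax(q))
```

A custom model class just needs to emit that `[actions, n_atoms]` distribution (with a leading batch dimension) for the rest of the C51 machinery to consume.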

@frankie4fingers OK thanks for explaining the problem and the quick workaround. I'm still a bit surprised by this, because I've run gym envs in parallel before. And when the child...

Hmm, ok things get a little tricky with prioritized sequence replay. My first suggestion would be to construct the sequence replay buffer with its `batch_T` equal to the longest K...
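A toy illustration of what fixed-length sequence sampling looks like with a `batch_T`-step window (the flat buffer and function names are illustrative, not rlpyt's actual replay internals):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy flat buffer of T transitions; a sequence replay buffer with a
# batch_T-step window returns contiguous slices of that length.
T, batch_T = 100, 8
rewards = np.arange(T, dtype=np.float32)


def sample_sequences(batch_size):
    """Draw contiguous windows of batch_T steps from the buffer."""
    starts = rng.integers(0, T - batch_T + 1, size=batch_size)
    return np.stack([rewards[s:s + batch_T] for s in starts])  # [batch, batch_T]


batch = sample_sequences(4)
```

With prioritization, the `starts` draw would be weighted by sequence priority instead of uniform, but the windowing is the same.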

Wow, very sorry I missed your last question! I am surprised that the code would run so much slower. Possibly it was sampling huge batches, with a lot of extra...

Hmmm, I don't have a full answer for this because it's specifics of one RL problem...but one thing that might help is to clip the actions inside the environment, but...
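A minimal sketch of clipping actions inside the environment (the wrapper, bounds, and the echo env used to exercise it are all illustrative):

```python
import numpy as np


class ActionClippingEnv:
    """Clip incoming actions to the env's valid bounds inside step(),
    so out-of-range agent outputs never reach the underlying dynamics."""

    def __init__(self, env, low=-1.0, high=1.0):
        self.env = env
        self.low, self.high = low, high

    def step(self, action):
        return self.env.step(np.clip(action, self.low, self.high))


class _EchoEnv:
    """Toy stand-in env: step() just returns the action it received."""

    def step(self, action):
        return action


env = ActionClippingEnv(_EchoEnv())
out = env.step(np.array([2.0, -3.0, 0.5]))  # clipped to [1.0, -1.0, 0.5]
```

Clipping inside the env (rather than in the policy) keeps the agent's raw output intact for the gradient while guaranteeing the dynamics only ever see valid actions.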

Hmm, how long is your `config["runner"]["log_interval_steps"]`, and how many environments are you using (`config["sampler"]["batch_B"]`)? If the first of those is small and the second is large, it could simply be...
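The rough arithmetic behind that question, assuming rlpyt's usual accounting of `batch_T * batch_B` environment steps per sampler iteration (all numbers below are illustrative):

```python
# Each sampler iteration gathers batch_T * batch_B environment steps,
# so logging fires roughly every log_interval_steps / (batch_T * batch_B)
# iterations. With many envs and a short interval, that can be every
# couple of iterations.
batch_T, batch_B = 5, 64
log_interval_steps = 1000

steps_per_iter = batch_T * batch_B                            # 320 steps/iter
iters_per_log = max(1, log_interval_steps // steps_per_iter)  # ~3 iterations
```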

Hi, this is an interesting point that could make things easier. They were not intended to be related, but what functionality from PyTorch distributions are you looking for?

Hi! This is because in PyTorch 1.2, the grad norm is returned as a Python float. In later versions of PyTorch, it's a PyTorch tensor, which requires calling `.item()` to...
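A version-agnostic way to handle that return value, via duck typing (the `DummyTensor` stand-in just lets the sketch run without torch installed):

```python
class DummyTensor:
    """Stand-in for a 0-d PyTorch tensor, so this runs without torch."""

    def __init__(self, value):
        self._value = value

    def item(self):
        return self._value


def grad_norm_to_float(grad_norm):
    # Older PyTorch returned a Python float; newer versions return a
    # 0-d tensor. Duck-typing on .item() handles both cases.
    return grad_norm.item() if hasattr(grad_norm, "item") else float(grad_norm)
```

Dropping a helper like this around the `clip_grad_norm_` call site lets the same logging code run across PyTorch versions.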