astooke comments

Results: 80 comments of astooke

KeyError: 'action'

Yes, sorry this is an uninformative error I sometimes run into. It usually means there is something wrong inside the subprocess which is generating the example action, observation, etc... One...

Continue training

Hi, and thanks! :) It should all be in `params.pkl`, which will have the agent state dict, inside of which is the model state dict. (To save that, set the...
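A minimal sketch of how one might inspect such a nested snapshot to find where the model weights sit. The helper `list_key_paths` and every key name in `example_snapshot` are hypothetical stand-ins, not confirmed by the comment; in practice the dictionary would come from loading `params.pkl` (for example with `torch.load`, assuming the file was written with `torch.save`).

```python
def list_key_paths(snapshot, prefix=""):
    """Return dotted key paths for every leaf value in a nested dict."""
    paths = []
    for key, value in snapshot.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Descend into nested state dicts (agent -> model -> tensors).
            paths.extend(list_key_paths(value, path))
        else:
            paths.append(path)
    return paths


# Stand-in for the loaded snapshot; key names and values are hypothetical.
example_snapshot = {
    "itr": 100,
    "agent_state_dict": {
        "model": {"layer.weight": [0.1, 0.2], "layer.bias": [0.0]},
    },
}

for path in list_key_paths(example_snapshot):
    print(path)
```

Printing the paths first makes it easier to see which sub-dictionary to hand to the model when resuming.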

Continue training

Ohh, good point! The epsilon might not be loaded...an oversight on my part. You could save epsilon to the agent's state dict and load that back up. Or just when...
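The first suggestion, saving epsilon alongside the model in the agent's state dict, could look roughly like the sketch below. `EpsilonGreedyAgentSketch` is a hypothetical class written for illustration only; it is not rlpyt's agent, and the key names `"model"` and `"epsilon"` are assumptions.

```python
class EpsilonGreedyAgentSketch:
    """Hypothetical agent that checkpoints epsilon with its model state."""

    def __init__(self, model_state=None, epsilon=1.0):
        self.model_state = dict(model_state or {})
        self.epsilon = epsilon

    def state_dict(self):
        # Include epsilon so a resumed run does not restart exploration.
        return {"model": dict(self.model_state), "epsilon": self.epsilon}

    def load_state_dict(self, state):
        self.model_state = dict(state["model"])
        # Snapshots saved before this change have no epsilon entry;
        # in that case keep whatever value the agent already holds.
        self.epsilon = state.get("epsilon", self.epsilon)


trained = EpsilonGreedyAgentSketch({"w": [0.5]}, epsilon=0.05)
snapshot = trained.state_dict()

resumed = EpsilonGreedyAgentSketch()  # starts with the default epsilon
resumed.load_state_dict(snapshot)
print(resumed.epsilon)
```

Using `state.get` keeps older snapshots loadable, at the cost of silently falling back to the current epsilon when the entry is missing.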

Windows Requires Pytorch >= 1.3

OK good to note! I'll try running everything again with the latest PyTorch and then update the conda env yaml if everything runs. I'm only in Ubuntu tho.

Normalizing environment wrapper

Nice note, thanks for that! On a related note, we've recently pushed an update which includes observation normalization in the policy gradient algorithms: commit 98fefa2d8550bddbdff8f44004062dc5e72bf56b. So a question for reward...
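For readers unfamiliar with the idea, observation normalization is commonly done with running statistics. The sketch below is a generic illustration using Welford's online update on scalar observations; `RunningNormalizer` is a hypothetical name, and this is not the code from the commit mentioned above.

```python
import math


class RunningNormalizer:
    """Track a running mean and variance, then standardize new values."""

    def __init__(self, eps=1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.eps = eps

    def update(self, x):
        # Welford's online update: numerically stable, one value at a time.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        # Population standard deviation; fall back to 1.0 with too few samples.
        if self.count < 2:
            return 1.0
        return math.sqrt(self.m2 / self.count)

    def normalize(self, x):
        return (x - self.mean) / (self.std + self.eps)


normalizer = RunningNormalizer()
for observation in (1.0, 2.0, 3.0):
    normalizer.update(observation)
print(normalizer.mean, normalizer.std)
```

The same bookkeeping applies per element for vector observations; scalars are used here only to keep the sketch short.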

Normalizing environment wrapper

@vzhuang Curious if you ended up getting this running? Definitely a useful piece to add. :)

Support for weird graph-based observation data type

Hi, interesting question! One challenge to this is that memory is pre-allocated for the observations and actions, according to the sampler batch size. So it can't have variable-sized observations or...
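To make the constraint concrete, here is a small pure-Python sketch of a buffer allocated once with a fixed shape, which then rejects an observation of a different length. All function names are hypothetical and this is not rlpyt's buffer code. The padding helper at the end shows one common workaround (pad to a maximum size and carry a mask); that workaround is an assumption added here, not something the comment proposes.

```python
def allocate_buffer(time_steps, num_envs, obs_len, fill=0.0):
    """Pre-allocate a [time_steps][num_envs][obs_len] buffer up front."""
    return [[[fill] * obs_len for _ in range(num_envs)]
            for _ in range(time_steps)]


def write_observation(buffer, t, b, obs):
    """Write one observation in place; its length must match the slot."""
    slot = buffer[t][b]
    if len(obs) != len(slot):
        raise ValueError(
            f"observation has length {len(obs)}, buffer slot has {len(slot)}"
        )
    slot[:] = obs


def pad_observation(obs, max_len, pad_value=0.0):
    """Pad to max_len and return (padded, mask), where mask marks real entries."""
    if len(obs) > max_len:
        raise ValueError("observation is longer than max_len")
    n_pad = max_len - len(obs)
    return list(obs) + [pad_value] * n_pad, [1] * len(obs) + [0] * n_pad


buffer = allocate_buffer(time_steps=2, num_envs=3, obs_len=4)
write_observation(buffer, 0, 0, [1.0, 2.0, 3.0, 4.0])  # matches the slot

padded, mask = pad_observation([5.0, 6.0], max_len=4)
write_observation(buffer, 0, 1, padded)  # variable-sized input, padded to fit
print(buffer[0][1], mask)
```

With padding, the model also has to receive the mask so it can ignore the filler entries.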

Support for weird graph-based observation data type

> where is this memory pre-allocated and where could I modify it?

Sure! Here it is in the serial sampler: https://github.com/astooke/rlpyt/blob/75e96cda433626868fd2a30058be67b99bbad810/rlpyt/samplers/serial/sampler.py#L36

Otherwise look for `build_samples_buffer()` and see inside that.

>...

Support for weird graph-based observation data type

@tarungog Hi! Curious if you pursued anything for variable-sized observations and actions?

Error on running GpuSampler/CpuSampler

Hmm, I'm not familiar with that error message. Are the worker processes even able to fork?
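One quick way to answer that question outside of the library is a standalone check that forks a single child with the standard `multiprocessing` module and waits for a reply. This is a generic diagnostic sketch, not part of rlpyt; `can_fork` and `_child` are hypothetical names.

```python
import multiprocessing as mp


def _child(queue):
    # Runs in the forked worker; report back to the parent.
    queue.put("ok")


def can_fork(timeout=5.0):
    """Return True if a forked child process starts and replies."""
    try:
        ctx = mp.get_context("fork")
    except ValueError:
        # The "fork" start method is unavailable (for example on Windows).
        return False
    queue = ctx.Queue()
    proc = ctx.Process(target=_child, args=(queue,))
    try:
        proc.start()
    except OSError:
        return False
    try:
        result = queue.get(timeout=timeout)
    except Exception:
        # No reply within the timeout (queue.Empty) or a broken pipe.
        result = None
    proc.join(timeout)
    return result == "ok" and proc.exitcode == 0


if __name__ == "__main__":
    print("fork works:", can_fork())
```

If this prints `False`, the problem is likely with the platform or environment rather than with the sampler itself.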