astooke
Ok, this is pretty interesting! Could you please explain: what algorithm? 3x convergence speed in wall-clock time or environment samples? What environment? What agent network? What kind of computer?...
Hi, sorry for the slow reply! This looks the same as #99, which talks about using spawn instead of fork, and it’s probably a problem with the namedarraytuple types. Currently...
Hey @MaxASchwarzer, just pushed a bunch of work to master which introduces the NamedArrayTuple which should work with spawn as an alternative to namedarraytuple. Want to give that a try...
Almost! Instead of `BufferSamples = NamedArrayTuple(...)`, use: `BufferSamples = NamedArrayTupleSchema(...)` Then when you call `x = BufferSamples(...)`, `x` will be an instance of `NamedArrayTuple`. I can see how that's kind...
@mxwells Oops somehow I missed your last question for a bit. That is a very slow reset! First thing to do would be to use any of the `WaitReset` collectors....
Great! One thing to watch out for, if you're storing samples to a replay buffer and training from there, the usual uniform-sampling from replay might return invalid samples if you...
Good question, it definitely seems worth including, but my immediate to-do list is already full with other things. Can see if I can find someone to code it up, or...
Great! Right, that is all correct about buffers + async, etc. It seems PPO is able to handle being slightly off-policy, and could be worth a try in the async...
This one is a bit too major of a change, I think we'll keep it as RL iterations, but I do usually end up plotting the x-axis as the logged...
Oooh, that's not ideal. Let's see if we can speed this up. Did you profile with `CUDA_LAUNCH_BLOCKING=1`? Without that, some GPU time is not always accounted correctly, because it runs...