astooke
Ok, this is pretty interesting! Could you please explain: what algorithm? 3x convergence speed in wall-clock time or environment samples? What environment? What agent network? What kind of computer?...
Hi, sorry for the slow reply! This looks the same as #99, which talks about using spawn instead of fork, and it’s probably a problem with the namedarraytuple types. Currently...
Hey @MaxASchwarzer, just pushed a bunch of work to master which introduces the NamedArrayTuple which should work with spawn as an alternative to namedarraytuple. Want to give that a try...
Almost! Instead of `BufferSamples = NamedArrayTuple(...)`, use: `BufferSamples = NamedArrayTupleSchema(...)` Then when you call `x = BufferSamples(...)`, `x` will be an instance of `NamedArrayTuple`. I can see how that's kind...
@mxwells Oops somehow I missed your last question for a bit. That is a very slow reset! First thing to do would be to use any of the `WaitReset` collectors....
Great! One thing to watch out for, if you're storing samples to a replay buffer and training from there, the usual uniform-sampling from replay might return invalid samples if you...
Good question, it definitely seems worth including, but my immediate to-do list is already full with other things. Can see if I can find someone to code it up, or...
Great! Right, that is all correct about buffers + async, etc. It seems PPO is able to handle being slightly off-policy, and could be worth a try in the async...
This one is a bit too major of a change, I think we'll keep it as RL iterations, but I do usually end up plotting the x-axis as the logged...
Oooh, that's not ideal. Let's see if we can speed this up. Did you profile with `CUDA_LAUNCH_BLOCKING=1`? Without that, some GPU time is not always accounted correctly, because it runs...