Edan Toledo

81 comments by Edan Toledo

Amazing! This should be quite simple. The `timestep.last()` function checks whether the current timestep is the last one of an episode. In the current auto-reset API, when an episode finishes and a...

No prob. Let me just show how the auto-reset API works with an example to give you a better idea. Imagine a 3-timestep environment where each observation (obs) is simply...
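The comment above is cut off, so the following is only a toy sketch of the general auto-reset idea, not the library's actual API. It assumes a 3-timestep episode whose observation is the step index, and that on the final step the environment resets immediately: the returned observation already belongs to the new episode, `last()` is still true, and the real final observation is kept in `extras`. The names `TimeStep`, `ToyAutoResetEnv`, `rollout` and the `"true_next_obs"` key are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

EPISODE_LENGTH = 3  # assumption: observations within an episode are 0, 1, 2


@dataclass
class TimeStep:
    """Hypothetical stand-in for a timestep object."""
    observation: int
    is_last: bool
    extras: Dict[str, int] = field(default_factory=dict)

    def last(self) -> bool:
        # True when this timestep ends an episode.
        return self.is_last


class ToyAutoResetEnv:
    """Toy environment that resets itself as soon as an episode ends."""

    def __init__(self) -> None:
        self._t = 0

    def reset(self) -> TimeStep:
        self._t = 0
        return TimeStep(observation=0, is_last=False, extras={"true_next_obs": 0})

    def step(self) -> TimeStep:
        self._t += 1
        true_obs = self._t
        done = true_obs == EPISODE_LENGTH - 1
        if done:
            # Auto-reset: hand back the first observation of the next episode,
            # but keep the real final observation in extras.
            self._t = 0
            return TimeStep(observation=0, is_last=True,
                            extras={"true_next_obs": true_obs})
        return TimeStep(observation=true_obs, is_last=False,
                        extras={"true_next_obs": true_obs})


def rollout(num_steps: int) -> List[Tuple[int, bool, int]]:
    """Collect (observation, last(), true_next_obs) for reset plus num_steps steps."""
    env = ToyAutoResetEnv()
    ts = env.reset()
    trace = [(ts.observation, ts.last(), ts.extras["true_next_obs"])]
    for _ in range(num_steps):
        ts = env.step()
        trace.append((ts.observation, ts.last(), ts.extras["true_next_obs"]))
    return trace


if __name__ == "__main__":
    for row in rollout(4):
        print(row)
```

In this sketch, code that needs the true final observation (for example, when bootstrapping a value estimate) would read it from `extras` whenever `last()` is true, rather than from `observation`.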

I'm just leaving a checklist here of things that need to be done:
- [ ] Explicit catering for the batch dimension, i.e. not relying on flax.nn.vmap
- [x] When feeding...

Additionally, I've officially merged the popgym PR so now we can test on popgym envs easily when we feel ready.

Hmm, I see. Could we implement the squeeze/unsqueeze logic only in the outermost architecture, thus still allowing the cell to be run on its own? So basically something like:...

@smorad I've now added the explicit expectation of a batch dimension - the network now works with rec_ppo.py natively simply by changing the network conf. For example we can do...

Amazing! Thanks so much. Shall we make a checklist like the other PR, and then I can help with populating some of the areas?

I like what you've currently written, and the only extra things would be similar to cleanrl's Advanced section, so:
1. Hyperparameter Tuning
2. Resuming Training and Checkpointing

I also would like...

I'm opening this up again and will start contributing to the docs. I will get this PR in!