Joseph Bloom

41 issues by Joseph Bloom

Similar to the Maze experiments done by the shard theory team, calculate a vector corresponding to the key/ball in the memory env (or make a more general tool) that can...
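Here is a minimal sketch of the usual recipe, assuming it mirrors the maze work's "cheese vector". First, record activations at one layer on observations where the key/ball is present and on otherwise-matched observations where it is absent. Then take the difference of the two means, and add or subtract a scaled copy of that vector during the forward pass. All function names are hypothetical. Activations are plain lists of floats standing in for the tensors a forward hook would capture.

```python
from typing import Sequence

Vector = list[float]


def mean_activation(activations: Sequence[Vector]) -> Vector:
    """Element-wise mean of a batch of activation vectors from one layer."""
    if not activations:
        raise ValueError("need at least one activation vector")
    dim = len(activations[0])
    return [sum(a[i] for a in activations) / len(activations) for i in range(dim)]


def steering_vector(with_feature: Sequence[Vector], without_feature: Sequence[Vector]) -> Vector:
    """Mean activation with the feature (e.g. the key) minus mean activation without it."""
    mu_with = mean_activation(with_feature)
    mu_without = mean_activation(without_feature)
    return [w - wo for w, wo in zip(mu_with, mu_without)]


def apply_steering(activation: Vector, vector: Vector, coeff: float = 1.0) -> Vector:
    """Add (coeff > 0) or subtract (coeff < 0) the steering vector at that layer."""
    return [a + coeff * v for a, v in zip(activation, vector)]
```

A more general tool would keep the same shape. It would take any two sets of matched observations (feature present vs. absent) and a layer name, so the key/ball case becomes one instance of it.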

The app likely doesn't work with behavioural cloning models, and this should be fixed so that we can make a comparison.

I attempted this but, oddly, got a clash with black even with `profile = black` set in isort.

Our PPO models are now stored with checkpoints, but our offline-trained models aren't. It would be good to create some parity here. Please ensure:
- [ ] It remains easy to...

help wanted
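One way to get parity is to drive both the PPO loop and the offline loop from the same checkpoint schedule. That way, offline-trained models are saved at comparable points in training. This is a sketch under assumptions. `checkpoint_steps`, `train_offline`, `train_step` and `save_checkpoint` are hypothetical names. The two callbacks stand in for whatever the repo actually uses to run a training step and write a model to disk.

```python
from typing import Callable


def checkpoint_steps(total_steps: int, num_checkpoints: int) -> list[int]:
    """Evenly spaced 1-indexed steps at which to save; always includes the last step."""
    if total_steps <= 0 or num_checkpoints <= 0:
        return []
    num_checkpoints = min(num_checkpoints, total_steps)
    # Integer division keeps the steps strictly increasing and ends exactly at total_steps.
    return [total_steps * (i + 1) // num_checkpoints for i in range(num_checkpoints)]


def train_offline(
    total_steps: int,
    num_checkpoints: int,
    train_step: Callable[[int], None],
    save_checkpoint: Callable[[int], None],
) -> None:
    """Hypothetical offline loop using a schedule a PPO loop could share."""
    schedule = set(checkpoint_steps(total_steps, num_checkpoints))
    for step in range(1, total_steps + 1):
        train_step(step)
        if step in schedule:
            save_checkpoint(step)
```

For example, `checkpoint_steps(10, 3)` gives `[3, 6, 10]`. Keeping the schedule in one helper means the two training paths cannot drift apart on when they checkpoint.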

This is an important task, but one I wouldn't want to take on while lots of other work is in flight or before I have more results. However, I wanted to...

Currently, `store_model_checkpoint` adds the files to be uploaded to wandb to an artifact, which is only uploaded at the end of the PPO training cycle. It would be better if these...

good first issue
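Here is a minimal sketch of the eager alternative. It assumes each checkpoint can be logged as its own artifact. Every stored checkpoint goes to an upload callback straight away, so nothing is lost if a run dies before training ends. `EagerCheckpointUploader` and `wandb_upload` are hypothetical names, not the repo's API. `wandb_upload` uses only `wandb.Artifact`, `Artifact.add_file` and `wandb.log_artifact`, and it assumes a run has already been started with `wandb.init`.

```python
from pathlib import Path
from typing import Callable


def wandb_upload(path: Path, name: str) -> None:
    """Log one checkpoint file as its own wandb artifact (needs an active run)."""
    import wandb  # imported lazily so the rest of the sketch runs without wandb

    artifact = wandb.Artifact(name, type="model")
    artifact.add_file(str(path))
    wandb.log_artifact(artifact)


class EagerCheckpointUploader:
    """Uploads each checkpoint as soon as it is stored, rather than batching
    every file into one artifact that is only uploaded when training ends."""

    def __init__(self, upload_fn: Callable[[Path, str], None] = wandb_upload) -> None:
        self.upload_fn = upload_fn
        self.uploaded: list[str] = []

    def store(self, path: Path, step: int) -> None:
        name = f"checkpoint_{step}"
        self.upload_fn(path, name)  # upload now, not at the end of training
        self.uploaded.append(name)
```

Injecting `upload_fn` keeps the behaviour testable without a wandb run. The trade-off is more network calls during training in exchange for not losing checkpoints from runs that crash.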

The end-to-end tests show how to pass arguments to the different PPO models (FC, Transformer, LSTM), but it might be nice to have a command line tool that handles all of...
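Here is a minimal sketch of such a tool using the standard-library `argparse`. It has one subcommand per model type, so architecture-specific flags are only accepted for the matching model. The program name, every flag and every default below are hypothetical placeholders. The real arguments would come from the end-to-end tests.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="train-ppo", description="Train a PPO agent.")
    # Options shared by every model type (hypothetical names and defaults).
    parser.add_argument("--env-id", required=True, help="environment to train on")
    parser.add_argument("--total-timesteps", type=int, default=100_000)

    # One subcommand per architecture; each only accepts its own flags.
    sub = parser.add_subparsers(dest="model", required=True)
    sub.add_parser("fc", help="fully connected policy")

    transformer = sub.add_parser("transformer", help="transformer policy")
    transformer.add_argument("--n-layers", type=int, default=1)
    transformer.add_argument("--d-model", type=int, default=128)

    lstm = sub.add_parser("lstm", help="LSTM policy")
    lstm.add_argument("--hidden-size", type=int, default=128)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```

For example, `train-ppo --env-id <env> transformer --n-layers 2`. Shared options go before the subcommand, and a typo such as `--hidden-size` on the transformer subcommand is rejected instead of silently ignored.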

We currently have 5 probe environments for single-timestep models, and I'd like a probe environment to test whether a model can learn: 1. to take the correct action as...
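One possible design, offered as an assumption and not necessarily the behaviour the list above goes on to describe, is a memory probe. It shows a random cue on the first timestep and rewards the agent on the second timestep only if its action matches that cue. The class name and observation encoding are hypothetical. `reset` and `step` mimic Gymnasium's return signatures without depending on it.

```python
import random
from typing import Optional


class MemoryProbeEnv:
    """Hypothetical two-step probe for models with memory.

    Step 0 shows a random cue (0 or 1). Step 1 shows an uninformative
    observation and gives +1 if the action matches the earlier cue, else -1.
    """

    BLANK = -1  # observation carrying no information about the cue

    def __init__(self, seed: Optional[int] = None) -> None:
        self.rng = random.Random(seed)
        self.cue = 0
        self.t = 0
        self.done = True

    def reset(self) -> tuple[int, dict]:
        self.cue = self.rng.randint(0, 1)
        self.t = 0
        self.done = False
        return self.cue, {}

    def step(self, action: int) -> tuple[int, float, bool, bool, dict]:
        if self.done:
            raise RuntimeError("call reset() before step()")
        if self.t == 0:
            # The action at the cue step is not rewarded; only memory matters.
            self.t = 1
            return self.BLANK, 0.0, False, False, {}
        self.done = True
        reward = 1.0 if action == self.cue else -1.0
        return self.BLANK, reward, True, False, {}
```

A policy without memory sees the same blank observation at step 1 whatever the cue was, so its expected reward is 0. A model that remembers the cue can reach +1 every episode.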

**more details to add later or message me** Might be valuable for training performance.

good first issue