Joseph Bloom
Similar to the Maze experiments done by the shard theory team, calculate a vector corresponding to the key/ball in the memory env (or make a more general tool) that can...
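One way such a vector could be computed is as a difference of mean activations between the two conditions. This is a minimal sketch under that assumption; the function names and the toy activations are hypothetical and not taken from the repo.

```python
from statistics import fmean
from typing import List, Sequence


def mean_activation(acts: Sequence[Sequence[float]]) -> List[float]:
    """Average a batch of activation vectors element-wise."""
    return [fmean(column) for column in zip(*acts)]


def difference_vector(
    acts_a: Sequence[Sequence[float]], acts_b: Sequence[Sequence[float]]
) -> List[float]:
    """Return mean(acts_a) - mean(acts_b), one entry per activation dimension."""
    mean_a = mean_activation(acts_a)
    mean_b = mean_activation(acts_b)
    return [a - b for a, b in zip(mean_a, mean_b)]


# Toy data (hypothetical): activations at one layer, recorded on episodes
# where the target object is a key vs. a ball.
key_acts = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
ball_acts = [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]]

key_minus_ball = difference_vector(key_acts, ball_acts)
```

In practice the activations would come from hooks on the trained model, and the resulting vector could be added to or subtracted from that layer's activations to see whether behaviour changes.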
See discussion in #43
The app likely doesn't work with behavioural clone models and this should be fixed so that we can do a comparison.
I attempted this but, oddly, isort still clashed with black even with `profile = black` set.
Our PPO models are now stored with checkpoints but our offline trained models aren't. Creating some parity here would be good. Please ensure: - [ ] It remains easy to...
This is an important task I wouldn't want to do if there was lots of other work in flight or before I had some more results. However, I wanted to...
Currently `store_model_checkpoint` adds files to a wandb artifact, and that artifact is only uploaded at the end of the PPO training cycle. It would be better if these...
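One possible reading of this issue is that each checkpoint should be uploaded as soon as it is written. Under that assumption, a rough sketch could look like the following; `log_checkpoint_immediately`, its parameters, and the artifact naming scheme are hypothetical, and only `wandb.Artifact`, `Artifact.add_file`, and `Run.log_artifact` are real wandb calls.

```python
from typing import Callable, Optional


def log_checkpoint_immediately(
    run,
    checkpoint_path: str,
    step: int,
    run_name: str,
    make_artifact: Optional[Callable] = None,
) -> None:
    """Upload a single checkpoint file right away instead of at the end of training."""
    if make_artifact is None:
        # Imported lazily so the sketch can be read and tested without wandb.
        import wandb

        make_artifact = wandb.Artifact

    # Reusing the same artifact name on every call makes wandb record each
    # upload as a new version of one artifact.
    artifact = make_artifact(name=f"{run_name}_checkpoints", type="model")
    artifact.add_file(checkpoint_path)
    # The alias (hypothetical scheme) makes a given training step easy to find.
    run.log_artifact(artifact, aliases=[f"step_{step}"])
```

A call such as `log_checkpoint_immediately(wandb.run, path, step, run_name)` placed right after the checkpoint is saved would then mean that a crashed run still leaves its earlier checkpoints on wandb.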
The end-to-end tests show how to pass arguments to the different PPO models (FC, Transformer, LSTM), but it might be nice to have a command line tool that handles all of...
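A command line entry point along these lines could be sketched with `argparse`. The flag names, defaults, and model-type strings below are assumptions for illustration, not the repo's actual arguments.

```python
import argparse
from typing import List, Optional

# Hypothetical identifiers for the three PPO model families mentioned above.
MODEL_TYPES = ("fc", "transformer", "lstm")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that selects a PPO model type and a few shared options."""
    parser = argparse.ArgumentParser(description="Train a PPO agent (sketch).")
    parser.add_argument(
        "--model-type",
        choices=MODEL_TYPES,
        default="fc",
        help="Which PPO model architecture to train.",
    )
    parser.add_argument("--hidden-size", type=int, default=64)
    parser.add_argument("--total-timesteps", type=int, default=100_000)
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (defaults to sys.argv[1:] when None)."""
    return build_parser().parse_args(argv)


# Example: select the LSTM model with a larger hidden size.
example_args = parse_args(["--model-type", "lstm", "--hidden-size", "128"])
```

The parsed namespace could then be mapped onto whatever config objects the end-to-end tests already construct, so the tests and the CLI share one code path.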
We currently have 5 probe environments for single timestep models and I'd like a probe environment to test if a model can learn: 1. to take the correct action as...
**more details to add later or message me** Might be valuable for training performance.