Adam Gleave

172 comments by Adam Gleave

Closing as documented in #603

I believe the outstanding flaky tests have been addressed. Please open a new issue for any specific flakiness you discover.

> The rewards would only be relevant if we were to continue training with our `train_rl.py` script which is not one of our use-cases I guess. Can you confirm this...

> I re-trained the experts for all the above mentioned envs (PPO and SAC where applicable).

Thanks for resolving this, Max!

I never finished the self-play implementation, but it might still be worth looking at: https://github.com/HumanCompatibleAI/adversarial-policies/blob/master/src/aprl/agents/ppo_self_play.py
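
For a rough sketch of the self-play idea only (this is not the linked `ppo_self_play.py` code; the class and method names here are hypothetical), the opponent can be drawn from a pool of periodically refreshed snapshots of the learner:

```python
import copy
import random


class SnapshotSelfPlay:
    """Hypothetical sketch: train a policy against snapshots of its past selves."""

    def __init__(self, policy, snapshot_every=10):
        self.policy = policy                          # the policy being trained
        self.opponents = [copy.deepcopy(policy)]      # pool of frozen past versions
        self.snapshot_every = snapshot_every

    def opponent(self):
        # Sample an opponent uniformly from the snapshot pool.
        return random.choice(self.opponents)

    def after_iteration(self, iteration):
        # Periodically freeze the current policy and add it to the pool.
        if iteration % self.snapshot_every == 0:
            self.opponents.append(copy.deepcopy(self.policy))
```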

Yeah it's still in the commit history. https://github.com/HumanCompatibleAI/adversarial-policies/tree/99700aab22f99f8353dc74b0ddaf8e5861ff34a5/src/aprl/agents

> @AdamGleave
> I believe this code is from your side. Any thoughts on skipping init if model was already initialized, or should we prevent/warn about using `pretrain` after `train`?...
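
For context, a minimal sketch of the guard being discussed (this is not the actual stable-baselines code; the `_initialized` flag and method bodies here are assumptions): skip re-initialization if the model has already been set up, and warn when `pretrain` is called after `train`.

```python
import warnings


class Model:
    """Hypothetical model skeleton illustrating the init guard discussed above."""

    def __init__(self):
        self._initialized = False  # set once the networks have been built

    def _setup_model(self):
        # ... build policy/value networks here ...
        self._initialized = True

    def train(self, total_timesteps):
        if not self._initialized:
            self._setup_model()
        # ... RL training loop ...

    def pretrain(self, dataset):
        if self._initialized:
            # Skip re-init and warn: weights from a previous train() call are kept.
            warnings.warn("pretrain() called after train(); reusing existing weights.")
        else:
            self._setup_model()
        # ... behavioral cloning on `dataset` ...
```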

Yeah, we've done something similar; the most relevant class is [CurryVecEnv](https://github.com/HumanCompatibleAI/adversarial-policies/blob/master/src/aprl/training/embedded_agents.py#L6)
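
As a rough illustration (not the actual CurryVecEnv interface; the class name, tuple conventions, and policy call signature below are assumptions), the idea is to fix one player's policy inside the wrapper so that the remaining player sees an ordinary single-agent environment:

```python
class CurriedTwoPlayerEnv:
    """Hypothetical sketch: embed a fixed opponent inside a two-player env."""

    def __init__(self, two_player_env, fixed_policy):
        self.env = two_player_env          # assumed to take/return per-player tuples
        self.fixed_policy = fixed_policy   # policy controlling the embedded player
        self._fixed_obs = None             # last observation seen by the fixed player

    def reset(self):
        obs_learner, obs_fixed = self.env.reset()
        self._fixed_obs = obs_fixed
        return obs_learner

    def step(self, action_learner):
        # The embedded player acts from its own stored observation.
        action_fixed = self.fixed_policy(self._fixed_obs)
        (obs_learner, obs_fixed), (rew_learner, _), done, info = self.env.step(
            (action_learner, action_fixed)
        )
        self._fixed_obs = obs_fixed
        # Expose only the learner's view, as in a standard single-agent env.
        return obs_learner, rew_learner, done, info
```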