Adam Gleave
This has been added now.
Closing as documented in #603
I believe the outstanding flaky tests have been addressed. Please open a new issue for any specific flakiness you discover.
> The rewards would only be relevant if we were to continue training with our `train_rl.py` script which is not one of our use-cases I guess. Can you confirm this...
Fixed in #610
> I re-trained the experts for all the above mentioned envs (PPO and SAC where applicable).

Thanks for resolving this, Max!
I never finished the self-play implementation, but it might still be worth looking at: https://github.com/HumanCompatibleAI/adversarial-policies/blob/master/src/aprl/agents/ppo_self_play.py
Yeah, it's still in the commit history: https://github.com/HumanCompatibleAI/adversarial-policies/tree/99700aab22f99f8353dc74b0ddaf8e5861ff34a5/src/aprl/agents
> @AdamGleave
> I believe this code is from your side. Any thoughts on skipping init if model was already initialized, or should we prevent/warn about using `pretrain` after `train`?...
Yeah, we've done something similar; the most relevant class is [CurryVecEnv](https://github.com/HumanCompatibleAI/adversarial-policies/blob/master/src/aprl/training/embedded_agents.py#L6)
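For context, the rough idea is to "curry" a multi-agent environment by baking a fixed policy for one agent into the environment itself, so the remaining agent can be trained with ordinary single-agent tooling. Below is a minimal, non-vectorized sketch of that pattern; the class and method names are illustrative and not the actual `aprl` API (the real `CurryVecEnv` applies the same idea at the VecEnv level):

```python
class CurriedTwoAgentEnv:
    """Embed a fixed policy for agent 1 so agent 0 sees a single-agent env.

    Illustrative sketch only: assumes a two-agent env whose reset() returns
    per-agent observations and whose step() takes a tuple of actions.
    """

    def __init__(self, two_agent_env, fixed_policy):
        self.env = two_agent_env          # step((action_0, action_1)) -> joint transition
        self.fixed_policy = fixed_policy  # callable: agent-1 observation -> agent-1 action
        self._last_obs_1 = None

    def reset(self):
        obs_0, obs_1 = self.env.reset()
        self._last_obs_1 = obs_1
        return obs_0

    def step(self, action_0):
        # Query the embedded policy for the other agent's action,
        # then take a joint step in the underlying two-agent env.
        action_1 = self.fixed_policy(self._last_obs_1)
        (obs_0, obs_1), (rew_0, _rew_1), done, info = self.env.step((action_0, action_1))
        self._last_obs_1 = obs_1
        return obs_0, rew_0, done, info
```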