Adam Gleave
This has been added now.
Closing as documented in #603
I believe the outstanding flaky tests have been addressed. Please open a new issue for any specific flakiness you discover.
> The rewards would only be relevant if we were to continue training with our `train_rl.py` script which is not one of our use-cases I guess. Can you confirm this...
Fixed in #610
> I re-trained the experts for all the above mentioned envs (PPO and SAC where applicable).

Thanks for resolving this, Max!
I never finished the self-play implementation, but it might still be worth looking at: https://github.com/HumanCompatibleAI/adversarial-policies/blob/master/src/aprl/agents/ppo_self_play.py
Yeah, it's still in the commit history: https://github.com/HumanCompatibleAI/adversarial-policies/tree/99700aab22f99f8353dc74b0ddaf8e5861ff34a5/src/aprl/agents
> @AdamGleave
> I believe this code is from your side. Any thoughts on skipping init if model was already initialized, or should we prevent/warn about using `pretrain` after `train`?...
Yeah, we've done something similar; the most relevant class is [CurryVecEnv](https://github.com/HumanCompatibleAI/adversarial-policies/blob/master/src/aprl/training/embedded_agents.py#L6)
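For context, the rough idea is to "curry" a multi-agent environment by baking a fixed policy for one agent into the environment itself, so the remaining agent can be trained with ordinary single-agent tooling. Below is a minimal, non-vectorized sketch of that pattern; the class and method names are illustrative and not the actual `aprl` API (the real `CurryVecEnv` applies the same idea at the VecEnv level):

```python
class CurriedTwoAgentEnv:
    """Embed a fixed policy for agent 1 so agent 0 sees a single-agent env.

    Illustrative sketch only: assumes a two-agent env whose reset() returns
    per-agent observations and whose step() takes a tuple of actions.
    """

    def __init__(self, two_agent_env, fixed_policy):
        self.env = two_agent_env          # step((action_0, action_1)) -> joint transition
        self.fixed_policy = fixed_policy  # callable: agent-1 observation -> agent-1 action
        self._last_obs_1 = None

    def reset(self):
        obs_0, obs_1 = self.env.reset()
        self._last_obs_1 = obs_1
        return obs_0

    def step(self, action_0):
        # Query the embedded policy for the other agent's action,
        # then take a joint step in the underlying two-agent env.
        action_1 = self.fixed_policy(self._last_obs_1)
        (obs_0, obs_1), (rew_0, _rew_1), done, info = self.env.step((action_0, action_1))
        self._last_obs_1 = obs_1
        return obs_0, rew_0, done, info
```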