Adam Gleave comments

Results: 172 comments of Adam Gleave

AdversarialTrainer: Silent incompatibility with SB3 learning rate schedules

Well spotted @qxcv! Could we fix this by calling `_setup_learn` with the true total timesteps before calling `learn`? It's a private method so that's not great but it might be...

add video saving and uploading support to `train_*` scripts

I'm still a bit backlogged, @Rocamonde could you review this please?

[Feature request] Gradient accumulation in CrossEntropyRewardTrainer in preference_comparisons.py

Yeah I agree we should either specify the "inner" batch size to execute on GPU, or the number of accumulation batches. Computing each pair of trajectory fragments on the GPU...
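The two options mentioned in this comment (fixing an "inner" batch size, or fixing the number of accumulation batches) can be illustrated with a minimal sketch. It uses a toy scalar model in plain Python rather than the actual reward network, and the names `mean_sq_loss_grad`, `accumulated_grad` and `inner_batch_size` are hypothetical, not part of `imitation`. The point it shows is that weighting each chunk's gradient by its share of the batch reproduces the full-batch gradient, even when the last chunk is smaller.

```python
from typing import List, Sequence, Tuple

Example = Tuple[float, float]  # (input x, target y) for a toy model y ≈ w * x


def mean_sq_loss_grad(w: float, examples: Sequence[Example]) -> float:
    """Gradient w.r.t. w of the mean of (w*x - y)**2 over `examples`."""
    return sum(2.0 * (w * x - y) * x for x, y in examples) / len(examples)


def accumulated_grad(
    w: float, batch: Sequence[Example], inner_batch_size: int
) -> float:
    """Accumulate the gradient over chunks of at most `inner_batch_size` examples.

    `inner_batch_size` is a hypothetical parameter name: it stands for the
    number of examples that would be processed on the device at one time.
    """
    if inner_batch_size <= 0:
        raise ValueError("inner_batch_size must be positive")
    total = 0.0
    for start in range(0, len(batch), inner_batch_size):
        chunk = batch[start:start + inner_batch_size]
        # Weight each chunk by its share of the batch, so that a smaller
        # final chunk does not get over-counted.
        total += mean_sq_loss_grad(w, chunk) * len(chunk) / len(batch)
    return total


batch: List[Example] = [(1.0, 2.0), (2.0, 3.0), (3.0, 7.0), (4.0, 8.0), (5.0, 9.0)]
full = mean_sq_loss_grad(0.5, batch)
accum = accumulated_grad(0.5, batch, inner_batch_size=2)
```

Specifying the number of accumulation batches instead would amount to deriving the chunk size from the batch length; in either case the optimizer step would happen once, after the accumulated gradient is complete.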

[Feature request] Gradient accumulation in CrossEntropyRewardTrainer in preference_comparisons.py

I suspect this issue isn't specific to preference comparisons -- I think it'd also happen in AIRL/GAIL for example. Really anywhere we might want large batch sizes. Worth verifying the...

Test examples sometimes timeout

I'd like to understand why the tests are timing out in the first place. 120s is a long time for a unit test to be running! If the notebooks really...

Test examples sometimes timeout

Given the macOS executor is medium and the others are x-large, it's not that shocking it takes longer. Unfortunately they only go up to `large` for macOS and that's not available to...

Test examples sometimes timeout

Mysterious! It could be some interaction between different tests, either because of resource contention (worth checking the memory logs) or some kind of shared mutex (can't think what, but sometimes...

Dynamic L2 regularization for preference comparisons

Heads up: we'll probably have some conflicts once https://github.com/HumanCompatibleAI/imitation/pull/460 gets merged, although they shouldn't be too hard to resolve.

Dynamic L2 regularization for preference comparisons

I've changed the base to `reward_ensemble` so the diff is cleaner. Once `reward_ensemble` gets merged we can rebase to `master`. We shouldn't merge this PR before that point.

Dynamic L2 regularization for preference comparisons

> I guess the uncertainty came from the following points: 1) I think it's fine to have adaptive regularization take in only validation and training loss. We can add new...