Adam Gleave
Well spotted @qxcv! Could we fix this by calling `_setup_learn` with the true total timesteps before calling `learn`? It's a private method so that's not great but it might be...
I'm still a bit backlogged, @Rocamonde could you review this please?
Yeah I agree we should either specify the "inner" batch size to execute on GPU, or the number of accumulation batches. Computing each pair of trajectory fragments on the GPU...
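To illustrate the "inner" batch size idea, here is a minimal pure-Python sketch of gradient accumulation. It is not the library's implementation: the toy model `y = w * x` and the names `mse_grad`, `accumulated_grad` and `inner_batch_size` are hypothetical. It only shows that processing a large batch in smaller chunks, with each chunk weighted by its share of the batch, gives the same gradient as processing the whole batch at once.

```python
from typing import Sequence


def mse_grad(w: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Gradient w.r.t. w of the mean squared error of the toy model y = w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(
    w: float,
    xs: Sequence[float],
    ys: Sequence[float],
    inner_batch_size: int,
) -> float:
    """Accumulate the full-batch gradient over chunks of `inner_batch_size`.

    Only one chunk is processed at a time, so peak memory scales with the
    inner batch size rather than the full batch size.
    """
    total = len(xs)
    grad = 0.0
    for start in range(0, total, inner_batch_size):
        chunk_x = xs[start:start + inner_batch_size]
        chunk_y = ys[start:start + inner_batch_size]
        # Weight each chunk by its share of the full batch so that the
        # accumulated sum equals the mean over the whole batch, even when
        # the last chunk is smaller than the others.
        grad += mse_grad(w, chunk_x, chunk_y) * len(chunk_x) / total
    return grad


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 11.0]
full = mse_grad(0.5, xs, ys)
chunked = accumulated_grad(0.5, xs, ys, inner_batch_size=2)
```

Specifying the number of accumulation batches instead would be equivalent up to a ceiling division of the full batch size by that number; which of the two to expose is a design choice.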
I suspect this issue isn't specific to preference comparisons -- I think it'd also happen in AIRL/GAIL for example. Really anywhere we might want large batch sizes. Worth verifying the...
I'd like to understand why the tests are timing out in the first place. 120s is a long time for a unit test to be running! If the notebooks really...
Given the macOS executor is `medium` and the others are `x-large`, it's not that shocking that it takes longer. Unfortunately they only go up to `large` for macOS and that's not available to...
Mysterious! It could be some interaction between different tests, either because of resource contention (worth checking the memory logs) or some kind of shared mutex (can't think what, but sometimes...
Heads up: we'll probably have some conflicts once https://github.com/HumanCompatibleAI/imitation/pull/460 gets merged, although they shouldn't be too hard to resolve.
I've changed the base to `reward_ensemble` so the diff is cleaner. Once `reward_ensemble` gets merged we can rebase to `master`. We shouldn't merge this PR before that point.
> I guess the uncertainty came from the following points: 1) I think it's fine to have adaptive regularization take in only validation and training loss. We can add new...