Adam Gleave
> I have no idea what is going on here, I am getting errors on CircleCI that seem so weird and don't show up on my local machine. Before I...
Thanks for the updates! Moving `regularization/__init__.py` to some sub-module in `regularization` seems fine to me as an alternative to `util` (or `utils`...). Main question is whether we expect to add...
> I wonder what actually happens to the loss gradient when adding an L1 norm penalty, since it's not differentiable. Does pytorch compute subgradients? @AdamGleave Yeah, it uses subgradients at...
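To make the subgradient concrete: away from zero the derivative of |w| is sign(w), and at the kink w == 0 autograd frameworks conventionally pick 0 from the subdifferential [-1, 1] (this is what PyTorch's `abs` does). A minimal plain-Python sketch, not the PyTorch implementation:

```python
def l1_subgradient(weights):
    """Subgradient of sum(|w|) over a list of scalar weights: sign(w),
    choosing 0 at the non-differentiable point w == 0."""
    return [1.0 if w > 0 else -1.0 if w < 0 else 0.0 for w in weights]

l1_subgradient([-2.0, 0.0, 3.0])  # -> [-1.0, 0.0, 1.0]
```

Choosing 0 at the kink means an L1 penalty exerts no gradient pressure on weights that are exactly zero, which is part of why it tends to keep sparse weights sparse.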
I agree 1) should be fixed, just making the deterministic policy consistent between both (likely defaulting to False) seems fine for now. For 2) I think we should test empirically...
I don't think we want PPO to be deterministic. If I understand correctly, rollouts collected for purpose of RL training will always need to be stochastic (this is where the...
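A minimal sketch of the two action-selection modes being discussed (a hypothetical helper, not the stable-baselines3 interface): on-policy rollout collection samples from the policy distribution, while greedy argmax is only appropriate for deterministic evaluation:

```python
import random

def select_action(action_probs, deterministic=False):
    # deterministic=True: greedy argmax, suitable for evaluation only.
    # deterministic=False: sample from the policy distribution, which
    # rollouts collected for on-policy RL training need (exploration,
    # and the gradient estimator assumes actions are drawn from the
    # current stochastic policy).
    if deterministic:
        return max(range(len(action_probs)), key=lambda a: action_probs[a])
    return random.choices(range(len(action_probs)), weights=action_probs)[0]
```

Under this framing, a `deterministic` flag on PPO would only ever apply at evaluation time, never during training rollouts.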
> Since L2 regularization is not the same as weight decay, should we implement L2 penalty or a weight decay? Good question, unfortunately the answer seems a bit unclear. The...
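One way to see why the answer is unclear: for plain SGD, adding (λ/2)·‖w‖² to the loss and applying decoupled weight decay produce the same update, but for adaptive optimizers like Adam they diverge, because the L2 penalty's gradient gets rescaled by the second-moment estimate while decoupled decay does not (the AdamW distinction). A single-parameter sketch of the SGD case, with made-up values:

```python
def sgd_l2_step(w, grad, lr, lam):
    # L2 penalty folded into the loss: gradient becomes grad + lam * w.
    return w - lr * (grad + lam * w)

def sgd_decay_step(w, grad, lr, lam):
    # Decoupled weight decay: shrink w directly, separate from the gradient.
    return w - lr * grad - lr * lam * w

# For SGD the two coincide (up to floating-point rounding):
a = sgd_l2_step(1.0, grad=0.5, lr=0.1, lam=0.01)
b = sgd_decay_step(1.0, grad=0.5, lr=0.1, lam=0.01)
```

So if we only support Adam, the choice between the two is a real behavioural decision, not just a naming one.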
> 1. If we only plan to support Adam as an optimizer, we can write a custom optimizer class that wraps Adam and 'cleans up' the hackiness and separates the...
Hi, Yes, contributions are welcome! Especially as the reference implementation looks to [not be free software](https://github.com/Div99/IQ-Learn), so having an open-source implementation of this would be valuable. Although this does mean...
I think prominently placed in the documentation should be sufficient.
Thanks for the PR! Changes look reasonable to me at a high level; my suggestions are fairly minor and largely to do with improving clarity. I'm tagging @levmckinney to...