interpretable_predictions
Separate eval from train for lambda_i and ci_ma
1. Only update lambda during training.
2. Use separate moving averages for `train` vs. `eval` (similar to batch norm, I guess?).
I find change 1 (only updating lambda during training) to be crucial for stabilizing training under GECO. Without it, I can only train the "latent" model with the default batch size (256), regardless of how hard I tune the learning rate. With it fixed, I've managed to train with much larger batch sizes, e.g. 1024.
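For reviewers, here is a minimal sketch of the idea behind both changes. It is not the actual diff: the class name `ConstraintState`, its parameters, and the multiplicative update `lambda *= exp(lr * ma)` are assumptions made for illustration (the latter is a common GECO-style update), and the real code may differ.

```python
import math


class ConstraintState:
    """Hypothetical tracker for one constraint (one lambda_i / ci_ma pair)."""

    def __init__(self, alpha=0.9, lambda_lr=0.1, lambda_init=1.0):
        self.alpha = alpha          # moving-average decay (assumed value)
        self.lambda_lr = lambda_lr  # step size for the lambda update (assumed)
        self.lambda_ = lambda_init
        # Separate moving averages per mode, so that eval batches
        # never leak into the statistics that drive training.
        self.ma = {"train": None, "eval": None}

    def update(self, constraint_value, training):
        mode = "train" if training else "eval"
        prev = self.ma[mode]
        if prev is None:
            # First observation initializes the moving average.
            ma = constraint_value
        else:
            ma = self.alpha * prev + (1 - self.alpha) * constraint_value
        self.ma[mode] = ma

        # Only touch lambda while training; eval leaves it frozen.
        if training:
            self.lambda_ *= math.exp(self.lambda_lr * ma)
        return self.lambda_


state = ConstraintState(alpha=0.5, lambda_lr=0.1)
state.update(1.0, training=True)   # updates the train average and lambda
state.update(5.0, training=False)  # updates only the eval average
```

In this sketch an eval pass can report its own constraint average without changing lambda or the train-time average, which is the separation the two changes aim for.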
(edit: correct legend color)
batch_size=256 (default) | batch_size=1024 (4x) |
---|---|
![]() | ![]() |
(No code change in the above force-push; I only switched `user.email` to match my GitHub account.)