Polina Kirichenko

Hi! I think lowering the learning rate (to 1e-5 or below) and using the Adam or AdamW optimizer with increased weight decay should improve training stability.
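
To make the suggestion concrete, here is a rough pure-Python sketch of a single AdamW update on one scalar parameter. It is only an illustration, not code from the project under discussion: the function name `adamw_step` is made up, and `weight_decay=0.1` is a placeholder value I picked (the right amount of "increased" weight decay depends on the model). The betas and epsilon are the commonly used Adam defaults.

```python
import math


def adamw_step(param, grad, m, v, t, lr=1e-5, weight_decay=0.1,
               beta1=0.9, beta2=0.999, eps=1e-8):
    """One AdamW update for a single scalar parameter.

    lr=1e-5 follows the suggestion above; weight_decay=0.1 is an
    assumed placeholder, not a recommended value.
    """
    # Decoupled weight decay: shrink the parameter directly, scaled by lr.
    param = param * (1.0 - lr * weight_decay)
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    # Bias correction for the zero-initialised averages (t starts at 1).
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    # Adam step: the gradient is normalised, so the step size is about lr.
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# First step with a very large gradient: the parameter still moves by only
# about lr, plus the small weight-decay shrink.
w, m, v = adamw_step(1.0, grad=1000.0, m=0.0, v=0.0, t=1)
print(w)
```

Because the update is normalised by the running gradient magnitude, a gradient spike does not translate into a large parameter jump the way it would with plain SGD, and the learning rate directly caps the step size. In PyTorch the same two knobs are the `lr` and `weight_decay` arguments of `torch.optim.AdamW`.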