Gregory Polyakov

Results: 1 comment by Gregory Polyakov

@rodrigonogueira4 Thanks for your response. I tried the hyperparams you suggested: --train_batch_size=4 --accumulate_grad_batches=32 --optimizer=AdamW --lr=3e-5 --weight_decay=5e-5. So far, the closest result was obtained by training mono-t5 for 9k steps...
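
For reference, here is a quick sketch of what those flags imply for the effective batch size. The flag values are copied from the comment above. Reading "9k steps" as 9,000 optimizer updates on a single device is an assumption, since the comment does not say how steps are counted or how many devices were used.

```python
# Values copied from the flags quoted in the comment above.
train_batch_size = 4
accumulate_grad_batches = 32

# With gradient accumulation, each optimizer update aggregates gradients
# from this many examples (per device).
effective_batch_size = train_batch_size * accumulate_grad_batches

# Assumption: "9k steps" means 9,000 optimizer updates on one device.
optimizer_steps = 9_000
examples_seen = optimizer_steps * effective_batch_size

print(effective_batch_size)  # 128
print(examples_seen)         # 1152000
```

If the steps are instead counted per forward/backward batch, the number of examples seen would be 32 times smaller, which matters when comparing against a reference training schedule.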