Fyeward

I'm training on eight 64 GB NPUs with `num_generations=8`. I found that with setting A (`per_device_train_batch_size=8`, `gradient_accumulation_steps=1`) and setting B (`per_device_train_batch_size=4`, `gradient_accumulation_steps=2`), there wasn't a significant difference in memory usage...
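
For reference, a minimal sketch of the two configurations being compared, assuming TRL's `GRPOConfig` (which `num_generations` suggests); the `output_dir` values are placeholders:

```python
from trl import GRPOConfig

# Setting A: larger per-device batch, no gradient accumulation.
config_a = GRPOConfig(
    output_dir="grpo-setting-a",       # placeholder
    num_generations=8,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=1,
)

# Setting B: half the per-device batch, twice the accumulation steps,
# so the effective batch size per optimizer step matches setting A.
config_b = GRPOConfig(
    output_dir="grpo-setting-b",       # placeholder
    num_generations=8,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,
)
```

Both settings keep the same effective batch size per optimizer step (8 samples per device), but setting B is generally expected to reduce peak activation memory since each forward/backward pass processes half as many samples.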

🐛 bug
⚡ accelerate