Shrikar Archak

Results 3 comments of Shrikar Archak

Tried with the least possible configs.

```
batch_size = 16 / devices
micro_batch_size = 1  # set to 2 because this is fit into 12GB Vram
gradient_accumulation_iters = batch_size // micro_batch_size
```
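For anyone checking the numbers, here is a rough sketch of how the three settings above work out. The comment does not say what `devices` was, so the values below are assumptions, and `accumulation_settings` is a hypothetical helper, not part of the finetuning script.

```python
def accumulation_settings(devices, micro_batch_size=1, global_batch_size=16):
    """Mirror the arithmetic of the three config lines (hypothetical helper)."""
    # True division: the per-device batch size comes out as a float.
    batch_size = global_batch_size / devices
    # Floor division of a float by an int still yields a float.
    gradient_accumulation_iters = batch_size // micro_batch_size
    return batch_size, gradient_accumulation_iters


if __name__ == "__main__":
    # Assumed device counts, for illustration only.
    for devices in (1, 2):
        batch_size, iters = accumulation_settings(devices)
        print(f"devices={devices} batch_size={batch_size} accumulation_iters={iters}")
```

With `micro_batch_size = 1`, each optimizer step accumulates gradients over as many micro-batches as the per-device batch size, so this is already the smallest micro-batch the config allows.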

Hitting the same issue. Any plans on merging this?