waelrabah11
Results: 1 comment
> I have the same issue when training Mixtral 8x7B with transformers 4.36 and DeepSpeed 0.12.4 (0.12.3) ZeRO-3 with gradient_checkpointing enabled. It hangs after around 1 hour 30 minutes of training. @dumpmemory I...