NanoCode012
Results: 163 comments of NanoCode012
I recall that you may be able to with DeepSpeed ZeRO stage 3 and CPU offload.
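For reference, here is a rough sketch of the relevant config fragment, assuming ZeRO stage 3 with both optimizer state and parameters offloaded to CPU. The helper name `build_zero3_offload_config` and the file name `zero3_offload.json` are made up for illustration. The keys follow DeepSpeed's config schema as I understand it, so check them against the version you have installed. A real config will also need batch size and precision settings.

```python
import json


def build_zero3_offload_config():
    """Return a config fragment for ZeRO stage 3 with CPU offload.

    Hypothetical helper: only the ZeRO section is shown here.
    """
    return {
        "zero_optimization": {
            "stage": 3,
            # Keep optimizer state in CPU memory instead of on the GPU.
            "offload_optimizer": {"device": "cpu", "pin_memory": True},
            # Keep the partitioned parameters in CPU memory as well.
            "offload_param": {"device": "cpu", "pin_memory": True},
        }
    }


if __name__ == "__main__":
    # Write the fragment to a JSON file (file name is an assumption).
    with open("zero3_offload.json", "w") as f:
        json.dump(build_zero3_offload_config(), f, indent=2)
```

Offloading trades GPU memory for CPU memory and transfer time, so expect slower steps in exchange for fitting the model.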
Have you already tried reducing the batch size and using an 8-bit optimizer?
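To give a sense of why the optimizer matters, here is a back-of-the-envelope sketch. It assumes an Adam-style optimizer that keeps two state tensors per parameter, and it ignores the small overhead of the quantization constants that 8-bit optimizers store. The 7B parameter count is only an illustrative assumption, not a number from your setup.

```python
def adam_state_bytes(num_params, bytes_per_value):
    """Approximate memory for Adam-style optimizer state.

    Adam keeps two values per parameter (first and second moment),
    so the state size is 2 * num_params * bytes_per_value.
    """
    return 2 * num_params * bytes_per_value


if __name__ == "__main__":
    num_params = 7_000_000_000  # hypothetical model size

    state_32bit = adam_state_bytes(num_params, 4)  # 4 bytes per fp32 value
    state_8bit = adam_state_bytes(num_params, 1)   # 1 byte per 8-bit value

    print(f"32-bit state: {state_32bit / 1e9:.1f} GB")  # 56.0 GB
    print(f"8-bit state:  {state_8bit / 1e9:.1f} GB")   # 14.0 GB
```

Under these assumptions the optimizer state shrinks by roughly 4x. Reducing the batch size mainly lowers activation memory, which this estimate does not cover.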
Is there a guideline on how much we should prune by? What are the benefits to doing this?