NanoCode012


I recall that you may be able to do this with DeepSpeed ZeRO stage 3 and CPU offload.
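For reference, here is a minimal sketch of what a ZeRO stage 3 config with CPU offload might look like. I'm assuming a standalone DeepSpeed JSON config. The micro batch size and gradient accumulation values are placeholders, not tuned recommendations, so adjust them for your setup.

```python
import json

# Hypothetical minimal DeepSpeed config (values are placeholders).
# ZeRO stage 3 partitions parameters, gradients and optimizer states across GPUs;
# the offload sections push optimizer states and parameters to CPU RAM.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 8,
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
}

# Serialize so it can be saved as e.g. ds_config_zero3.json and passed to DeepSpeed.
config_json = json.dumps(ds_config, indent=2)
print(config_json)
```

Offloading trades GPU memory for speed, because parameters and optimizer states have to move between CPU and GPU every step. Expect slower training.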

Have you already tried reducing the batch size and using an 8-bit optimizer?
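A rough back-of-the-envelope sketch of why both help. I'm assuming an Adam/AdamW-style optimizer that keeps two states per parameter, and a hypothetical 7B-parameter model. The estimate ignores the small per-block quantization statistics that 8-bit optimizers (e.g. bitsandbytes) also store. Lowering the micro batch size while raising gradient accumulation keeps the effective batch size the same.

```python
def adam_state_bytes(num_params: int, bytes_per_state: int) -> int:
    # Adam/AdamW keep two states per parameter (first and second moment).
    return num_params * 2 * bytes_per_state


def effective_batch_size(micro_batch: int, grad_accum: int, num_gpus: int = 1) -> int:
    # Samples contributing to one optimizer step.
    return micro_batch * grad_accum * num_gpus


n_params = 7_000_000_000  # hypothetical model size
fp32_states = adam_state_bytes(n_params, 4)  # 32-bit states
int8_states = adam_state_bytes(n_params, 1)  # 8-bit states (approximate)

print(f"32-bit Adam states: {fp32_states / 1e9:.0f} GB")
print(f"8-bit Adam states:  {int8_states / 1e9:.0f} GB")

# Smaller micro batch + more accumulation steps = same effective batch size.
print(effective_batch_size(8, 1), effective_batch_size(2, 4))
```

Under these assumptions, 8-bit optimizer states take roughly a quarter of the memory of 32-bit ones.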

Is there a guideline on how much we should prune by? What are the benefits of doing this?