
Is it possible to train a 70B model on 8*A100 80G with full fine-tuning?

Open jaywongs opened this issue 1 year ago • 5 comments

What piece of documentation is affected?

I couldn't find any documentation related to this. Can anyone tell me if it's possible?

What part(s) of the article would you like to see updated?

I couldn't find any documentation related to this

Additional Information

No response

Acknowledgements

  • [X] My issue title is concise, descriptive, and in title casing.
  • [X] I have searched the existing issues to make sure this feature has not been requested yet.
  • [X] I have provided enough information for the maintainers to understand and evaluate this request.

jaywongs avatar Mar 25 '24 07:03 jaywongs

I recall that you may be able to with DeepSpeed ZeRO-3 and CPU offload.
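The usual starting point is a ZeRO-3 JSON with both the optimizer states and the parameters offloaded to CPU, something along these lines (a rough sketch from memory, not a tested config):

```json
{
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu", "pin_memory": true },
    "offload_param": { "device": "cpu", "pin_memory": true },
    "overlap_comm": true,
    "contiguous_gradients": true,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "bf16": { "enabled": "auto" },
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "train_micro_batch_size_per_gpu": "auto"
}
```

Point the `deepspeed:` option in your axolotl config at a file like this; if I remember right, axolotl also ships ready-made ZeRO-3 offload JSONs under `deepspeed_configs/`.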

NanoCode012 avatar Mar 26 '24 01:03 NanoCode012

I recall that you may be able to with DeepSpeed ZeRO-3 and CPU offload.

Apologies for the confusion. I attempted DeepSpeed ZeRO-3 with CPU offload, but ran into insufficient CPU memory. The node with 8*A100 has a total of 1024 GB of CPU memory.
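Rough math, for what it's worth (a back-of-the-envelope estimate, not measured): full fine-tuning with Adam in mixed precision needs about 16 bytes per parameter (bf16 weights and gradients plus fp32 master weights, momentum, and variance), so offloading everything for a 70B model needs on the order of 70e9 × 16 ≈ 1.1 TB of host RAM, which is already at or above the 1024 GB here before activations, communication buffers, and the OS are counted.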

jaywongs avatar Mar 26 '24 02:03 jaywongs

Have you already tried reducing the batch size and using an 8-bit optimizer?
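Something like this in the axolotl config (a sketch, option names from memory, please double-check against the docs):

```yaml
micro_batch_size: 1                            # smallest per-GPU batch
gradient_accumulation_steps: 8                 # keep the effective batch size up
gradient_checkpointing: true                   # trade compute for activation memory
optimizer: adamw_bnb_8bit                      # bitsandbytes 8-bit AdamW, states stay on GPU
deepspeed: deepspeed_configs/zero3_bf16.json   # ZeRO-3 sharding without CPU offload
```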

NanoCode012 avatar Mar 26 '24 02:03 NanoCode012

Setting the batch size to 1 didn't work. I haven't tried the 8-bit optimizer yet. Will using it affect the quality of the trained model?

jaywongs avatar Mar 26 '24 02:03 jaywongs

Yeah, 8-bit optimizers work well with DeepSpeed for fine-tuning.
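For what it's worth, the bitsandbytes paper (Dettmers et al., "8-bit Optimizers via Block-wise Quantization") reports 8-bit Adam matching 32-bit Adam on the benchmarks they tested, so in practice the quality impact tends to be negligible.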

winglian avatar Mar 27 '24 14:03 winglian