Is it possible to train a 70B model on 8*A100 80G with full fine-tuning?
What piece of documentation is affected?
I couldn't find any documentation related to this. Can anyone tell me if it's possible?
What part(s) of the article would you like to see updated?
I couldn't find any documentation related to this.
Additional Information
No response
Acknowledgements
- [X] My issue title is concise, descriptive, and in title casing.
- [X] I have searched the existing issues to make sure this feature has not been requested yet.
- [X] I have provided enough information for the maintainers to understand and evaluate this request.
I recall that you may be able to with DeepSpeed ZeRO-3 and CPU offload.
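Roughly what I mean is something like the sketch below (untested; the key names follow the public DeepSpeed/HF integration docs, the filename is just an example, and you would point the `deepspeed:` option in your axolotl config at the resulting JSON file):

```python
# Untested sketch: write out a ZeRO-3 config with optimizer/parameter
# offload to CPU, then reference it from the axolotl config, e.g.
# `deepspeed: zero3_offload.json`.
import json

ds_config = {
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    # "auto" values are filled in by the HF Trainer / accelerate integration.
    "bf16": {"enabled": "auto"},
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}

with open("zero3_offload.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```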
Apologies for the confusion. I tried DeepSpeed ZeRO-3 with CPU offload, but it failed due to insufficient CPU memory. The node with 8*A100 has 1024GB of CPU memory in total.
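That lines up with a back-of-the-envelope estimate: with optimizer offload, the CPU has to hold the fp32 master weights, both Adam moments, and the offloaded fp32 gradients. A rough sketch (assuming Adam-style mixed-precision training; byte counts are approximate and ignore activations and framework overhead):

```python
# Rough lower bound on host memory needed for ZeRO-3 CPU offload of a 70B model.
params = 70e9

# fp32 master weights (4 B) + Adam momentum (4 B) + Adam variance (4 B) per param.
optimizer_state_gib = params * 12 / 1024**3   # ~782 GiB
# Offloaded fp32 gradients add another 4 B per param.
gradient_gib = params * 4 / 1024**3           # ~261 GiB

print(f"optimizer state: {optimizer_state_gib:.0f} GiB")
print(f"gradients:       {gradient_gib:.0f} GiB")
print(f"total on CPU:    {optimizer_state_gib + gradient_gib:.0f} GiB")  # ~1043 GiB
```

That is already around 1.1TB before DeepSpeed's pinned buffers and general process overhead, so it is plausible that 1024GB of system RAM ends up just short.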
Have you already tried reducing the batch size and using an 8-bit optimizer?
Setting the batch size to 1 did not work. I haven't tried an 8-bit optimizer yet. Will using 8-bit affect the quality of the trained model?
Yeah, 8-bit optimizers work well with DeepSpeed for fine-tuning.
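In case it helps, here is a minimal, illustrative sketch of what an 8-bit optimizer looks like with bitsandbytes directly (in axolotl you would instead select an 8-bit optimizer in the config, e.g. `optimizer: adamw_bnb_8bit` if your version exposes it; the model, learning rate, and CUDA device below are placeholders):

```python
# Minimal sketch, assuming bitsandbytes is installed and a CUDA device is available.
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(4096, 4096).cuda()  # stand-in for the real model

# 8-bit Adam keeps its two moment buffers quantized to ~2 bytes/param
# instead of ~8 bytes/param in fp32, cutting optimizer memory roughly 4x.
optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=1e-5, weight_decay=0.0)

loss = model(torch.randn(2, 4096, device="cuda")).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

The weights and gradients stay in their usual precision; only the optimizer state is quantized, which in practice has little effect on final model quality. One caveat: as far as I know, DeepSpeed's CPU optimizer offload uses its own CPU Adam, so an 8-bit GPU optimizer is normally combined with ZeRO-3 without optimizer offload.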