Is it possible to train a 70B model on 8*A100 80G with full fine-tuning?
What piece of documentation is affected?
I couldn't find any documentation related to this. Can anyone tell me if it's possible?
What part(s) of the article would you like to see updated?
I couldn't find any documentation related to this.
Additional Information
No response
Acknowledgements
- [X] My issue title is concise, descriptive, and in title casing.
- [X] I have searched the existing issues to make sure this feature has not been requested yet.
- [X] I have provided enough information for the maintainers to understand and evaluate this request.
I recall that you may be able to with DeepSpeed ZeRO-3 and CPU offload.
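Roughly what I mean is something like the sketch below (untested; the key names follow the public DeepSpeed/HF integration docs, the filename is just an example, and you would point the `deepspeed:` option in your axolotl config at the resulting JSON file):

```python
# Untested sketch: write out a ZeRO-3 config with optimizer/parameter
# offload to CPU, then reference it from the axolotl config, e.g.
# `deepspeed: zero3_offload.json`.
import json

ds_config = {
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    # "auto" values are filled in by the HF Trainer / accelerate integration.
    "bf16": {"enabled": "auto"},
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}

with open("zero3_offload.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```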
Apologies for the confusion. I tried DeepSpeed ZeRO-3 with CPU offload, but it failed due to insufficient CPU memory. The node with 8*A100 has 1024GB of CPU memory in total.
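That lines up with a back-of-the-envelope estimate: with optimizer offload, the CPU has to hold the fp32 master weights, both Adam moments, and the offloaded fp32 gradients. A rough sketch (assuming Adam-style mixed-precision training; byte counts are approximate and ignore activations and framework overhead):

```python
# Rough lower bound on host memory needed for ZeRO-3 CPU offload of a 70B model.
params = 70e9

# fp32 master weights (4 B) + Adam momentum (4 B) + Adam variance (4 B) per param.
optimizer_state_gib = params * 12 / 1024**3   # ~782 GiB
# Offloaded fp32 gradients add another 4 B per param.
gradient_gib = params * 4 / 1024**3           # ~261 GiB

print(f"optimizer state: {optimizer_state_gib:.0f} GiB")
print(f"gradients:       {gradient_gib:.0f} GiB")
print(f"total on CPU:    {optimizer_state_gib + gradient_gib:.0f} GiB")  # ~1043 GiB
```

That is already around 1.1TB before DeepSpeed's pinned buffers and general process overhead, so it is plausible that 1024GB of system RAM ends up just short.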
Have you already tried reducing the batch size and using an 8-bit optimizer?
Setting the batch size to 1 did not work. I haven't tried an 8-bit optimizer yet. Will using 8-bit affect the quality of the trained model?
Yeah, 8-bit optimizers work well with DeepSpeed for fine-tuning.
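In case it helps, here is a minimal, illustrative sketch of what an 8-bit optimizer looks like with bitsandbytes directly (in axolotl you would instead select an 8-bit optimizer in the config, e.g. `optimizer: adamw_bnb_8bit` if your version exposes it; the model, learning rate, and CUDA device below are placeholders):

```python
# Minimal sketch, assuming bitsandbytes is installed and a CUDA device is available.
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(4096, 4096).cuda()  # stand-in for the real model

# 8-bit Adam keeps its two moment buffers quantized to ~2 bytes/param
# instead of ~8 bytes/param in fp32, cutting optimizer memory roughly 4x.
optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=1e-5, weight_decay=0.0)

loss = model(torch.randn(2, 4096, device="cuda")).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

The weights and gradients stay in their usual precision; only the optimizer state is quantized, which in practice has little effect on final model quality. One caveat: as far as I know, DeepSpeed's CPU optimizer offload uses its own CPU Adam, so an 8-bit GPU optimizer is normally combined with ZeRO-3 without optimizer offload.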