torchtune
Is there a plan to support AnswerDotAI/fsdp_qlora, which fine-tunes a 70B LLM on 2x 24GB GPUs (e.g. RTX 3090)?
https://github.com/AnswerDotAI/fsdp_qlora
Hi, I believe this recipe has similar functionality to what you're asking for: https://github.com/pytorch/torchtune/blob/main/recipes/lora_finetune_distributed.py
You will just need to figure out how to pass in the proper config, for which there is a good amount of documentation.
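For reference, the distributed LoRA recipe is launched through the tune CLI; the config name below is illustrative, and the exact flags and config names may differ by torchtune version, so check `tune ls` for what ships with your install:

```bash
# Launch the distributed LoRA recipe on 2 GPUs.
# The config name is illustrative -- run `tune ls` to see the recipes and
# configs available in your torchtune version.
tune run --nproc_per_node 2 lora_finetune_distributed --config llama2/70B_lora
```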
Thanks for creating the issue! Currently our QLoRA configs are only meant for single-device, but extending them to support multiple devices with FSDP is high on our list of priorities. cc @rohan-varma who can give more details on this.
Thanks for the comments! As @ebsmothers mentioned, we only support QLoRA on a single GPU at the moment; our current FSDP and QLoRA implementations don't compose with each other due to some technical limitations.
Over the medium/long term, we'll be working directly with the PyTorch core team (folks like @awgu) to help build and test FSDP v2, which should compose much better with techniques like QLoRA and quantization APIs in general. Stay tuned for concrete plans very soon!
Update: @weifengpy has been working on PR #909 to do this!
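For anyone curious what the FSDP v2 wrapping pattern mentioned above looks like, here is a minimal sketch using the per-module `fully_shard` API from `torch.distributed._composable.fsdp`, with plain `nn.Linear` layers standing in for the NF4-quantized LoRA modules that the PR handles. The import path and API surface may differ across PyTorch versions, so treat this as illustrative rather than the actual recipe code:

```python
# Minimal FSDP v2 (per-module fully_shard) sketch.
# Launch with: torchrun --nproc_per_node 2 fsdp2_sketch.py
# Plain nn.Linear layers stand in for quantized LoRA modules; the API lives
# under torch.distributed._composable.fsdp in recent PyTorch versions and
# may move or change.
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed._composable.fsdp import fully_shard


class ToyBlock(nn.Module):
    def __init__(self, dim: int = 1024):
        super().__init__()
        self.w1 = nn.Linear(dim, dim)
        self.w2 = nn.Linear(dim, dim)

    def forward(self, x):
        return self.w2(torch.relu(self.w1(x)))


def main():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = nn.Sequential(*[ToyBlock() for _ in range(4)]).cuda()

    # FSDP v2 composes per module: shard each block, then the root module.
    for block in model:
        fully_shard(block)
    fully_shard(model)

    optim = torch.optim.AdamW(model.parameters(), lr=1e-4)
    x = torch.randn(8, 1024, device="cuda")
    loss = model(x).sum()
    loss.backward()
    optim.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```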
@calmitchell617 I've already fine-tuned a 70B LLM with AnswerDotAI/fsdp_qlora on 2x 24GB GPUs; I'm just wondering whether torchtune can support it directly, and perhaps distributed inference as well.
@ebsmothers @rohan-varma thanks for your replies, good to hear that this is on your roadmap.
Closing this as completed since #909 was merged and is included in v0.2.0.