torchtune
Is there a plan to support AnswerDotAI/fsdp_qlora, which fine-tunes a 70B LLM on 2x 24GB GPUs (e.g. RTX 3090)?
https://github.com/AnswerDotAI/fsdp_qlora
Hi, I believe this recipe has similar functionality to what you're asking for: https://github.com/pytorch/torchtune/blob/main/recipes/lora_finetune_distributed.py
You will just need to figure out how to pass in the proper config, for which there is a good amount of documentation.
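For reference, the distributed LoRA recipe is launched through the tune CLI; the config name below is illustrative, and the exact flags and config names may differ by torchtune version, so check `tune ls` for what ships with your install:

```bash
# Launch the distributed LoRA recipe on 2 GPUs.
# The config name is illustrative -- run `tune ls` to see the recipes and
# configs available in your torchtune version.
tune run --nproc_per_node 2 lora_finetune_distributed --config llama2/70B_lora
```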
Thanks for creating the issue! Currently our QLoRA configs are only meant for single-device, but extending them to support multiple devices with FSDP is high on our list of priorities. cc @rohan-varma who can give more details on this.
Thanks for the comments! As @ebsmothers mentioned, we only support QLoRA on a single GPU at the moment; our current FSDP and QLoRA implementations don't compose with each other due to some technical limitations.
Over the medium/long term, we'll be working directly with the PyTorch core team (folks like @awgu) to help build and test FSDP v2, which should compose much better with techniques like QLoRA and quantization APIs in general. Stay tuned for concrete plans very soon!
Update: @weifengpy has been working on PR #909 to do this!
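For anyone curious what the FSDP v2 wrapping pattern mentioned above looks like, here is a minimal sketch using the per-module `fully_shard` API from `torch.distributed._composable.fsdp`, with plain `nn.Linear` layers standing in for the NF4-quantized LoRA modules that the PR handles. The import path and API surface may differ across PyTorch versions, so treat this as illustrative rather than the actual recipe code:

```python
# Minimal FSDP v2 (per-module fully_shard) sketch.
# Launch with: torchrun --nproc_per_node 2 fsdp2_sketch.py
# Plain nn.Linear layers stand in for quantized LoRA modules; the API lives
# under torch.distributed._composable.fsdp in recent PyTorch versions and
# may move or change.
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed._composable.fsdp import fully_shard


class ToyBlock(nn.Module):
    def __init__(self, dim: int = 1024):
        super().__init__()
        self.w1 = nn.Linear(dim, dim)
        self.w2 = nn.Linear(dim, dim)

    def forward(self, x):
        return self.w2(torch.relu(self.w1(x)))


def main():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = nn.Sequential(*[ToyBlock() for _ in range(4)]).cuda()

    # FSDP v2 composes per module: shard each block, then the root module.
    for block in model:
        fully_shard(block)
    fully_shard(model)

    optim = torch.optim.AdamW(model.parameters(), lr=1e-4)
    x = torch.randn(8, 1024, device="cuda")
    loss = model(x).sum()
    loss.backward()
    optim.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```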
@calmitchell617 I've already fine-tuned a 70B LLM with AnswerDotAI/fsdp_qlora on 2x 24GB GPUs; I'm just wondering whether torchtune can support it directly, and perhaps distributed inference as well.
@ebsmothers @rohan-varma thanks for your replies, good to hear that this is on your roadmap.
Closing this as completed since #909 was merged and is included in v0.2.0.