
[WIP] [GenAI] Lora Finetune

Open · LittleLittleCloud opened this issue 1 year ago • 1 comment

LoRA fine-tuning is an adapter-based technique for fine-tuning an LLM. It modifies the model architecture by adding learnable LoRA layers to the transformer blocks. During fine-tuning, only the LoRA weights are updated while the original LLM weights stay frozen, so it requires much less GPU memory than full-parameter fine-tuning. Based on this table, fine-tuning a 7B model in 16-bit precision takes about 16 GB of memory, which fits on an RTX 3090, 4080, or 4090. An even wider range of GPUs can handle 3.8B models such as Phi-3.5-mini.
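
For reference, the core idea can be sketched in a few lines of TorchSharp (the tensor library the Microsoft.ML.GenAI packages build on). This is a minimal, hypothetical illustration of a LoRA-wrapped linear layer, not the API proposed in this issue; the class and parameter names (`LoraLinear`, `rank`, `alpha`) are assumptions for illustration only.

```csharp
using TorchSharp;
using TorchSharp.Modules;
using static TorchSharp.torch;

// Hypothetical sketch of a LoRA adapter wrapped around a frozen linear layer.
// Forward pass: y = x * W^T + (alpha / rank) * x * A^T * B^T
public sealed class LoraLinear : nn.Module<Tensor, Tensor>
{
    private readonly Linear _base;       // frozen pretrained weights
    private readonly Parameter _loraA;   // (rank, inFeatures), trainable
    private readonly Parameter _loraB;   // (outFeatures, rank), trainable, zero-initialized
    private readonly double _scaling;

    public LoraLinear(Linear baseLayer, long inFeatures, long outFeatures,
                      long rank = 8, double alpha = 16) : base(nameof(LoraLinear))
    {
        _base = baseLayer;
        foreach (var p in _base.parameters())
            p.requires_grad_(false);     // freeze the original LLM weights

        _loraA = new Parameter(randn(rank, inFeatures) * 0.01);
        _loraB = new Parameter(zeros(outFeatures, rank)); // zero init: adapter starts as a no-op
        _scaling = alpha / rank;

        RegisterComponents();
    }

    public override Tensor forward(Tensor input)
    {
        // Frozen path plus the low-rank update; only _loraA and _loraB receive gradients.
        var frozen = _base.forward(input);
        var delta = input.matmul(_loraA.t()).matmul(_loraB.t()) * _scaling;
        return frozen + delta;
    }
}
```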

API design (wip)

Package: Microsoft.ML.GenAI.Lora

interface ICausalLMLoraPipeline {} // pipeline for loading a causal LM + LoRA layers

class LoraConfiguration // LoRA configuration
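
To make the shape of the proposal slightly more concrete, a sketch of what these types might carry is below. Only the type names ICausalLMLoraPipeline and LoraConfiguration come from this issue; every member shown (Rank, Alpha, Dropout, TargetModules, Configuration) is a hypothetical placeholder, not a committed API.

```csharp
// Hypothetical sketch only; member names are placeholders, not the final API.
namespace Microsoft.ML.GenAI.Lora;

public class LoraConfiguration
{
    public int Rank { get; set; } = 8;          // low-rank dimension r of the adapters
    public double Alpha { get; set; } = 16;     // scaling factor applied to the LoRA update
    public double Dropout { get; set; } = 0.0;  // dropout on the adapter path
    public string[] TargetModules { get; set; } // e.g. attention projections to wrap with LoRA
        = new[] { "q_proj", "v_proj" };
}

// Hypothetical pipeline surface: loads a causal LM and injects LoRA layers per the config.
public interface ICausalLMLoraPipeline
{
    LoraConfiguration Configuration { get; }
}
```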

LittleLittleCloud avatar Nov 06 '24 17:11 LittleLittleCloud

@LittleLittleCloud could you add this to a Milestone indicating when you are planning to get this done?

michaelgsharp avatar Mar 10 '25 07:03 michaelgsharp