fsdp_qlora
Training LLMs with QLoRA + FSDP
The README mentions: "The SFTTrainer version has to run with a lower batch size (4 vs 8) so we only do 2 gradient accumulation steps vs 4 in the...
When I tried to train a QnA-style dataset such as knowrohit07/know_sql, I got this error.
Is training with 1024 or 2048 sequence length feasible using this method?
Thanks for such wonderful work! I see you comment out this line: https://github.com/AnswerDotAI/fsdp_qlora/blob/d7818ec86d17f37db4beef36f80870cbcac37957/train.py#L722 May I ask what is the rationale behind it? Is fsdp_qlora compatible with torch compile?
I think there is a bug in the DoRA implementation as it takes neither `lora_dropout` nor `lora_alpha` into account. These arguments are passed as `*args` to the `__init__` call of...
Add option for local 'custom.jsonl' dataset with llama3 prompt format
Add conversion script for merging fsdp model_state_dict with model
Hi, I've fixed the bug in `Converting the State Dict.ipynb`
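The state-dict conversion mentioned in the entries above typically boils down to renaming keys so that weights saved under an FSDP-wrapped module can be loaded back into the unwrapped model. A minimal sketch of that renaming step, assuming a hypothetical wrapper prefix of `_fsdp_wrapped_module.` (the actual prefix depends on the FSDP configuration and the notebook's logic):

```python
# Hypothetical sketch: strip an assumed FSDP wrapper prefix from state-dict
# keys so the checkpoint can be loaded into the plain (unwrapped) model.
PREFIX = "_fsdp_wrapped_module."  # assumption, not taken from the repo

def strip_fsdp_prefix(state_dict):
    """Return a new dict with the wrapper prefix removed from each key."""
    return {
        (key[len(PREFIX):] if key.startswith(PREFIX) else key): value
        for key, value in state_dict.items()
    }

# Toy usage with placeholder values standing in for weight tensors.
wrapped = {
    "_fsdp_wrapped_module.model.embed_tokens.weight": "w0",
    "_fsdp_wrapped_module.lm_head.weight": "w1",
    "already_clean.bias": "w2",
}
clean = strip_fsdp_prefix(wrapped)
print(sorted(clean))
```

In practice the same renaming would be applied to real tensors (e.g. from `torch.load(...)`) before calling `model.load_state_dict(clean)`; this sketch only illustrates the key transformation.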
Hi there, just wondering: does this repo support fine-tuning a Vision Language Model (VLM), e.g. https://huggingface.co/microsoft/Phi-3.5-vision-instruct? Many thanks for any help, and for this amazing lib!