
Training LLMs with QLoRA + FSDP

39 fsdp_qlora issues, sorted by recently updated:

This PR enables loading models that contain only PyTorch `pytorch_model.bin` format files (not the safetensors format).

Hey, I'm loving the goal of lowering the resource requirements for training! In this paper https://arxiv.org/abs/2403.06504 they claim direct memory access between the GPU and NVMe storage is more efficient at swapping...

I had to change this code in `train.py` to get it to work on my system: `# LoRA and DORA modules`, `sys.path.append("./scripts")`, `from scripts.lora import LORA`, `from scripts.dora...`

Fix the `RuntimeError` in https://github.com/AnswerDotAI/fsdp_qlora/issues/28 once and for all by protecting the main code with an `if __name__ != '__main__': return` guard. **Background:** the original issue occurred when the `awq` package was imported...
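As a rough sketch of the guard pattern this PR describes (not the actual diff), the idea is an early return when the module is re-executed under a different name, e.g. by a `torch.multiprocessing` spawn or a package import. The function name and return values here are hypothetical, for illustration only:

```python
def guarded_entry(module_name: str) -> str:
    """Early-return guard: skip the training entry point when the module
    is being re-imported (e.g. by multiprocessing's spawn start method,
    which re-imports the main module in each child process)."""
    if module_name != "__main__":
        # Re-import, not a direct run: do nothing so the training
        # code is not executed a second time.
        return "skipped"
    # Direct `python train.py` run: proceed with training.
    return "trained"
```

In the real script the guard sits at the top of the main function, so importing the module (as `awq` indirectly triggers) no longer re-runs the training loop.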

Hello, I've successfully finetuned Llama-3 8B with QDoRA and am now looking to perform inference using vLLM. Could you provide guidance or scripts on how to merge the QDoRA adapters...
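There is no merge script in the excerpt above, but the DoRA merge math itself is simple. The sketch below is an illustration of that math under stated assumptions, not an official repo utility: it assumes already-dequantized base weights `W` (for QDoRA you would dequantize first), low-rank factors `A` and `B`, and a learned per-column magnitude vector `m`:

```python
import numpy as np

def merge_dora(W, A, B, m):
    """Illustrative DoRA-style merge (hypothetical helper, not from the repo):
    fold the low-rank update into the base weight, then rescale each column
    to the learned magnitude vector m, per the DoRA formulation
    W' = m * (W + B @ A) / ||W + B @ A||_col."""
    W_prime = W + B @ A                                   # LoRA-style update, shape (out, in)
    col_norms = np.linalg.norm(W_prime, axis=0, keepdims=True)
    return m * W_prime / col_norms                        # per-column magnitude rescale
```

After a merge like this, the resulting dense weights can be saved as a plain checkpoint, which is the form inference engines such as vLLM expect.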

Hi, I met the following error when finetuning a llama7b model with FSDP+HQQ:
```
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 74, in _wrap
    fn(i, *args)
  File "/workspace/fsdp_qlora/train.py", line 723,...
```

Hi, I tried to finetune a llama7b model with HQQ-LoRA using dual GPUs. I found that during "Loading & Quantizing Model Shards", peak GPU memory usage reached 35 GB. What's...

Hello, thank you for the awesome work! Could you please add support for the DeepSeek VL model?

Here's the command I ran:
```
python train.py \
  --model_name meta-llama/Llama-2-70b-hf \
  --batch_size 1 \
  --context_length 1024 \
  --precision bf16 \
  --train_type hqq_lora \
  --use_gradient_checkpointing true \
  --use_cpu_offload false \...
```

I used this script to fine-tune Llama 3 (from the AnswerAI blog post); what I'm left with is a state dict that I am unable to use to replace layers...