QLoRA fine-tuning runs out of GPU memory on long training samples; can QLoRA fine-tuning be run on multiple GPUs?

Open Zhang-star-master opened this issue 1 year ago • 4 comments

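When fine-tuning with QLoRA, my training samples are long enough that the GPU runs out of memory. Can QLoRA fine-tuning be set up to use multiple GPUs?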

Sep 19 '23 08:09 Zhang-star-master

In Firefly/train_qlora.py, replace the model = AutoModelForCausalLM.from_pretrained() call with the following:

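    # Load the model; device_map="auto" places its layers across the visible GPUs
    model = AutoModelForCausalLM.from_pretrained(
        args.model_name_or_path,
        device_map="auto",
        torch_dtype=torch.float16,
        trust_remote_code=True,
        # 4-bit loading is requested once, inside quantization_config; passing
        # load_in_4bit=True again as a top-level keyword is redundant and is
        # rejected by newer transformers releases
        quantization_config=BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=torch.float16,
            bnb_4bit_use_double_quant=True,
            bnb_4bit_quant_type="nf4",
            llm_int8_threshold=6.0,
            llm_int8_has_fp16_weight=False,
        ),
    )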

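Then change the launch command from torchrun --nproc_per_node={nums} train_qlora.py --train_args_file xxx.json to python train_qlora.py --train_args_file xxx.json. This gives single-machine, multi-GPU fine-tuning with the memory load spread across the cards. To choose which GPUs take part, set CUDA_VISIBLE_DEVICES on the command, for example CUDA_VISIBLE_DEVICES=0,1,3 python train_qlora.py --train_args_file xxx.json. If it is not set, all GPUs are used by default.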
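The same launch can be scripted. The following is a minimal sketch, not part of Firefly: the helper name build_launch is hypothetical, and it only assembles the environment and the argument list for the single-process command described above.

```python
import os
import sys


def build_launch(gpu_ids, args_file, script="train_qlora.py"):
    """Return (env, cmd) for a single-process run limited to gpu_ids.

    build_launch is a hypothetical helper for illustration only.
    """
    env = dict(os.environ)
    if gpu_ids:
        # Only the listed GPUs will be visible to the training process.
        env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    # Plain `python`, not torchrun: one process that sees several GPUs.
    cmd = [sys.executable, script, "--train_args_file", args_file]
    return env, cmd


env, cmd = build_launch([0, 1, 3], "xxx.json")
print("CUDA_VISIBLE_DEVICES=" + env["CUDA_VISIBLE_DEVICES"])
print(" ".join(cmd))
```

To actually start training from Python, the pair can be passed to subprocess.run(cmd, env=env, check=True). Passing an empty gpu_ids list leaves CUDA_VISIBLE_DEVICES as inherited from the parent environment.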

Sep 20 '23 09:09 bswaterb

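@yangjianxin1 I suggest adding a documented way to fine-tune on a single machine with multiple GPUs for the purpose of spreading memory usage across the cards.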

Sep 20 '23 09:09 bswaterb

Hi, does this approach actually give model-parallel fine-tuning (that is, splitting the memory usage evenly across two GPUs)?
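One way to check where the layers ended up is to look at the model's hf_device_map attribute, which transformers sets when a model is loaded with a device_map. The sketch below is an illustration under assumptions: summarize_device_map is a hypothetical helper, and example_map is made-up data standing in for a real model.hf_device_map.

```python
from collections import Counter


def summarize_device_map(device_map):
    """Count how many modules were placed on each device.

    Devices may be integers (GPU indices) or strings such as "cpu",
    so they are normalized to strings before counting.
    """
    return dict(Counter(str(dev) for dev in device_map.values()))


# Hypothetical placement for illustration; with a loaded model this would
# be summarize_device_map(model.hf_device_map) instead.
example_map = {
    "model.embed_tokens": 0,
    "model.layers.0": 0,
    "model.layers.1": 0,
    "model.layers.2": 1,
    "model.layers.3": 1,
    "model.norm": 1,
    "lm_head": 1,
}
print(summarize_device_map(example_map))  # {'0': 3, '1': 4}
```

If more than one GPU index shows up in the summary, the weights are split across cards rather than replicated on each one.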

Nov 06 '23 06:11 shudct

Isn't this data parallelism?
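For reference, the two terms differ mainly in where the weights live. A data-parallel launch (the torchrun command) keeps a full copy of the model on every GPU, while device_map="auto" places different layers on different GPUs. The rough arithmetic below is a sketch under stated assumptions: the 7-billion-parameter size is a hypothetical example, the layers are assumed to divide evenly, and activations, LoRA weights, optimizer state, and quantization overhead are all ignored.

```python
def weight_gib_per_gpu(n_params, bits_per_param, n_gpus, strategy):
    """Estimate weight memory per GPU in GiB, counting weights only."""
    total_gib = n_params * bits_per_param / 8 / 2**30
    if strategy == "data_parallel":
        # Every GPU holds a complete copy of the weights.
        return total_gib
    if strategy == "layer_split":
        # Layers are divided among the GPUs (assumed evenly here).
        return total_gib / n_gpus
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical 7B-parameter model stored in 4-bit form on two GPUs.
for strategy in ("data_parallel", "layer_split"):
    gib = weight_gib_per_gpu(7e9, 4, 2, strategy)
    print(f"{strategy}: about {gib:.2f} GiB of weights per GPU")
```

Under these assumptions the layer split halves the per-GPU weight footprint on two cards, whereas data parallelism leaves it unchanged.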

May 11 '24 12:05 Kenneth0901
