LLaMA-Factory: 4-bit LongLoRA fine-tuning runs out of GPU memory

Reminder

[X] I have read the README and searched the existing issues.

Reproduction

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 37.33 GiB. GPU 0 has a total capacity of 47.54 GiB of which 34.00 GiB is free. Process 251364 has 13.53 GiB memory in use. Of the allocated memory 8.89 GiB is allocated by PyTorch, and 4.32 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

The preview command is shown in the screenshot. I'm on an A40 with 48 GB of VRAM, and QLoRA 4-bit fine-tuning runs out of memory even on a 100-row dataset. I'm using LongLoRA with a cutoff length of 12288, which is on the long side. Is this expected, or do I need to truncate the dataset further?

Expected behavior

No response

System Info

No response

Others

No response

Mar 09 '24 17:03 chosenduke

It may be that Gemma's vocabulary is too large; switching to a Mistral model might help.
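To illustrate the vocabulary-size point: the output logits have shape (batch, sequence length, vocabulary size), so their memory grows with the full vocabulary no matter which tokens the dataset actually uses. A rough sketch, assuming vocabulary sizes of 256,000 for Gemma and 32,000 for Mistral, fp32 logits, and a batch size of 1 (none of these are stated in this thread):

```python
GIB = 1024 ** 3


def logits_bytes(batch_size: int, seq_len: int, vocab_size: int,
                 bytes_per_value: int = 4) -> int:
    """Size in bytes of one logits tensor of shape (batch, seq_len, vocab)."""
    return batch_size * seq_len * vocab_size * bytes_per_value


seq_len = 12288  # cutoff length reported in the issue

# Vocabulary sizes below are assumptions taken from the published model configs.
for name, vocab_size in [("gemma", 256_000), ("mistral", 32_000)]:
    size = logits_bytes(batch_size=1, seq_len=seq_len, vocab_size=vocab_size)
    print(f"{name}: {size / GIB:.2f} GiB")
# gemma: 11.72 GiB
# mistral: 1.46 GiB
```

This counts a single copy of the logits only. The loss computation and its gradients can hold further copies, so the real peak is likely higher.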

Mar 09 '24 17:03 hiyouga

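It may be that Gemma's vocabulary is too large; switching to a Mistral model might help.

My dataset is entirely in English, so it only uses a small part of the vocabulary. Do you mean that the embedding layer is too large?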

Mar 09 '24 17:03 chosenduke

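I only took a small subset of the dataset, and the samples in that subset are already among the shorter ones. If I truncate further, I'm a bit worried about training quality.

On 2024-03-10 00:33:53, "hoshi-hiyouga" @.***> wrote:

First, you must enable flash_attn, then adjust the cutoff length until OOM no longer occurs.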
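The "adjust the cutoff length until OOM no longer occurs" step can be sketched as a simple downward search. This is only an illustration: `find_max_cutoff_len` and `fake_fits` are hypothetical helpers, not part of LLaMA-Factory. In a real run, the `fits` callback would launch a few training steps at the given cutoff length and return False when `torch.cuda.OutOfMemoryError` is raised.

```python
from typing import Callable, Optional


def find_max_cutoff_len(
    fits: Callable[[int], bool],
    start: int = 12288,
    minimum: int = 512,
    step: int = 1024,
) -> Optional[int]:
    """Return the largest tried cutoff length for which `fits` returns True.

    Lengths are tried from `start` downward in decrements of `step`.
    Returns None if no tried length at or above `minimum` fits.
    """
    cutoff = start
    while cutoff >= minimum:
        if fits(cutoff):
            return cutoff
        cutoff -= step
    return None


def fake_fits(cutoff_len: int) -> bool:
    # Stand-in for a short real training run (assumption for this demo):
    # pretend that anything longer than 8192 tokens runs out of memory.
    return cutoff_len <= 8192


best = find_max_cutoff_len(fake_fits)
print(best)  # 8192
```

Starting from the desired length and stepping down keeps the cutoff as long as the hardware allows, which limits how much of each sample is lost.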


Mar 09 '24 17:03 chosenduke

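Does "adjust the cutoff length" here mean changing the length in the web UI, or manually truncating the samples in my dataset? My dataset has very long inputs and fairly short outputs. Do I need to truncate the inputs by hand?

On 2024-03-10 00:33:53, "hoshi-hiyouga" @.***> wrote:

First, you must enable flash_attn, then adjust the cutoff length until OOM no longer occurs.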


Mar 09 '24 17:03 chosenduke

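I'm fine-tuning ChatGLM3-6B with LLaMA-Factory and keep hitting out-of-memory errors. With 16 GB of VRAM the error says it is a few tens of MB short, and after switching to 24 GB of VRAM it still says it is a few tens of MB short. Fine-tuning with the method provided officially by ChatGLM fits comfortably in 24 GB. The fine-tuning file is a little over 500 KB in total: law0.json

Fine-tuning with LLaMA-Factory

The fine-tuning method officially recommended by ChatGLM (LoRA)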

Mar 16 '24 06:03 gq2010

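I'm fine-tuning ChatGLM3-6B with LLaMA-Factory and keep hitting out-of-memory errors. With 16 GB of VRAM the error says it is a few tens of MB short, and after switching to 24 GB of VRAM it still says it is a few tens of MB short. Fine-tuning with the method provided officially by ChatGLM fits comfortably in 24 GB. The fine-tuning file is a little over 500 KB in total: law0.json

Fine-tuning with LLaMA-Factory

The fine-tuning method officially recommended by ChatGLM (LoRA)

Any update? Did you solve it? I've run into a similar situation.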

Mar 26 '24 10:03 Yi-Lyu
