Qwen2.5
About model training time and GPU memory usage
As the title says: why does qwen1.5 consume more resources than qwen1, in both training time and GPU memory? I know the model architecture has changed somewhat. GPU: A800. Model: qwen1.5-14B-chat. Single-GPU LoRA: 62 GB of VRAM. Training time: three times that of qwen1-14B-chat.
Any guidance would be appreciated.
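For anyone trying to pin down the difference, here is a minimal sketch for measuring the peak VRAM of a single LoRA training step, so the qwen1 and qwen1.5 checkpoints can be compared under identical settings. The model id, batch size, sequence length, and LoRA hyperparameters below are placeholders, not the poster's exact configuration.

```python
# Minimal sketch: measure peak VRAM of one LoRA training step.
# Assumes transformers + peft are installed; swap in the qwen1
# checkpoint with the same settings for a fair comparison.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "Qwen/Qwen1.5-14B-Chat"  # placeholder checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = get_peft_model(
    model,
    LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
               target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"),
)
model.gradient_checkpointing_enable()  # trades compute for activation memory
model.enable_input_require_grads()     # needed when base weights are frozen
model.train()

batch = tokenizer(
    ["hello world"] * 2, return_tensors="pt",
    padding="max_length", max_length=1024,
).to(model.device)

torch.cuda.reset_peak_memory_stats()
# Using input_ids as labels is fine for a memory measurement,
# even though pad positions would normally be masked out.
loss = model(**batch, labels=batch["input_ids"]).loss
loss.backward()
print(f"peak VRAM: {torch.cuda.max_memory_allocated() / 2**30:.1f} GiB")
```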
You mean 1.5 consumes more memory than 1? I did test the inference costs; see the doc here: https://qwen.readthedocs.io/en/latest/benchmark/hf_infer.html. Maybe I should add training costs for you guys as well.
Yes, absolutely. I don't understand why the Qwen1.5-14B chat model needs 62 GB of VRAM to train with LoRA.
Context length might be a factor. Are you using the official script, LLaMA-Factory, or Axolotl?
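For context: activation memory grows with sequence length (roughly linearly per layer, and the attention-score matrices grow quadratically unless flash attention is used), so two scripts with different default max lengths can differ a lot in VRAM. A quick check is to cap the length at tokenization time; the snippet below is a generic sketch, with the cap value and the texts as placeholders:

```python
# Sketch: cap sequence length at tokenization time to test whether
# context length explains the VRAM gap. The 1024 cap and the texts
# are placeholders.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-14B-Chat")
texts = ["example training sample one", "example training sample two"]
batch = tokenizer(
    texts,
    truncation=True,
    max_length=1024,   # use the same cap for the qwen1 and qwen1.5 runs
    padding="max_length",
    return_tensors="pt",
)
print(batch["input_ids"].shape)  # (2, 1024)
```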
The LLaMA-Factory framework uses a normal amount of VRAM for LoRA, but with the official Qwen script the VRAM usage is very high.
python: examples/sft/finetune.py
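One hypothesis worth checking (an assumption, not a reading of the script): official fine-tuning scripts often ship a heavier LoRA setup than LLaMA-Factory's defaults, e.g. adapters on every linear projection plus fully trained embeddings and lm_head, which costs far more gradient and optimizer memory. The sketch below contrasts a lean and a heavy PEFT config; the module names match the Qwen2 architecture that qwen1.5 uses, but verify them with model.named_modules():

```python
# Sketch: lean vs heavy LoRA configs for a Qwen1.5 model. All rank/alpha
# values are illustrative; the heavy config is a guess at what a script
# might do, not taken from examples/sft/finetune.py.
from peft import LoraConfig

lean = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],  # attention-only adapters
)

heavy = LoraConfig(
    r=64, lora_alpha=128, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # every projection
    # modules_to_save keeps full trainable copies with gradients and
    # optimizer state; for a 14B model the embeddings and lm_head
    # alone are several GiB.
    modules_to_save=["embed_tokens", "lm_head"],
)
```

If the official script does something closer to `heavy`, moving toward `lean` (or enabling gradient checkpointing) should bring the 62 GB figure down considerably.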
I see. Let me check this out and provide the statistics in the docs.
Training details: dataset of 5000 samples.