LLaMA-Factory: OOM error when fine-tuning with the qwen2_5vl_lora_sft.yaml example, on a 5090 with 64 GB of RAM

Reminder

[x] I have read the above rules and searched the existing issues.

System Info

models

model_name_or_path: Qwen/Qwen2.5-VL-7B-Instruct
image_max_pixels: 262144
video_max_pixels: 16384
trust_remote_code: true

method

stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all

dataset

dataset: mllm_demo,identity,alpaca_en_demo  # video: mllm_video_demo
template: qwen2_vl
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4

output

output_dir: saves/qwen2_5vl-7b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: false
report_to: none  # choices: [none, wandb, tensorboard, swanlab, mlflow]

train

per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null

-------------------- The parameters above are what I used --------------------

Next, I ran this in the terminal: llamafactory-cli train examples/train_lora/qwen2_5vl_lora_sft.yaml

The error was:

torch.AcceleratorError: CUDA error: out of memory
Search for `cudaErrorMemoryAllocation' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

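I've spent a whole day on this and I'm out of ideas. Could someone please help? I'll send 30 yuan over WeChat to whoever solves it (I'm a student, so it isn't much, but it's a day's food money for me). My setup is a 5090 with 64 GB of RAM. Is it reasonable for this to run out of GPU memory?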
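As a rough sanity check on that question, here is a back-of-the-envelope sketch of the memory needed for the frozen weights alone. The parameter count (about 8.3 billion for Qwen2.5-VL-7B-Instruct, vision tower included) and the 32 GiB of VRAM on the 5090 are assumptions that are not stated in this thread, and the estimate ignores activations, LoRA optimizer state, and the CUDA context.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3


# Assumed values (not taken from the thread).
ASSUMED_PARAMS = 8.3e9      # approximate total parameter count
ASSUMED_VRAM_GIB = 32.0     # assumed VRAM on the card

# bf16: true in the config means 2 bytes per parameter.
weights_gib = weight_memory_gib(ASSUMED_PARAMS, bytes_per_param=2)
headroom_gib = ASSUMED_VRAM_GIB - weights_gib

print(f"bf16 weights: ~{weights_gib:.1f} GiB")
print(f"left for activations, LoRA state, CUDA context: ~{headroom_gib:.1f} GiB")
```

Under these assumptions the weights take roughly half the card, so model size alone would not obviously explain an out-of-memory error with batch size 1. Other causes, such as another process holding GPU memory or an environment mismatch, may be worth ruling out first.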

WeChat: lcy2996078507

Reproduction

Put your message here.

Others

No response

Oct 22 '25 11:10 CSolaris

Task Manager shows only about 9 GB in use, so why am I getting this error?
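One way to cross-check the Task Manager reading is to ask the NVIDIA driver directly. This is a minimal sketch, assuming `nvidia-smi` is installed and on the PATH; the helper names `parse_memory_csv` and `query_gpu_memory` are made up for illustration, and the sample string is invented rather than a real measurement.

```python
import subprocess


def parse_memory_csv(text: str) -> list[tuple[int, int]]:
    """Parse 'used, total' pairs in MiB, one GPU per line."""
    pairs = []
    for line in text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        pairs.append((used, total))
    return pairs


def query_gpu_memory() -> list[tuple[int, int]]:
    """Run nvidia-smi and return (used_mib, total_mib) for each GPU."""
    out = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_memory_csv(out)


# Demonstrate the parser on an invented sample line (not a real reading).
sample = "9216, 32607\n"
for idx, (used, total) in enumerate(parse_memory_csv(sample)):
    print(f"GPU {idx}: {used} MiB used of {total} MiB")
```

Calling `query_gpu_memory()` right before launching training shows how much memory is already taken by other processes, which the failed allocation has to compete with.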

Oct 22 '25 11:10 CSolaris

Don't use the example config; run it from the WebUI instead.

Oct 22 '25 12:10 shidhi771

Are you using DeepSpeed? I don't see it in the script you posted. Try adding DeepSpeed.

Oct 24 '25 08:10 deadlykitten4

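@deadlykitten4 @shidhi771 Thank you both. This is my first time raising an issue on GitHub, and I didn't expect anyone to actually pay attention to my problem and offer suggestions. I'm truly grateful. The problem is solved. Could we connect on WeChat? Or send me your WeChat ID and I'll add you.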

Oct 28 '25 03:10 CSolaris

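I always use the WebUI for supervised fine-tuning and it's nothing special. Most problems come from the dataset or the environment. Get a small dataset running end to end first, then start the full run.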

Oct 28 '25 03:10 shidhi771

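My advisor now wants me to do image-to-image fine-tuning. Do you know anything about that? I asked around and was told LLaMA-Factory can't do img2img. If I set it up myself, what should I watch out for?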

Oct 28 '25 03:10 CSolaris

Describe what you need, or share a few dataset samples and the model name. A short overview would help.

Oct 28 '25 03:10 shidhi771

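Could we connect on WeChat? I can't post images here, or rather I don't know how to. My WeChat is lcy2996078507. I work in remote sensing. Our group used to do deep learning, and now my advisor wants me to work on large models, so I'm teaching myself. The task is roughly: take one image as input and output another image. The model surely can't be something like qwen2_5vl, right? It can only understand images, not generate them, so I don't know what to pick. For now I'm planning to use Qwen/Qwen-Image-Edit, but that model seems fairly large, so my GPU memory probably isn't enough? I really don't know. I'm just getting started and would appreciate any pointers. I'd be happy to buy you lunch.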

Oct 28 '25 03:10 CSolaris