Kingsley
Post-training MoE models through the transformers modeling path is currently much slower; see https://github.com/QwenLM/Qwen3/issues/736#issuecomment-2207996348
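As a rough illustration of where the time goes, here is a minimal runnable sketch of the serial per-expert loop common in eager HF MoE implementations (my own illustration, not Qwen's actual modeling code; all shapes and names are made up):

```python
# Sketch only: eager MoE forwards dispatch one small matmul per expert in a
# Python loop, instead of one fused/grouped kernel over all experts.
import torch

num_tokens, hidden, num_experts, top_k = 1024, 64, 8, 2
x = torch.randn(num_tokens, hidden)
experts = torch.nn.ModuleList(torch.nn.Linear(hidden, hidden) for _ in range(num_experts))
weights, top_idx = torch.randn(num_tokens, num_experts).softmax(-1).topk(top_k, dim=-1)

out = torch.zeros_like(x)
for e in range(num_experts):                     # serial loop over experts
    tok, slot = (top_idx == e).nonzero(as_tuple=True)
    if tok.numel():                              # tokens routed to this expert
        out[tok] += weights[tok, slot].unsqueeze(-1) * experts[e](x[tok])
```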
Running inference on InternVL3-8B-hf with vLLM returns ValueError: `limit_mm_per_prompt` is only supported for multimodal models.
vLLM's official code supports the internvl-chat version, so the -hf checkpoint is not recognized as multimodal. When I have time I will add an internvl-hf -> internvl-chat conversion. Someone here hit the same problem: https://github.com/hiyouga/LLaMA-Factory/pull/7258#issuecomment-2858717733
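A quick way to see why recognition fails (a minimal sketch; the local path is a placeholder and the architecture strings are taken from the public configs, not from this thread):

```python
# Sketch: vLLM dispatches its model implementation from the `architectures`
# field of config.json.
import json

with open("InternVL3-8B-hf/config.json") as f:   # placeholder local path
    print(json.load(f)["architectures"])
# -> ["InternVLForConditionalGeneration"]: the transformers port; vLLM has no
#    multimodal implementation registered under this name, hence the ValueError.
# The original chat checkpoint instead reports ["InternVLChatModel"], which
# vLLM does recognize as multimodal, so `limit_mm_per_prompt` is accepted.
```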
We now offer a simple script for anyone who wants to serve the `InternVL-series` with vLLM after training the `-HF` version of the model.

> [!Warning]
> 1. The following pipeline is only verified...
> > With an internvl3-hf model fine-tuned via llamafactory SFT, the vllm server fails at startup with a different error: AttributeError: 'InternVLConfig' object has no attribute 'vocab_size'
> > Partial context:
> > ```
> > ...
> > File "/usr/local/lib/python3.9/dist-packages/vllm/v1/worker/gpu_worker.py", line...
> > ```
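One unverified workaround sketch (my assumption, not a fix confirmed in this thread): vLLM appears to read `vocab_size` from the top level of the config, while the HF `InternVLConfig` nests it under `text_config`; copying it up may unblock loading:

```python
# Unverified sketch: duplicate the nested vocab_size at the top level of
# config.json so code that reads `config.vocab_size` directly can find it.
import json

path = "InternVL3-8B-hf/config.json"  # placeholder path to the SFT-ed checkpoint
with open(path) as f:
    cfg = json.load(f)
cfg.setdefault("vocab_size", cfg["text_config"]["vocab_size"])
with open(path, "w") as f:
    json.dump(cfg, f, indent=2)
```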
> [@Kuangdd01](https://github.com/Kuangdd01)
>
> Hi, first of all, thank you so much for providing the fine-tuning code for InternVL3. I really appreciate your work and contribution to the open-source community...
For now this script does not support converting the LoRA adapter directly; merge the LoRA adapter into the HF model first, then convert the whole checkpoint. Indeed, we need...
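For the merge step, here is a minimal sketch using PEFT's `merge_and_unload` (the model ID and directory names are placeholders; LLaMA-Factory's `llamafactory-cli export` with a merge config accomplishes the same thing):

```python
# Minimal sketch, assuming a PEFT LoRA adapter directory; paths are placeholders.
import torch
from transformers import AutoModelForImageTextToText
from peft import PeftModel

base = AutoModelForImageTextToText.from_pretrained(
    "OpenGVLab/InternVL3-8B-hf", torch_dtype=torch.bfloat16
)
model = PeftModel.from_pretrained(base, "saves/internvl3-8b/lora/sft")  # adapter dir
merged = model.merge_and_unload()  # fold the LoRA deltas into the base weights
merged.save_pretrained("saves/internvl3-8b-hf-merged")  # then convert this checkpoint
```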
```
mv /data/onout/martin/MODELS/internvl3_8b/lora/merged_1596_chat/model.safetensors /data/onout/martin/MODELS/internvl3_8b/lora/sft/checkpoint-1596/
```

You should move these safetensors to a local dir that contains the configs from https://huggingface.co/OpenGVLab/InternVL3-8B. Then the vLLM Python file should be

```python
import...
```
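The original snippet is truncated above; as a reference, a minimal sketch of loading the converted checkpoint offline with vLLM (reusing the directory from the `mv` command; this is not the author's original file):

```python
from vllm import LLM

# Minimal sketch: the directory holds the InternVL3-8B chat configs plus the
# merged safetensors moved in by the `mv` command above.
llm = LLM(
    model="/data/onout/martin/MODELS/internvl3_8b/lora/sft/checkpoint-1596",
    trust_remote_code=True,            # InternVL-chat ships custom modeling code
    limit_mm_per_prompt={"image": 1},  # accepted once the model loads as multimodal
)
```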
> [@Kuangdd01](https://github.com/Kuangdd01)
>
> Could you explain in more detail how to replace those 5 json files? I added extra tokens during training. When I replace the json files in the original chat checkpoint with the ones from my fully fine-tuned checkpoint and run inference with official vLLM, the results get worse, while inference through LLaMA-Factory's HuggingFace-framework API works fine.

Just swapping the following four should be enough.
> [@Kuangdd01](https://github.com/Kuangdd01) I tried running inference on the saved checkpoint with the template provided on Hugging Face:
>
> ```python
> from transformers import AutoProcessor, AutoModelForImageTextToText
> import torch
>
> torch_device = "cuda"
> model_checkpoint = "OpenGVLab/InternVL3-1B-hf"
> processor = AutoProcessor.from_pretrained(model_checkpoint)
> model = AutoModelForImageTextToText.from_pretrained(
>     model_checkpoint, device_map=torch_device, torch_dtype=torch.bfloat16
> )
> ```
>
> messages...
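The quote truncates at `messages...`; for reference, a typical continuation of this template (my sketch of the standard transformers chat-template flow, not the commenter's exact code; the image URL is a placeholder):

```python
# Sketch: standard image-text-to-text chat-template usage in transformers.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "http://images.cocodataset.org/val2017/000000039769.jpg"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device, dtype=torch.bfloat16)

generated = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(generated[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```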
> > Hello, I want to run inference with InternVL3-8B, but it fails with an error that the multimodal model cannot be recognized. I then followed your answer in [#8086 (comment)](https://github.com/hiyouga/LLaMA-Factory/issues/8086#issuecomment-2898640569) to make the changes and ran into the problem below. Thanks for your attention and reply!
> > I downloaded the code and weights from https://huggingface.co/OpenGVLab/InternVL3-8B/tree/main into the InternVL3-8B folder, then ran `python scripts/convert_ckpt/intern3-vl-8b.py --input_dir InternVL3-8B --output_dir saves/internvl3-8b-chat`. Why does it report this error?
> > ```
> > Traceback (most recent call last):
> >   File "code/LLaMA-Factory/scripts/convert_ckpt/intern3-vl-8b.py", line 468, in main()...
> > ```