Kingsley
Post-training MoE models through the transformers modeling path is currently much slower; see https://github.com/QwenLM/Qwen3/issues/736#issuecomment-2207996348
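As a rough illustration of where the time goes, here is a minimal runnable sketch of the serial per-expert loop common in eager HF MoE implementations (my own illustration, not Qwen's actual modeling code; all shapes and names are made up):

```python
# Sketch only: eager MoE forwards dispatch one small matmul per expert in a
# Python loop, instead of one fused/grouped kernel over all experts.
import torch

num_tokens, hidden, num_experts, top_k = 1024, 64, 8, 2
x = torch.randn(num_tokens, hidden)
experts = torch.nn.ModuleList(torch.nn.Linear(hidden, hidden) for _ in range(num_experts))
weights, top_idx = torch.randn(num_tokens, num_experts).softmax(-1).topk(top_k, dim=-1)

out = torch.zeros_like(x)
for e in range(num_experts):                     # serial loop over experts
    tok, slot = (top_idx == e).nonzero(as_tuple=True)
    if tok.numel():                              # tokens routed to this expert
        out[tok] += weights[tok, slot].unsqueeze(-1) * experts[e](x[tok])
```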
Running inference on InternVL3-8B-hf with vLLM returns ValueError: `limit_mm_per_prompt` is only supported for multimodal models.
vLLM's official code supports the internvl-chat version, so the -hf checkpoint is not recognized as multimodal. When I have time I will add an internvl-hf -> internvl-chat conversion. Someone here hit the same problem: https://github.com/hiyouga/LLaMA-Factory/pull/7258#issuecomment-2858717733
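A quick way to see why recognition fails (a minimal sketch; the local path is a placeholder and the architecture strings are taken from the public configs, not from this thread):

```python
# Sketch: vLLM dispatches its model implementation from the `architectures`
# field of config.json.
import json

with open("InternVL3-8B-hf/config.json") as f:   # placeholder local path
    print(json.load(f)["architectures"])
# -> ["InternVLForConditionalGeneration"]: the transformers port; vLLM has no
#    multimodal implementation registered under this name, hence the ValueError.
# The original chat checkpoint instead reports ["InternVLChatModel"], which
# vLLM does recognize as multimodal, so `limit_mm_per_prompt` is accepted.
```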
We now offer a simple script for anyone who wants to serve the `InternVL-series` with vLLM after training the `-HF` version of the model.

> [!Warning]
> 1. The following pipeline is only verified...
> > With an internvl3-hf model fine-tuned via llamafactory SFT, the vllm server fails at startup with a different error: AttributeError: 'InternVLConfig' object has no attribute 'vocab_size'
> > Partial context:
> > ```
> > ...
> > File "/usr/local/lib/python3.9/dist-packages/vllm/v1/worker/gpu_worker.py", line...
> > ```
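One unverified workaround sketch (my assumption, not a fix confirmed in this thread): vLLM appears to read `vocab_size` from the top level of the config, while the HF `InternVLConfig` nests it under `text_config`; copying it up may unblock loading:

```python
# Unverified sketch: duplicate the nested vocab_size at the top level of
# config.json so code that reads `config.vocab_size` directly can find it.
import json

path = "InternVL3-8B-hf/config.json"  # placeholder path to the SFT-ed checkpoint
with open(path) as f:
    cfg = json.load(f)
cfg.setdefault("vocab_size", cfg["text_config"]["vocab_size"])
with open(path, "w") as f:
    json.dump(cfg, f, indent=2)
```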
> [@Kuangdd01](https://github.com/Kuangdd01)
>
> Hi, first of all, thank you so much for providing the fine-tuning code for InternVL3. I really appreciate your work and contribution to the open-source community...
For now this script does not support converting the LoRA adapter directly; merge the LoRA adapter into the HF model first, then convert the whole checkpoint. Indeed, we need...
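For the merge step, here is a minimal sketch using PEFT's `merge_and_unload` (the model ID and directory names are placeholders; LLaMA-Factory's `llamafactory-cli export` with a merge config accomplishes the same thing):

```python
# Minimal sketch, assuming a PEFT LoRA adapter directory; paths are placeholders.
import torch
from transformers import AutoModelForImageTextToText
from peft import PeftModel

base = AutoModelForImageTextToText.from_pretrained(
    "OpenGVLab/InternVL3-8B-hf", torch_dtype=torch.bfloat16
)
model = PeftModel.from_pretrained(base, "saves/internvl3-8b/lora/sft")  # adapter dir
merged = model.merge_and_unload()  # fold the LoRA deltas into the base weights
merged.save_pretrained("saves/internvl3-8b-hf-merged")  # then convert this checkpoint
```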
```
mv /data/onout/martin/MODELS/internvl3_8b/lora/merged_1596_chat/model.safetensors /data/onout/martin/MODELS/internvl3_8b/lora/sft/checkpoint-1596/
```

You should move these safetensors to a local dir that contains the configs from https://huggingface.co/OpenGVLab/InternVL3-8B. Then the vLLM Python file should be

```python
import...
```
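The original snippet is truncated above; as a reference, a minimal sketch of loading the converted checkpoint offline with vLLM (reusing the directory from the `mv` command; this is not the author's original file):

```python
from vllm import LLM

# Minimal sketch: the directory holds the InternVL3-8B chat configs plus the
# merged safetensors moved in by the `mv` command above.
llm = LLM(
    model="/data/onout/martin/MODELS/internvl3_8b/lora/sft/checkpoint-1596",
    trust_remote_code=True,            # InternVL-chat ships custom modeling code
    limit_mm_per_prompt={"image": 1},  # accepted once the model loads as multimodal
)
```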
> [@Kuangdd01](https://github.com/Kuangdd01)
>
> Could you explain in more detail how to replace those 5 json files? I added extra tokens during training. When I replace the json files in the original chat checkpoint with the ones from my fully fine-tuned checkpoint and run inference with official vLLM, the results get worse, while inference through LLaMA-Factory's HuggingFace-framework API works fine.

Just swapping the following four should be enough.
> [@Kuangdd01](https://github.com/Kuangdd01) I tried running inference on the saved checkpoint with the template provided on Hugging Face:
>
> ```python
> from transformers import AutoProcessor, AutoModelForImageTextToText
> import torch
>
> torch_device = "cuda"
> model_checkpoint = "OpenGVLab/InternVL3-1B-hf"
> processor = AutoProcessor.from_pretrained(model_checkpoint)
> model = AutoModelForImageTextToText.from_pretrained(
>     model_checkpoint, device_map=torch_device, torch_dtype=torch.bfloat16
> )
> ```
>
> messages...
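The quote truncates at `messages...`; for reference, a typical continuation of this template (my sketch of the standard transformers chat-template flow, not the commenter's exact code; the image URL is a placeholder):

```python
# Sketch: standard image-text-to-text chat-template usage in transformers.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "http://images.cocodataset.org/val2017/000000039769.jpg"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device, dtype=torch.bfloat16)

generated = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(generated[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```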
> > Hello, I want to run inference with InternVL3-8B, but it fails with an error that the multimodal model cannot be recognized. I then followed your answer in [#8086 (comment)](https://github.com/hiyouga/LLaMA-Factory/issues/8086#issuecomment-2898640569) to make the changes and ran into the problem below. Thanks for your attention and reply!
> > I downloaded the code and weights from https://huggingface.co/OpenGVLab/InternVL3-8B/tree/main into the InternVL3-8B folder, then ran `python scripts/convert_ckpt/intern3-vl-8b.py --input_dir InternVL3-8B --output_dir saves/internvl3-8b-chat`. Why does it report this error?
> > ```
> > Traceback (most recent call last):
> >   File "code/LLaMA-Factory/scripts/convert_ckpt/intern3-vl-8b.py", line 468, in main()...
> > ```