Xiangyu Zhao


Hi, in our practice, for GPT4o we generally use MMMU's [default prompt setting](https://github.com/open-compass/VLMEvalKit/blob/b1d59b7439030f2a3c60aca07c7d57782ac1f5b7/vlmeval/dataset/image_mcq.py#L321). In that setup, even without deliberately adding a CoT prompt, GPT4o still produces answers in a CoT-style format.

Hi, we now support [LMDeploy](https://github.com/InternLM/lmdeploy), a powerful tool designed to accelerate the inference of LLMs and MLLMs, similar to **VLLM**. If you’re planning to use VLLM instead, the execution...
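
In case it helps, here is a minimal sketch of running an MLLM through LMDeploy's offline pipeline; the model ID and image path below are placeholders, so adapt them to your setup:

```python
# Minimal LMDeploy VLM inference sketch; model ID and image path are illustrative placeholders.
from lmdeploy import pipeline
from lmdeploy.vl import load_image

pipe = pipeline('OpenGVLab/InternVL2-8B')      # any LMDeploy-supported MLLM
image = load_image('/path/to/example.jpg')     # accepts a local path or a URL
response = pipe(('Describe this image.', image))
print(response.text)
```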

Please use `pip install git+https://github.com/huggingface/transformers accelerate` to build transformers from source. We use the latest transformers and `torch==2.5.1`.
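
After installing, a quick sanity check (purely illustrative) is to print the installed versions:

```python
# Verify the source build of transformers and the pinned torch version.
import torch
import transformers

print(transformers.__version__)  # a source build typically carries a ".dev0" suffix
print(torch.__version__)         # expected: 2.5.1
```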

Hi, Thank you for your interest in VLMEvalKit! Could you please share the execution code you used? This will help us better understand the context and provide more accurate guidance....

Please use this repo: https://huggingface.co/llava-hf/llava-v1.6-vicuna-7b-hf
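
If it's useful, here is a minimal sketch of loading that HF-format checkpoint directly with transformers; the prompt template, image path, and generation settings are illustrative:

```python
# Sketch: load the HF-format LLaVA-v1.6 (Vicuna-7B) checkpoint with transformers.
import torch
from PIL import Image
from transformers import LlavaNextForConditionalGeneration, LlavaNextProcessor

model_id = "llava-hf/llava-v1.6-vicuna-7b-hf"
processor = LlavaNextProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("/path/to/example.jpg")
prompt = "USER: <image>\nDescribe this image. ASSISTANT:"
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```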

Thank you for your interest in our project. If you wish to contribute benchmarks for bias and safety, we welcome such additions. Please feel free to submit a pull request...

Hi, the DDP (Distributed Data Parallel) strategy is normally used to parallelize model training. For the InternVL2.5-78B model, if you want parallel inference, you can try deploying with `torchrun --nproc_per_node=2`. That command launches two model instances at the same time; following the splitting strategy you mentioned earlier, each instance would be sharded across three GPUs, so this setup occupies six GPUs in total. Of course, the exact deployment strategy can be adjusted flexibly according to the hardware limits of your machine.

For the Qwen2.5-72B model, the [split_model](https://github.com/open-compass/VLMEvalKit/blob/eb58e9e05d4f64316608128988c60a3cb200307d/vlmeval/vlm/qwen2_vl/model.py#L34) function defines how the model is partitioned across GPUs. For a 72B model, splitting over only two GPUs easily leads to OOM; we generally recommend at least four GPUs for inference.
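
For reference, here is a rough sketch of what such a split_model-style device map does; the module names, layer counts, and ratios are illustrative, not the repo's exact code:

```python
# Illustrative split_model-style device map: LLM layers are spread across GPUs,
# while the vision tower, embeddings, and head stay on GPU 0, so GPU 0 gets a
# smaller share of layers to leave room for those modules.
import math

def split_model(num_layers: int = 80, num_gpus: int = 4) -> dict:
    device_map = {}
    # reserve roughly half a GPU's worth of capacity on GPU 0 for non-LLM modules
    per_gpu = math.ceil(num_layers / (num_gpus - 0.5))
    layers_per_gpu = [per_gpu] * num_gpus
    layers_per_gpu[0] = math.ceil(per_gpu * 0.5)

    layer_idx = 0
    for gpu, n in enumerate(layers_per_gpu):
        for _ in range(n):
            if layer_idx >= num_layers:
                break
            device_map[f"model.layers.{layer_idx}"] = gpu
            layer_idx += 1

    # non-LLM components live on GPU 0
    device_map["visual"] = 0
    device_map["model.embed_tokens"] = 0
    device_map["model.norm"] = 0
    device_map["lm_head"] = 0
    return device_map

# e.g. pass the result to from_pretrained(..., device_map=split_model(80, 4))
```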

Is the error you are seeing still OOM? If so, I suggest trying `--nproc_per_node=1`.

If you want to change the model path, set `model_path` in [config.py](https://github.com/open-compass/VLMEvalKit/blob/eda4a6296c008cce586bfa0f934017b5269bc35b/vlmeval/config.py#L406) to your local path, e.g. `/opt/Qwen/Qwen2-VL-7B-Instruct`.
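
As a hedged sketch, the edited entry might look roughly like this; the dict name and any extra kwargs are assumptions, so check your local config.py:

```python
# Illustrative: a model entry in vlmeval/config.py pointing model_path at a local checkpoint.
from functools import partial
from vlmeval.vlm import Qwen2VLChat

qwen2vl_series = {
    "Qwen2-VL-7B-Instruct": partial(
        Qwen2VLChat,
        model_path="/opt/Qwen/Qwen2-VL-7B-Instruct",  # local path instead of the HF hub ID
    ),
}
```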