[Feature] How can I get HF-style multi-GPU data parallelism when evaluating with vLLM?

Open noforit opened this issue 10 months ago • 16 comments

Describe the feature

My model type for evaluation is vllm, with the parameters below (screenshot). However, only one GPU is actually used for the evaluation task (screenshot). I would like the task to be split into several parts and evaluated across 8 GPUs in parallel. Could this feature be added, or is it already possible? Please advise; many thanks! By comparison, if I set the model type to HF, this behavior happens automatically (screenshots).

Are you willing to implement this feature yourself?

  • [ ] I would like to implement this feature myself and contribute the code to OpenCompass!

noforit avatar Mar 26 '24 09:03 noforit

As in the config above (screenshot), you can set model_kwargs=dict(tensor_parallel_size=8) for your case.

liushz avatar Mar 26 '24 14:03 liushz
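
For reference, a minimal sketch of the setting liushz describes; the abbr and path are placeholders, and the run_cfg line is an assumption about how the 8 GPUs would be reserved, not something stated in the comment:

from opencompass.models import VLLM

models = [
    dict(
        type=VLLM,
        abbr='my-model-vllm',    # placeholder
        path='/path/to/model',   # placeholder
        model_kwargs=dict(tensor_parallel_size=8),  # shard one model across 8 GPUs
        max_out_len=100,
        batch_size=32,
        run_cfg=dict(num_gpus=8, num_procs=1),  # assumption: reserve all 8 GPUs for this task
    )
]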

@liushz Thank you for your response; I appreciate your clarification. However, the parameter in your reply pertains to setting tensor parallelism in vLLM. My intention is to load the entire model onto each of the eight GPUs, thereby distributing tasks in parallel across these GPUs. This approach should theoretically yield an eightfold acceleration in evaluation speed.

noforit avatar Mar 26 '24 14:03 noforit

hi, @liushz , I also want to know how to achieve data parallelism in vLLM when evaluating

andakai avatar Mar 27 '24 14:03 andakai

Please try NumWorkerPartitioner https://github.com/open-compass/opencompass/blob/main/opencompass/partitioners/num_worker.py#L17

tonysy avatar Mar 27 '24 16:03 tonysy

@tonysy Could you possibly offer a quick example? I'm quite unsure how to use it. Many thanks for your assistance.

noforit avatar Mar 28 '24 14:03 noforit

I think this is covered in the vLLM docs: https://docs.vllm.ai/en/latest/serving/distributed_serving.html. Setting tensor_parallel_size equal to the number of GPUs works for me.

IcyFeather233 avatar Apr 01 '24 10:04 IcyFeather233
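
For reference, a minimal standalone vLLM sketch of what that page describes; the model path is a placeholder:

from vllm import LLM, SamplingParams

# tensor_parallel_size equal to the GPU count shards the model across all GPUs.
llm = LLM(model='/path/to/model', tensor_parallel_size=8)
outputs = llm.generate(['Hello, world!'], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)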

@IcyFeather233 Thanks 😂. I understand that tensor_parallel_size can be set to the GPU count (2, 4, 8) to shard the model across cards. What I meant is keeping tensor_parallel_size=1, loading a full copy of the model on every GPU, and running data parallelism, so different slices of the same task are evaluated simultaneously. I recently got this working with NumWorkerPartitioner. The key configuration is below (screenshot) for anyone who needs it. @darrenglow. Thanks also to @tonysy. It would be great if this made it into the docs soon.

noforit avatar Apr 01 '24 10:04 noforit
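
Since the screenshot with that key configuration is not available, here is a minimal sketch of a NumWorkerPartitioner data-parallel setup pieced together from this thread; abbr, path, and batch_size are placeholders, and num_worker=8 assumes 8 GPUs:

from opencompass.models import VLLM
from opencompass.partitioners import NumWorkerPartitioner
from opencompass.runners import LocalRunner
from opencompass.tasks import OpenICLInferTask

infer = dict(
    # Split every dataset into num_worker chunks so they can run in parallel.
    partitioner=dict(type=NumWorkerPartitioner, num_worker=8),
    runner=dict(
        type=LocalRunner,
        max_num_workers=8,  # allow up to 8 inference tasks at once
        task=dict(type=OpenICLInferTask),
    ),
)

models = [
    dict(
        type=VLLM,
        abbr='my-model-vllm',   # placeholder
        path='/path/to/model',  # placeholder
        model_kwargs=dict(tensor_parallel_size=1),  # full model copy per GPU
        max_out_len=100,
        max_seq_len=2048,
        batch_size=32,          # illustrative
        run_cfg=dict(num_gpus=1, num_procs=1),  # each chunk gets exactly one GPU
    )
]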

@noforit Here is my config, but still only one GPU is running. Could you help me figure out why?

from opencompass.models import VLLM
from opencompass.partitioners import NumWorkerPartitioner
from opencompass.runners import LocalRunner
from opencompass.tasks import OpenICLInferTask

infer = dict(
    partitioner=dict(type=NumWorkerPartitioner, num_worker=2),
    runner=dict(
        type=LocalRunner,
        max_num_workers=16,
        task=dict(type=OpenICLInferTask))
)
models = [
    dict(
        type=VLLM,
        abbr='qwen-7b-chat-vllm',
        path="/home/zbl/data/llm/qwen/Qwen-7B-Chat",
        model_kwargs=dict(tensor_parallel_size=1),
        meta_template=_meta_template,  # defined elsewhere in the user's config
        max_out_len=100,
        max_seq_len=2048,
        batch_size=100,
        generation_kwargs=dict(temperature=0),
        end_str='<|im_end|>',
    )
]

Zbaoli avatar Apr 09 '24 03:04 Zbaoli

@IcyFeather233 I get your point: the tensor_parallel_size parameter enables multi-GPU inference, but in my tests multi-GPU inference was no faster than a single GPU. What I want is parallel inference across tasks: given n tasks and m model instances, each instance runs inference for one task.

Zbaoli avatar Apr 09 '24 03:04 Zbaoli

@Zbaoli Comparing configs, yours is missing one parameter of mine (screenshot): run_cfg. Try adding run_cfg=dict(num_gpus=1, num_procs=1)?

noforit avatar Apr 09 '24 05:04 noforit
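
In config form, the suggestion amounts to adding one line to the models entry quoted above; this is a sketch of the proposed fix, not a verified one:

models = [
    dict(
        type=VLLM,
        abbr='qwen-7b-chat-vllm',
        path="/home/zbl/data/llm/qwen/Qwen-7B-Chat",
        model_kwargs=dict(tensor_parallel_size=1),
        meta_template=_meta_template,
        max_out_len=100,
        max_seq_len=2048,
        batch_size=100,
        generation_kwargs=dict(temperature=0),
        end_str='<|im_end|>',
        run_cfg=dict(num_gpus=1, num_procs=1),  # the parameter noforit points to
    )
]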

@noforit Thanks for the reply, but after adding run_cfg=dict(num_gpus=1, num_procs=1) to the models config, still only one GPU is running.

Zbaoli avatar Apr 09 '24 05:04 Zbaoli

@Zbaoli Strange 😂. What about setting CUDA_VISIBLE_DEVICES before launching the run (screenshot)? Or debug inside /opencompass/opencompass/runners/local.py — the number of available GPUs is auto-detected there. Shall we add each other on WeChat? I'll email you.

noforit avatar Apr 09 '24 05:04 noforit

@noforit After using NumWorkerPartitioner as you describe, the dataset gets split into 8 parts, but the final summary fails to aggregate the metrics of the split parts back together. Do you run into this as well?

guoaoo avatar Apr 10 '24 13:04 guoaoo

May I ask: can't the SizePartitioner that opencompass provides already split datasets? Or is NumWorkerPartitioner's way of partitioning more efficient?

caotianjia avatar Apr 17 '24 11:04 caotianjia

SizePartitioner and NumWorkerPartitioner are two different splitting strategies: the former splits by a given task size, the latter by the number of workers (GPUs).

bittersweet1999 avatar Apr 28 '24 17:04 bittersweet1999
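
In config terms the contrast looks like this; max_task_size and num_worker values are illustrative, not from the thread:

from opencompass.partitioners import NumWorkerPartitioner, SizePartitioner

# SizePartitioner: cap each task at a given size; the chunk count varies with the data.
size_based = dict(type=SizePartitioner, max_task_size=2000)

# NumWorkerPartitioner: always produce num_worker chunks, one per worker/GPU.
worker_based = dict(type=NumWorkerPartitioner, num_worker=8)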

(screenshot) When using vllm, it keeps reporting a timeout and I don't know why. The upper part of the screenshot is my model settings and the lower part is the error. What could be going on?

disperaller avatar Jun 07 '24 10:06 disperaller