
qwen1.5-4B-chat multi-GPU inference error

Open zhangfan-algo opened this issue 3 months ago • 13 comments


Running on 8x A800 GPUs.

The script I ran:

RAY_memory_monitor_refresh_ms=0 CUDA_VISIBLE_DEVICES=0 python examples/pytorch/llm/llm_infer.py \
    --infer_backend vllm \
    --ckpt_dir /mnt/pfs/zhangfan/study_info/LLaMA-Factory_0308/output/merge_sft_prompt_0319_qwen1half_4B_sft_0319/checkpoint-5890 \
    --custom_val_dataset_path data/merge_sft_prompt_0319_test.jsonl \
    --max_length -1 \
    --use_flash_attn true \
    --max_new_tokens 2300 \
    --temperature 0.01 \
    --top_p 0.99 \
    --repetition_penalty 1. \
    --verbose false \
    --do_sample true \
    --val_dataset_sample -1 \
    --tensor_parallel_size 8

zhangfan-algo avatar Mar 20 '24 02:03 zhangfan-algo

Traceback (most recent call last):
  File "/mnt/pfs/zhangfan/study_info/swift_0311/examples/pytorch/llm/llm_infer.py", line 7, in <module>
    result = infer_main()
  File "/mnt/pfs/zhangfan/study_info/swift_0311/swift/utils/run_utils.py", line 31, in x_main
    result = llm_x(args, **kwargs)
  File "/mnt/pfs/zhangfan/study_info/swift_0311/swift/llm/infer.py", line 229, in llm_infer
    llm_engine, template = prepare_vllm_engine_template(args)
  File "/mnt/pfs/zhangfan/study_info/swift_0311/swift/llm/utils/vllm_utils.py", line 341, in prepare_vllm_engine_template
    llm_engine = get_vllm_engine(
  File "/mnt/pfs/zhangfan/study_info/swift_0311/swift/llm/utils/vllm_utils.py", line 79, in get_vllm_engine
    llm_engine = llm_engine_cls.from_engine_args(engine_args)
  File "/apps1/zhangfan/anaconda3/envs/swift/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 391, in from_engine_args
    engine = cls(*engine_configs,
  File "/apps1/zhangfan/anaconda3/envs/swift/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 115, in __init__
    self._verify_args()
  File "/apps1/zhangfan/anaconda3/envs/swift/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 314, in _verify_args
    self.model_config.verify_with_parallel_config(self.parallel_config)
  File "/apps1/zhangfan/anaconda3/envs/swift/lib/python3.10/site-packages/vllm/config.py", line 211, in verify_with_parallel_config
    raise ValueError(
ValueError: Total number of attention heads (20) must be divisible by tensor parallel size (8).

zhangfan-algo avatar Mar 20 '24 02:03 zhangfan-algo
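The ValueError is vLLM's sanity check: tensor parallelism shards each attention layer head-by-head, so the model's attention-head count must be evenly divisible by `tensor_parallel_size`. A quick check of which sizes work for the 20-head count reported in the error:

```python
# vLLM shards attention heads evenly across tensor-parallel ranks, so
# num_attention_heads % tensor_parallel_size must be 0.
num_attention_heads = 20  # value reported in the error message above

valid_tp_sizes = [tp for tp in (1, 2, 4, 8) if num_attention_heads % tp == 0]
print(valid_tp_sizes)  # [1, 2, 4]
```

So on an 8-GPU node, 4 is the largest power-of-two tensor-parallel size this model supports.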

这个模型只能用4卡

Jintao-Huang avatar Mar 20 '24 05:03 Jintao-Huang

--tensor_parallel_size 4

Jintao-Huang avatar Mar 20 '24 05:03 Jintao-Huang

With 4 GPUs, some requests still fail:

Exception in thread Thread-8 (_handle_results):
Traceback (most recent call last):
  File "/mnt/pfs/zhangfan/system/anaconda/envs/llama-factory/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/mnt/pfs/zhangfan/system/anaconda/envs/llama-factory/lib/python3.10/site-packages/ipykernel/ipkernel.py", line 761, in run_closure
    _threading_Thread_run(self)
  File "/mnt/pfs/zhangfan/system/anaconda/envs/llama-factory/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/mnt/pfs/zhangfan/system/anaconda/envs/llama-factory/lib/python3.10/multiprocessing/pool.py", line 579, in _handle_results
    task = get()
  File "/mnt/pfs/zhangfan/system/anaconda/envs/llama-factory/lib/python3.10/multiprocessing/connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
TypeError: APIStatusError.__init__() missing 2 required keyword-only arguments: 'response' and 'body'

zhangfan-algo avatar Mar 20 '24 07:03 zhangfan-algo
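That TypeError is likely an artifact of how `multiprocessing` ships exceptions between processes rather than the root cause: a worker's exception is pickled, and unpickling reconstructs it by calling the exception class with only its positional args, which fails for classes (such as the openai client's `APIStatusError`) whose `__init__` has required keyword-only arguments. A minimal reproduction of the mechanism with a stand-in class:

```python
import pickle

# Stand-in for an exception class whose __init__ requires
# keyword-only arguments (like openai's APIStatusError).
class APIStatusErrorLike(Exception):
    def __init__(self, message, *, response, body):
        super().__init__(message)
        self.response = response
        self.body = body

err = APIStatusErrorLike("boom", response=None, body=None)
data = pickle.dumps(err)  # pickling succeeds: it records (class, self.args)

try:
    pickle.loads(data)  # calls APIStatusErrorLike("boom") -> TypeError
    survived_roundtrip = True
except TypeError:
    survived_roundtrip = False
print(survived_roundtrip)  # False
```

The real server-side error is therefore hidden; the client only sees the failed reconstruction.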

llama-factory

Jintao-Huang avatar Mar 20 '24 08:03 Jintao-Huang

CUDA_VISIBLE_DEVICES=0,1,2,3

Jintao-Huang avatar Mar 20 '24 08:03 Jintao-Huang
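The visible-GPU count also has to cover `tensor_parallel_size`. A small hypothetical pre-flight check (the helper logic is mine, not swift's):

```python
import os

# Hypothetical pre-flight check before launching vLLM with tensor parallelism:
# there must be at least tensor_parallel_size visible devices.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"  # setting suggested above

tensor_parallel_size = 4
visible_gpus = os.environ["CUDA_VISIBLE_DEVICES"].split(",")
if len(visible_gpus) < tensor_parallel_size:
    raise RuntimeError(
        f"tensor_parallel_size={tensor_parallel_size} but only "
        f"{len(visible_gpus)} GPUs visible: {visible_gpus}"
    )
print(len(visible_gpus))  # 4
```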

CUDA_VISIBLE_DEVICES=0,1,2,3

Do you mean that RAY_memory_monitor_refresh_ms=0 must not be set?

zhangfan-algo avatar Mar 20 '24 09:03 zhangfan-algo

Traceback (most recent call last):
  File "/mnt/pfs/zhangfan/system/anaconda/envs/swift/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 419, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/mnt/pfs/zhangfan/system/anaconda/envs/swift/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
  File "/mnt/pfs/zhangfan/system/anaconda/envs/swift/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/mnt/pfs/zhangfan/system/anaconda/envs/swift/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/mnt/pfs/zhangfan/system/anaconda/envs/swift/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/mnt/pfs/zhangfan/system/anaconda/envs/swift/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/mnt/pfs/zhangfan/system/anaconda/envs/swift/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/mnt/pfs/zhangfan/system/anaconda/envs/swift/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/mnt/pfs/zhangfan/system/anaconda/envs/swift/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/mnt/pfs/zhangfan/system/anaconda/envs/swift/lib/python3.10/site-packages/starlette/routing.py", line 758, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/mnt/pfs/zhangfan/system/anaconda/envs/swift/lib/python3.10/site-packages/starlette/routing.py", line 778, in app
    await route.handle(scope, receive, send)
  File "/mnt/pfs/zhangfan/system/anaconda/envs/swift/lib/python3.10/site-packages/starlette/routing.py", line 299, in handle
    await self.app(scope, receive, send)
  File "/mnt/pfs/zhangfan/system/anaconda/envs/swift/lib/python3.10/site-packages/starlette/routing.py", line 79, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/mnt/pfs/zhangfan/system/anaconda/envs/swift/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/mnt/pfs/zhangfan/system/anaconda/envs/swift/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/mnt/pfs/zhangfan/system/anaconda/envs/swift/lib/python3.10/site-packages/starlette/routing.py", line 74, in app
    response = await func(request)
  File "/mnt/pfs/zhangfan/system/anaconda/envs/swift/lib/python3.10/site-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
  File "/mnt/pfs/zhangfan/system/anaconda/envs/swift/lib/python3.10/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/mnt/pfs/zhangfan/study_info/swift_0311/swift/llm/deploy.py", line 453, in create_chat_completion
    return await inference_vllm_async(request, raw_request)
  File "/mnt/pfs/zhangfan/study_info/swift_0311/swift/llm/deploy.py", line 107, in inference_vllm_async
    input_ids = template.encode(example)[0]['input_ids']
KeyError: 'input_ids'

Still getting an error.

zhangfan-algo avatar Mar 20 '24 09:03 zhangfan-algo

Pull the latest code, or specify --max_length -1.

Jintao-Huang avatar Mar 20 '24 11:03 Jintao-Huang
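For context, a hypothetical sketch of the failure mode behind the KeyError (this is not swift's actual `encode` implementation): if the template drops an example whose prompt exceeds `max_length`, the returned dict has no `'input_ids'` key, and `--max_length -1` disables that truncation path:

```python
# Hypothetical sketch: an encode() that silently drops over-long examples,
# so indexing ['input_ids'] downstream raises KeyError.
def encode(example, max_length):
    token_ids = list(range(len(example["query"])))  # stand-in for tokenization
    if max_length != -1 and len(token_ids) > max_length:
        return {}, {}  # over-long prompt: no 'input_ids' key at all
    return {"input_ids": token_ids}, {}

long_example = {"query": "x" * 100}
truncating, _ = encode(long_example, max_length=50)
unlimited, _ = encode(long_example, max_length=-1)
print("input_ids" in truncating, "input_ids" in unlimited)  # False True
```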

Specifying -1 makes generation take extremely long.


zhangfan-algo avatar Mar 20 '24 12:03 zhangfan-algo

Could you advise why inference takes so long when the fine-tuned model is deployed? Some requests take thousands of seconds.

zhangfan-algo avatar Mar 22 '24 09:03 zhangfan-algo

Limit max_new_tokens. Though I can't quite understand request times like that...

Jintao-Huang avatar Mar 23 '24 16:03 Jintao-Huang
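A toy decode loop showing why bounding `max_new_tokens` caps per-request latency: each decode step costs roughly constant time, so a model that rarely emits EOS otherwise keeps generating until the context limit.

```python
# Toy greedy-decode loop: generation stops at EOS or after max_new_tokens
# steps, whichever comes first, so max_new_tokens bounds per-request latency.
def generate(next_token, max_new_tokens, eos="<eos>"):
    tokens = []
    for _ in range(max_new_tokens):
        tok = next_token()
        tokens.append(tok)
        if tok == eos:
            break
    return tokens

# A degenerate "model" that never emits EOS still stops after the cap.
out = generate(lambda: "token", max_new_tokens=2600)
print(len(out))  # 2600
```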

I limited max_new_tokens to 2600 and set max_length to -1, but generation still takes very long, and in some cases the generated output exceeds 2600 tokens.


zhangfan-algo avatar Mar 24 '24 03:03 zhangfan-algo