
qwen1.5-4B-chat multi-GPU inference error

Open zhangfan-algo opened this issue 3 months ago • 13 comments


Running on 8x A800 GPUs.

The script I ran:

RAY_memory_monitor_refresh_ms=0 CUDA_VISIBLE_DEVICES=0 python examples/pytorch/llm/llm_infer.py \
    --infer_backend vllm \
    --ckpt_dir /mnt/pfs/zhangfan/study_info/LLaMA-Factory_0308/output/merge_sft_prompt_0319_qwen1half_4B_sft_0319/checkpoint-5890 \
    --custom_val_dataset_path data/merge_sft_prompt_0319_test.jsonl \
    --max_length -1 \
    --use_flash_attn true \
    --max_new_tokens 2300 \
    --temperature 0.01 \
    --top_p 0.99 \
    --repetition_penalty 1. \
    --verbose false \
    --do_sample true \
    --val_dataset_sample -1 \
    --tensor_parallel_size 8

zhangfan-algo avatar Mar 20 '24 02:03 zhangfan-algo

Traceback (most recent call last):
  File "/mnt/pfs/zhangfan/study_info/swift_0311/examples/pytorch/llm/llm_infer.py", line 7, in <module>
    result = infer_main()
  File "/mnt/pfs/zhangfan/study_info/swift_0311/swift/utils/run_utils.py", line 31, in x_main
    result = llm_x(args, **kwargs)
  File "/mnt/pfs/zhangfan/study_info/swift_0311/swift/llm/infer.py", line 229, in llm_infer
    llm_engine, template = prepare_vllm_engine_template(args)
  File "/mnt/pfs/zhangfan/study_info/swift_0311/swift/llm/utils/vllm_utils.py", line 341, in prepare_vllm_engine_template
    llm_engine = get_vllm_engine(
  File "/mnt/pfs/zhangfan/study_info/swift_0311/swift/llm/utils/vllm_utils.py", line 79, in get_vllm_engine
    llm_engine = llm_engine_cls.from_engine_args(engine_args)
  File "/apps1/zhangfan/anaconda3/envs/swift/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 391, in from_engine_args
    engine = cls(*engine_configs,
  File "/apps1/zhangfan/anaconda3/envs/swift/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 115, in __init__
    self._verify_args()
  File "/apps1/zhangfan/anaconda3/envs/swift/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 314, in _verify_args
    self.model_config.verify_with_parallel_config(self.parallel_config)
  File "/apps1/zhangfan/anaconda3/envs/swift/lib/python3.10/site-packages/vllm/config.py", line 211, in verify_with_parallel_config
    raise ValueError(
ValueError: Total number of attention heads (20) must be divisible by tensor parallel size (8).

zhangfan-algo avatar Mar 20 '24 02:03 zhangfan-algo
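The ValueError is vLLM's sanity check: tensor parallelism shards each attention layer head-by-head, so the model's attention-head count must be evenly divisible by `tensor_parallel_size`. A quick check of which sizes work for the 20-head count reported in the error:

```python
# vLLM shards attention heads evenly across tensor-parallel ranks, so
# num_attention_heads % tensor_parallel_size must be 0.
num_attention_heads = 20  # value reported in the error message above

valid_tp_sizes = [tp for tp in (1, 2, 4, 8) if num_attention_heads % tp == 0]
print(valid_tp_sizes)  # [1, 2, 4]
```

So on an 8-GPU node, 4 is the largest power-of-two tensor-parallel size this model supports.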

这个模型只能用4卡

Jintao-Huang avatar Mar 20 '24 05:03 Jintao-Huang

--tensor_parallel_size 4

Jintao-Huang avatar Mar 20 '24 05:03 Jintao-Huang

With 4 GPUs, some requests still fail:

Exception in thread Thread-8 (_handle_results):
Traceback (most recent call last):
  File "/mnt/pfs/zhangfan/system/anaconda/envs/llama-factory/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/mnt/pfs/zhangfan/system/anaconda/envs/llama-factory/lib/python3.10/site-packages/ipykernel/ipkernel.py", line 761, in run_closure
    _threading_Thread_run(self)
  File "/mnt/pfs/zhangfan/system/anaconda/envs/llama-factory/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/mnt/pfs/zhangfan/system/anaconda/envs/llama-factory/lib/python3.10/multiprocessing/pool.py", line 579, in _handle_results
    task = get()
  File "/mnt/pfs/zhangfan/system/anaconda/envs/llama-factory/lib/python3.10/multiprocessing/connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
TypeError: APIStatusError.__init__() missing 2 required keyword-only arguments: 'response' and 'body'

zhangfan-algo avatar Mar 20 '24 07:03 zhangfan-algo
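That TypeError is likely an artifact of how `multiprocessing` ships exceptions between processes rather than the root cause: a worker's exception is pickled, and unpickling reconstructs it by calling the exception class with only its positional args, which fails for classes (such as the openai client's `APIStatusError`) whose `__init__` has required keyword-only arguments. A minimal reproduction of the mechanism with a stand-in class:

```python
import pickle

# Stand-in for an exception class whose __init__ requires
# keyword-only arguments (like openai's APIStatusError).
class APIStatusErrorLike(Exception):
    def __init__(self, message, *, response, body):
        super().__init__(message)
        self.response = response
        self.body = body

err = APIStatusErrorLike("boom", response=None, body=None)
data = pickle.dumps(err)  # pickling succeeds: it records (class, self.args)

try:
    pickle.loads(data)  # calls APIStatusErrorLike("boom") -> TypeError
    survived_roundtrip = True
except TypeError:
    survived_roundtrip = False
print(survived_roundtrip)  # False
```

The real server-side error is therefore hidden; the client only sees the failed reconstruction.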

llama-factory

Jintao-Huang avatar Mar 20 '24 08:03 Jintao-Huang

CUDA_VISIBLE_DEVICES=0,1,2,3

Jintao-Huang avatar Mar 20 '24 08:03 Jintao-Huang
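The visible-GPU count also has to cover `tensor_parallel_size`. A small hypothetical pre-flight check (the helper logic is mine, not swift's):

```python
import os

# Hypothetical pre-flight check before launching vLLM with tensor parallelism:
# there must be at least tensor_parallel_size visible devices.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"  # setting suggested above

tensor_parallel_size = 4
visible_gpus = os.environ["CUDA_VISIBLE_DEVICES"].split(",")
if len(visible_gpus) < tensor_parallel_size:
    raise RuntimeError(
        f"tensor_parallel_size={tensor_parallel_size} but only "
        f"{len(visible_gpus)} GPUs visible: {visible_gpus}"
    )
print(len(visible_gpus))  # 4
```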

CUDA_VISIBLE_DEVICES=0,1,2,3

Do you mean that RAY_memory_monitor_refresh_ms=0 must not be set?

zhangfan-algo avatar Mar 20 '24 09:03 zhangfan-algo

Traceback (most recent call last):
  File "/mnt/pfs/zhangfan/system/anaconda/envs/swift/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 419, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/mnt/pfs/zhangfan/system/anaconda/envs/swift/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
  File "/mnt/pfs/zhangfan/system/anaconda/envs/swift/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/mnt/pfs/zhangfan/system/anaconda/envs/swift/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/mnt/pfs/zhangfan/system/anaconda/envs/swift/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/mnt/pfs/zhangfan/system/anaconda/envs/swift/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/mnt/pfs/zhangfan/system/anaconda/envs/swift/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/mnt/pfs/zhangfan/system/anaconda/envs/swift/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/mnt/pfs/zhangfan/system/anaconda/envs/swift/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/mnt/pfs/zhangfan/system/anaconda/envs/swift/lib/python3.10/site-packages/starlette/routing.py", line 758, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/mnt/pfs/zhangfan/system/anaconda/envs/swift/lib/python3.10/site-packages/starlette/routing.py", line 778, in app
    await route.handle(scope, receive, send)
  File "/mnt/pfs/zhangfan/system/anaconda/envs/swift/lib/python3.10/site-packages/starlette/routing.py", line 299, in handle
    await self.app(scope, receive, send)
  File "/mnt/pfs/zhangfan/system/anaconda/envs/swift/lib/python3.10/site-packages/starlette/routing.py", line 79, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/mnt/pfs/zhangfan/system/anaconda/envs/swift/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/mnt/pfs/zhangfan/system/anaconda/envs/swift/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/mnt/pfs/zhangfan/system/anaconda/envs/swift/lib/python3.10/site-packages/starlette/routing.py", line 74, in app
    response = await func(request)
  File "/mnt/pfs/zhangfan/system/anaconda/envs/swift/lib/python3.10/site-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
  File "/mnt/pfs/zhangfan/system/anaconda/envs/swift/lib/python3.10/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/mnt/pfs/zhangfan/study_info/swift_0311/swift/llm/deploy.py", line 453, in create_chat_completion
    return await inference_vllm_async(request, raw_request)
  File "/mnt/pfs/zhangfan/study_info/swift_0311/swift/llm/deploy.py", line 107, in inference_vllm_async
    input_ids = template.encode(example)[0]['input_ids']
KeyError: 'input_ids'

Still getting an error.

zhangfan-algo avatar Mar 20 '24 09:03 zhangfan-algo

Pull the latest code, or specify --max_length -1.

Jintao-Huang avatar Mar 20 '24 11:03 Jintao-Huang
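For context, a hypothetical sketch of the failure mode behind the KeyError (this is not swift's actual `encode` implementation): if the template drops an example whose prompt exceeds `max_length`, the returned dict has no `'input_ids'` key, and `--max_length -1` disables that truncation path:

```python
# Hypothetical sketch: an encode() that silently drops over-long examples,
# so indexing ['input_ids'] downstream raises KeyError.
def encode(example, max_length):
    token_ids = list(range(len(example["query"])))  # stand-in for tokenization
    if max_length != -1 and len(token_ids) > max_length:
        return {}, {}  # over-long prompt: no 'input_ids' key at all
    return {"input_ids": token_ids}, {}

long_example = {"query": "x" * 100}
truncating, _ = encode(long_example, max_length=50)
unlimited, _ = encode(long_example, max_length=-1)
print("input_ids" in truncating, "input_ids" in unlimited)  # False True
```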

Specifying -1 makes generation take extremely long.


zhangfan-algo avatar Mar 20 '24 12:03 zhangfan-algo

Could you advise why inference takes so long when the fine-tuned model is deployed? Some requests take thousands of seconds.

zhangfan-algo avatar Mar 22 '24 09:03 zhangfan-algo

Limit max_new_tokens. Though I can't quite understand request times like that...

Jintao-Huang avatar Mar 23 '24 16:03 Jintao-Huang
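A toy decode loop showing why bounding `max_new_tokens` caps per-request latency: each decode step costs roughly constant time, so a model that rarely emits EOS otherwise keeps generating until the context limit.

```python
# Toy greedy-decode loop: generation stops at EOS or after max_new_tokens
# steps, whichever comes first, so max_new_tokens bounds per-request latency.
def generate(next_token, max_new_tokens, eos="<eos>"):
    tokens = []
    for _ in range(max_new_tokens):
        tok = next_token()
        tokens.append(tok)
        if tok == eos:
            break
    return tokens

# A degenerate "model" that never emits EOS still stops after the cap.
out = generate(lambda: "token", max_new_tokens=2600)
print(len(out))  # 2600
```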

I limited max_new_tokens to 2600 and set max_length to -1, but generation still takes very long, and in some cases the generated output exceeds 2600 tokens.


zhangfan-algo avatar Mar 24 '24 03:03 zhangfan-algo