[Feature] A quick test on Hygon DCU; hoping for support
Motivation
I would like to bring up a server on a Hygon DCU:
```
lmdeploy serve api_server /data/models/qwen/Qwen2-7B-Instruct \
> --model-name Qwen2-7B-Instruct \
> --server-port 8000 \
> --tp 2
2024-08-27 16:11:39,097 - lmdeploy - WARNING - Fallback to pytorch engine because turbomind engine is not installed correctly. If you insist to use turbomind engine, you may need to reinstall lmdeploy from pypi or build from source and try again.
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0827 16:11:47.257916 14849 ProcessGroupNCCL.cpp:686] [Rank 0] ProcessGroupNCCL initialization options:NCCL_ASYNC_ERROR_HANDLING: 1, NCCL_DESYNC_DEBUG: 0, NCCL_ENABLE_TIMING: 0, NCCL_BLOCKING_WAIT: 0, TIMEOUT(ms): 3075840000000, USE_HIGH_PRIORITY_STREAM: 0, TORCH_DISTRIBUTED_DEBUG: OFF, NCCL_DEBUG: OFF, ID=93939288346384
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0827 16:11:47.380654 14979 ProcessGroupNCCL.cpp:686] [Rank 1] ProcessGroupNCCL initialization options:NCCL_ASYNC_ERROR_HANDLING: 1, NCCL_DESYNC_DEBUG: 0, NCCL_ENABLE_TIMING: 0, NCCL_BLOCKING_WAIT: 0, TIMEOUT(ms): 3075840000000, USE_HIGH_PRIORITY_STREAM: 0, TORCH_DISTRIBUTED_DEBUG: OFF, NCCL_DEBUG: OFF, ID=94797938383200
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 6.27it/s]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:01<00:00, 3.89it/s]
I0827 16:11:55.645321 14849 ProcessGroupNCCL.cpp:1340] NCCL_DEBUG: N/A
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
HINT: Please open http://0.0.0.0:8000 in a browser for detailed api usage!!!
HINT: Please open http://0.0.0.0:8000 in a browser for detailed api usage!!!
HINT: Please open http://0.0.0.0:8000 in a browser for detailed api usage!!!
INFO: Started server process [14849]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
```
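For reference, the same PyTorch-engine code path can also be exercised without the HTTP server through lmdeploy's pipeline API. The sketch below mirrors the `--tp 2` launch flags above; whether it works on DCU is exactly the open question of this issue:

```python
# Hedged sketch: drive lmdeploy's PyTorch engine directly, mirroring the
# server's --tp 2 setting, to reproduce the problem without the HTTP layer.
from lmdeploy import pipeline, PytorchEngineConfig

pipe = pipeline(
    "/data/models/qwen/Qwen2-7B-Instruct",
    backend_config=PytorchEngineConfig(tp=2),
)
print(pipe(["你好,你是谁?你会做什么?"]))
```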
The server starts fine, but when I then access it:
curl "http://localhost:8000/v1/chat/completions" \
> -H "Content-Type: application/json" \
> -H "Authorization: Bearer replace_api_key" \
> -d '{
> "messages": [
> {
> "role": "system",
> "content": "You are a helpful assistant."
> },
> {
> "role": "user",
> "content": "你好,你是谁?你会做什么?"
> }
> ]
> }'
the server core dumps:
```
Unsupported conversion from bf16 to f16
UNREACHABLE executed at ../../../lib/Conversion/TritonGPUToLLVM/ElementwiseOpToLLVM.cpp:1798!
Unsupported conversion from bf16 to f16
UNREACHABLE executed at ../../../lib/Conversion/TritonGPUToLLVM/ElementwiseOpToLLVM.cpp:1798!
Aborted (core dumped)
root@sealos:/lmdeploy#
root@sealos:/lmdeploy# /usr/local/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 3 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
```
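For triage: the abort comes out of Triton's TritonGPU-to-LLVM lowering, which on this DCU/DTK toolchain apparently has no rule for a bf16-to-f16 elementwise cast. A minimal standalone sketch (my own guess at a repro, not code from lmdeploy) that should hit the same lowering path by casting a bf16 tensor to fp16 inside a Triton kernel:

```python
# Hypothetical repro: a bf16 -> fp16 cast inside a Triton kernel, which
# should exercise the same ElementwiseOpToLLVM lowering that aborts above.
import torch
import triton
import triton.language as tl

@triton.jit
def cast_kernel(x_ptr, y_ptr, n_elements, BLOCK: tl.constexpr):
    offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n_elements
    x = tl.load(x_ptr + offs, mask=mask)                  # bf16 input
    tl.store(y_ptr + offs, x.to(tl.float16), mask=mask)   # bf16 -> f16 cast

# "cuda" is also the device string on ROCm/DTK builds of PyTorch.
x = torch.randn(1024, device="cuda", dtype=torch.bfloat16)
y = torch.empty(1024, device="cuda", dtype=torch.float16)
cast_kernel[(triton.cdiv(x.numel(), 256),)](x, y, x.numel(), BLOCK=256)
torch.cuda.synchronize()
print(y[:4])
```

If this small kernel aborts the same way, the problem lies in the DTK Triton port rather than in lmdeploy itself.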
Related resources
No response
Additional context
No response
This is not on our plan for the next six months. Would you be willing to add Hygon DCU support on top of LMDeploy's PyTorch engine? If you are willing to contribute, we would be honored.
Last time I used vLLM with DCU DTK 24.01. fp16 had to be specified explicitly; bf16 had some problems.
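If the failure really is just the bf16-to-fp16 cast, a workaround consistent with that vLLM experience is to force the checkpoint to load in fp16. One way (assuming lmdeploy's PyTorch engine honors the standard Hugging Face torch_dtype field, which I have not verified on DCU) is to edit the model's config.json:

```python
# Hedged workaround sketch: flip the Hugging Face config to fp16 so the
# engine never materializes bf16 weights. Path taken from the issue above.
import json, pathlib

cfg_path = pathlib.Path("/data/models/qwen/Qwen2-7B-Instruct/config.json")
cfg = json.loads(cfg_path.read_text())
print("current torch_dtype:", cfg.get("torch_dtype"))  # expected: "bfloat16"
cfg["torch_dtype"] = "float16"
cfg_path.write_text(json.dumps(cfg, indent=2, ensure_ascii=False))
```

After editing, restart api_server; with the weights already in fp16, the Triton kernels should never need the missing bf16 conversion.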