[Feature] A quick test on Hygon DCU; hoping for support
Motivation
I would like to bring up a server on a Hygon DCU:
```
lmdeploy serve api_server /data/models/qwen/Qwen2-7B-Instruct \
> --model-name Qwen2-7B-Instruct \
> --server-port 8000 \
> --tp 2
2024-08-27 16:11:39,097 - lmdeploy - WARNING - Fallback to pytorch engine because turbomind engine is not installed correctly. If you insist to use turbomind engine, you may need to reinstall lmdeploy from pypi or build from source and try again.
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0827 16:11:47.257916 14849 ProcessGroupNCCL.cpp:686] [Rank 0] ProcessGroupNCCL initialization options:NCCL_ASYNC_ERROR_HANDLING: 1, NCCL_DESYNC_DEBUG: 0, NCCL_ENABLE_TIMING: 0, NCCL_BLOCKING_WAIT: 0, TIMEOUT(ms): 3075840000000, USE_HIGH_PRIORITY_STREAM: 0, TORCH_DISTRIBUTED_DEBUG: OFF, NCCL_DEBUG: OFF, ID=93939288346384
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0827 16:11:47.380654 14979 ProcessGroupNCCL.cpp:686] [Rank 1] ProcessGroupNCCL initialization options:NCCL_ASYNC_ERROR_HANDLING: 1, NCCL_DESYNC_DEBUG: 0, NCCL_ENABLE_TIMING: 0, NCCL_BLOCKING_WAIT: 0, TIMEOUT(ms): 3075840000000, USE_HIGH_PRIORITY_STREAM: 0, TORCH_DISTRIBUTED_DEBUG: OFF, NCCL_DEBUG: OFF, ID=94797938383200
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 6.27it/s]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:01<00:00, 3.89it/s]
I0827 16:11:55.645321 14849 ProcessGroupNCCL.cpp:1340] NCCL_DEBUG: N/A
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
HINT: Please open http://0.0.0.0:8000 in a browser for detailed api usage!!!
HINT: Please open http://0.0.0.0:8000 in a browser for detailed api usage!!!
HINT: Please open http://0.0.0.0:8000 in a browser for detailed api usage!!!
INFO: Started server process [14849]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
```
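For reference, the same PyTorch-engine code path can also be exercised without the HTTP server through lmdeploy's pipeline API. The sketch below mirrors the `--tp 2` launch flags above; whether it works on DCU is exactly the open question of this issue:

```python
# Hedged sketch: drive lmdeploy's PyTorch engine directly, mirroring the
# server's --tp 2 setting, to reproduce the problem without the HTTP layer.
from lmdeploy import pipeline, PytorchEngineConfig

pipe = pipeline(
    "/data/models/qwen/Qwen2-7B-Instruct",
    backend_config=PytorchEngineConfig(tp=2),
)
print(pipe(["你好,你是谁?你会做什么?"]))
```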
The server starts fine, but when I then access it:
curl "http://localhost:8000/v1/chat/completions" \
> -H "Content-Type: application/json" \
> -H "Authorization: Bearer replace_api_key" \
> -d '{
> "messages": [
> {
> "role": "system",
> "content": "You are a helpful assistant."
> },
> {
> "role": "user",
> "content": "你好,你是谁?你会做什么?"
> }
> ]
> }'
the server core dumps:
```
Unsupported conversion from bf16 to f16
UNREACHABLE executed at ../../../lib/Conversion/TritonGPUToLLVM/ElementwiseOpToLLVM.cpp:1798!
Unsupported conversion from bf16 to f16
UNREACHABLE executed at ../../../lib/Conversion/TritonGPUToLLVM/ElementwiseOpToLLVM.cpp:1798!
Aborted (core dumped)
root@sealos:/lmdeploy#
root@sealos:/lmdeploy# /usr/local/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 3 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
```
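For triage: the abort comes out of Triton's TritonGPU-to-LLVM lowering, which on this DCU/DTK toolchain apparently has no rule for a bf16-to-f16 elementwise cast. A minimal standalone sketch (my own guess at a repro, not code from lmdeploy) that should hit the same lowering path by casting a bf16 tensor to fp16 inside a Triton kernel:

```python
# Hypothetical repro: a bf16 -> fp16 cast inside a Triton kernel, which
# should exercise the same ElementwiseOpToLLVM lowering that aborts above.
import torch
import triton
import triton.language as tl

@triton.jit
def cast_kernel(x_ptr, y_ptr, n_elements, BLOCK: tl.constexpr):
    offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n_elements
    x = tl.load(x_ptr + offs, mask=mask)                  # bf16 input
    tl.store(y_ptr + offs, x.to(tl.float16), mask=mask)   # bf16 -> f16 cast

# "cuda" is also the device string on ROCm/DTK builds of PyTorch.
x = torch.randn(1024, device="cuda", dtype=torch.bfloat16)
y = torch.empty(1024, device="cuda", dtype=torch.float16)
cast_kernel[(triton.cdiv(x.numel(), 256),)](x, y, x.numel(), BLOCK=256)
torch.cuda.synchronize()
print(y[:4])
```

If this small kernel aborts the same way, the problem lies in the DTK Triton port rather than in lmdeploy itself.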
Related resources
No response
Additional context
No response
This is not on our plan for the next six months. Would you be willing to add Hygon DCU support on top of LMDeploy's PyTorch engine? If you are willing to contribute, we would be honored.
Last time I used vLLM with DCU DTK 24.01. fp16 had to be specified explicitly; bf16 had some problems.
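If the failure really is just the bf16-to-fp16 cast, a workaround consistent with that vLLM experience is to force the checkpoint to load in fp16. One way (assuming lmdeploy's PyTorch engine honors the standard Hugging Face torch_dtype field, which I have not verified on DCU) is to edit the model's config.json:

```python
# Hedged workaround sketch: flip the Hugging Face config to fp16 so the
# engine never materializes bf16 weights. Path taken from the issue above.
import json, pathlib

cfg_path = pathlib.Path("/data/models/qwen/Qwen2-7B-Instruct/config.json")
cfg = json.loads(cfg_path.read_text())
print("current torch_dtype:", cfg.get("torch_dtype"))  # expected: "bfloat16"
cfg["torch_dtype"] = "float16"
cfg_path.write_text(json.dumps(cfg, indent=2, ensure_ascii=False))
```

After editing, restart api_server; with the weights already in fp16, the Triton kernels should never need the missing bf16 conversion.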