Inference with TensorRT-LLM

Open thanhtung901 opened this issue 1 year ago • 10 comments

Has anyone tried running the deepseek_coder model using tensorrt_llm?

thanhtung901 avatar Jan 24 '24 07:01 thanhtung901

We tried to run 1.3b-base on TensorRT-LLM with fp16 enabled, but got incorrect completion output.

chenxu2048 avatar Jan 25 '24 04:01 chenxu2048

Can you guide me?

thanhtung901 avatar Jan 25 '24 04:01 thanhtung901

  1. Install TensorRT-LLM or build it from source.
  2. Clone the TensorRT-LLM project and go to examples/llama.
  3. Follow the instructions in examples/llama/README.md.
  4. Replace the model name in the commands with deepseek-coder.

We have not yet resolved the issue with the incorrect outputs in fp16. Any feedback about inference results is welcome. A minimal sketch of these steps follows.
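A sketch of steps 1–4 above, assuming the fp16 path from examples/llama/README.md. All paths here are placeholders, and run.py sits in examples/ in recent releases (examples/llama/ in older ones):

cd TensorRT-LLM/examples/llama

# Convert the Hugging Face checkpoint to a TensorRT-LLM checkpoint.
python convert_checkpoint.py --model_dir ./deepseek-coder-1.3b-base \
                             --output_dir ./ckpt-deepseek-coder-1.3b \
                             --dtype float16

# Build the TensorRT engine from the converted checkpoint.
trtllm-build --checkpoint_dir ./ckpt-deepseek-coder-1.3b \
             --output_dir ./engine-deepseek-coder-1.3b \
             --gemm_plugin float16

# Smoke-test the engine with a short prompt.
python3 ../run.py --engine_dir ./engine-deepseek-coder-1.3b \
                  --tokenizer_dir ./deepseek-coder-1.3b-base \
                  --input_text "def fibonacci(n):" \
                  --max_output_len 64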

chenxu2048 avatar Jan 25 '24 06:01 chenxu2048

Hi @chenxu2048, have you resolved the problem with deepseek?

activezhao avatar Mar 07 '24 12:03 activezhao

No, we ultimately chose vLLM. The same error occurred in TensorRT 8.6, TensorRT 9.0, and TensorRT-LLM, but we had no way to debug it.
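For reference, a minimal vLLM setup for the same model might look like this (a sketch, not our exact configuration; the model ID and flags are assumptions):

# Launch vLLM's OpenAI-compatible server for deepseek-coder.
python -m vllm.entrypoints.openai.api_server \
    --model deepseek-ai/deepseek-coder-1.3b-base \
    --dtype float16 \
    --port 8000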

chenxu2048 avatar Mar 08 '24 02:03 chenxu2048

@chenxu2048 OK, thanks for your reply. We have no choice but to wait for TensorRT-LLM.

activezhao avatar Mar 08 '24 05:03 activezhao

@activezhao Maybe you can try bf16 instead of fp16.

chenxu2048 avatar Mar 08 '24 07:03 chenxu2048

@chenxu2048 In fact, I have tried that, but it still did not work. Have you tried bf16?

python convert_checkpoint.py --model_dir /data/deepseek-coder-6.7b-base/ \
                            --output_dir /data/trt-deepseek-coder-6.7b-base-tp2 \
                            --dtype bfloat16 \
                            --tp_size 2 \
                            --workers 2

trtllm-build --checkpoint_dir /data/trt-deepseek-coder-6.7b-base-tp2 \
            --output_dir /data/trt-engines-deepseek-coder-6.7b-base/2-gpu/  \
            --gemm_plugin bfloat16 \
            --gpt_attention_plugin bfloat16 \
            --max_batch_size 64 

But the result is still abnormal:

{"task_id": "HumanEval/0", "completion": "\n\n\n\n\n\n", "language": "python"}
{"task_id": "HumanEval/1", "completion": "\n\n\n\n\n\n", "language": "python"}
{"task_id": "HumanEval/2", "completion": "\n\n\n\n\n\n", "language": "python"}
{"task_id": "HumanEval/3", "completion": "\n\n\n\n\n\n", "language": "python"}
{"task_id": "HumanEval/4", "completion": "\n\n\n\n\n\n", "language": "python"}
{"task_id": "HumanEval/5", "completion": "\n\n\n\n\n\n", "language": "python"}
{"task_id": "HumanEval/6", "completion": "\n\n\n\n\n\n", "language": "python"}

activezhao avatar Mar 08 '24 08:03 activezhao

No, we didn't try bf16.

chenxu2048 avatar Mar 11 '24 03:03 chenxu2048