Inference with TensorRT-LLM

Open thanhtung901 opened this issue 1 year ago • 10 comments

Has anyone tried running the deepseek_coder model using tensorrt_llm?

thanhtung901 avatar Jan 24 '24 07:01 thanhtung901

We tried to run 1.3b-base on TensorRT-LLM with fp16 enabled, but got incorrect completion output.

chenxu2048 avatar Jan 25 '24 04:01 chenxu2048

Can you guide me?

thanhtung901 avatar Jan 25 '24 04:01 thanhtung901

  1. Install TensorRT-LLM or build it from source.
  2. Clone the TensorRT-LLM project and go to examples/llama.
  3. Follow the instructions in examples/llama/README.md.
  4. Replace the model name in the commands with deepseek-coder.

We have not yet resolved the issue with the incorrect outputs in fp16. Any feedback about inference results is welcome. A minimal sketch of these steps follows.
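A sketch of steps 1–4 above, assuming the fp16 path from examples/llama/README.md. All paths here are placeholders, and run.py sits in examples/ in recent releases (examples/llama/ in older ones):

cd TensorRT-LLM/examples/llama

# Convert the Hugging Face checkpoint to a TensorRT-LLM checkpoint.
python convert_checkpoint.py --model_dir ./deepseek-coder-1.3b-base \
                             --output_dir ./ckpt-deepseek-coder-1.3b \
                             --dtype float16

# Build the TensorRT engine from the converted checkpoint.
trtllm-build --checkpoint_dir ./ckpt-deepseek-coder-1.3b \
             --output_dir ./engine-deepseek-coder-1.3b \
             --gemm_plugin float16

# Smoke-test the engine with a short prompt.
python3 ../run.py --engine_dir ./engine-deepseek-coder-1.3b \
                  --tokenizer_dir ./deepseek-coder-1.3b-base \
                  --input_text "def fibonacci(n):" \
                  --max_output_len 64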

chenxu2048 avatar Jan 25 '24 06:01 chenxu2048

Hi @chenxu2048, have you resolved the problem with deepseek?

activezhao avatar Mar 07 '24 12:03 activezhao

No, we ultimately chose vLLM. The same error occurred in TensorRT 8.6, TensorRT 9.0, and TensorRT-LLM, but we had no way to debug it.
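For reference, a minimal vLLM setup for the same model might look like this (a sketch, not our exact configuration; the model ID and flags are assumptions):

# Launch vLLM's OpenAI-compatible server for deepseek-coder.
python -m vllm.entrypoints.openai.api_server \
    --model deepseek-ai/deepseek-coder-1.3b-base \
    --dtype float16 \
    --port 8000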

chenxu2048 avatar Mar 08 '24 02:03 chenxu2048

@chenxu2048 OK, thanks for your reply. We have no choice but to wait for TensorRT-LLM.

activezhao avatar Mar 08 '24 05:03 activezhao

@activezhao Maybe you can try bf16 instead of fp16.

chenxu2048 avatar Mar 08 '24 07:03 chenxu2048

@chenxu2048 In fact, I have tried that, but it still did not work. Have you tried bf16?

python convert_checkpoint.py --model_dir /data/deepseek-coder-6.7b-base/ \
                            --output_dir /data/trt-deepseek-coder-6.7b-base-tp2 \
                            --dtype bfloat16 \
                            --tp_size 2 \
                            --workers 2

trtllm-build --checkpoint_dir /data/trt-deepseek-coder-6.7b-base-tp2 \
            --output_dir /data/trt-engines-deepseek-coder-6.7b-base/2-gpu/  \
            --gemm_plugin bfloat16 \
            --gpt_attention_plugin bfloat16 \
            --max_batch_size 64 

But the result is still abnormal:

{"task_id": "HumanEval/0", "completion": "\n\n\n\n\n\n", "language": "python"}
{"task_id": "HumanEval/1", "completion": "\n\n\n\n\n\n", "language": "python"}
{"task_id": "HumanEval/2", "completion": "\n\n\n\n\n\n", "language": "python"}
{"task_id": "HumanEval/3", "completion": "\n\n\n\n\n\n", "language": "python"}
{"task_id": "HumanEval/4", "completion": "\n\n\n\n\n\n", "language": "python"}
{"task_id": "HumanEval/5", "completion": "\n\n\n\n\n\n", "language": "python"}
{"task_id": "HumanEval/6", "completion": "\n\n\n\n\n\n", "language": "python"}

activezhao avatar Mar 08 '24 08:03 activezhao

No, we didn't try bf16.

chenxu2048 avatar Mar 11 '24 03:03 chenxu2048