
CUDA runtime error: CUDA driver version is insufficient for CUDA runtime version on FT

Open · lkm2835 opened this issue · 1 comment

Hi, I'm following the setup guide.

I ran into a bug and found a fix.

https://github.com/triton-inference-server/fastertransformer_backend#setup

docker run -it \
    --shm-size=1g --ulimit memlock=-1 \
    -v ${WORKSPACE}:/workspace \
    --name ft_backend_builder \
    ${TRITON_DOCKER_IMAGE} bash

...

and https://github.com/triton-inference-server/fastertransformer_backend/blob/main/docs/gpt_guide.md#run-serving-on-single-node

/workspace/build/fastertransformer_backend/build/bin/gpt_gemm 8 1 32 16 64 4096 50257 1 1 1

->

[FT][INFO] Arguments:
[FT][INFO]   batch_size: 8
[FT][INFO]   beam_width: 1
[FT][INFO]   max_input_len: 32
[FT][INFO]   head_num: 16
[FT][INFO]   size_per_head: 64
[FT][INFO]   inter_size: 4096
[FT][INFO]   vocab_size: 50257
[FT][INFO]   data_type: 1
[FT][INFO]   tensor_para_size: 1
[FT][INFO]   is_append: 1

terminate called after throwing an instance of 'std::runtime_error'
  what():  [FT][ERROR] CUDA runtime error: CUDA driver version is insufficient for CUDA runtime version /workspace/build/fastertransformer_backend/build/_deps/repo-ft-src/src/fastertransformer/models/multi_gpu_gpt/gpt_gemm.cc:74

Aborted (core dumped)
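This error typically means the container was started without GPU access, so the CUDA runtime inside the container cannot reach a usable driver. A quick way to confirm this (a diagnostic sketch, not part of the original report) is to check for the driver from inside the container:

```shell
# If the container was started without GPU access, nvidia-smi will fail
# (command not found, or "couldn't communicate with the NVIDIA driver"),
# which matches the "driver version is insufficient" error from gpt_gemm.
nvidia-smi

# Optionally, compare against the CUDA toolkit version in the image
# (assumes nvcc is present in the build image):
nvcc --version
```

If `nvidia-smi` works on the host but fails inside the container, the problem is the container's GPU configuration, not the host driver.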

To fix this, change `docker` to `nvidia-docker`:

nvidia-docker run -it \
    --shm-size=1g --ulimit memlock=-1 \
    -v ${WORKSPACE}:/workspace \
    --name ft_backend_builder \
    ${TRITON_DOCKER_IMAGE} bash

reference: https://github.com/NVIDIA/FasterTransformer/blob/main/docs/gpt_guide.md#prepare
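On hosts with the NVIDIA Container Toolkit installed, the same fix can also be expressed with plain `docker` and the `--gpus` flag instead of the `nvidia-docker` wrapper (an equivalent sketch under that assumption, not from the original report):

```shell
# Equivalent invocation using Docker's native GPU support
# (requires the NVIDIA Container Toolkit on the host).
docker run -it --gpus all \
    --shm-size=1g --ulimit memlock=-1 \
    -v ${WORKSPACE}:/workspace \
    --name ft_backend_builder \
    ${TRITON_DOCKER_IMAGE} bash
```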

lkm2835 commented Mar 30 '23

Thank you for the feedback. There are many ways to use GPUs with Docker, and we treat this as an environment setting on the user's side, because not everyone installs nvidia-docker.

byshiue commented Mar 30 '23