CUDA runtime error: CUDA driver version is insufficient for CUDA runtime version on FT
Hi, I'm following the setup guide:
https://github.com/triton-inference-server/fastertransformer_backend#setup

I found a bug and the fix for it. The guide starts the build container with:
docker run -it \
--shm-size=1g --ulimit memlock=-1 \
-v ${WORKSPACE}:/workspace \
--name ft_backend_builder \
${TRITON_DOCKER_IMAGE} bash
...
Then, following https://github.com/triton-inference-server/fastertransformer_backend/blob/main/docs/gpt_guide.md#run-serving-on-single-node, I ran:
/workspace/build/fastertransformer_backend/build/bin/gpt_gemm 8 1 32 16 64 4096 50257 1 1 1
which fails with:
[FT][INFO] Arguments:
[FT][INFO] batch_size: 8
[FT][INFO] beam_width: 1
[FT][INFO] max_input_len: 32
[FT][INFO] head_num: 16
[FT][INFO] size_per_head: 64
[FT][INFO] inter_size: 4096
[FT][INFO] vocab_size: 50257
[FT][INFO] data_type: 1
[FT][INFO] tensor_para_size: 1
[FT][INFO] is_append: 1
terminate called after throwing an instance of 'std::runtime_error'
what(): [FT][ERROR] CUDA runtime error: CUDA driver version is insufficient for CUDA runtime version /workspace/build/fastertransformer_backend/build/_deps/repo-ft-src/src/fastertransformer/models/multi_gpu_gpt/gpt_gemm.cc:74
Aborted (core dumped)
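
For anyone hitting the same error: it means the CUDA runtime inside the container cannot reach the host's NVIDIA driver, because plain docker run does not expose the GPU. A quick sanity check (reusing the ft_backend_builder container from above) is to run nvidia-smi inside it; without GPU access the command will be missing or fail:

docker exec -it ft_backend_builder nvidia-smi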
The fix is to change docker to nvidia-docker:
nvidia-docker run -it \
--shm-size=1g --ulimit memlock=-1 \
-v ${WORKSPACE}:/workspace \
--name ft_backend_builder \
${TRITON_DOCKER_IMAGE} bash
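
If nvidia-docker is not installed, an equivalent (assuming Docker 19.03+ with the NVIDIA Container Toolkit) is to pass --gpus all to plain docker run:

docker run -it --gpus all \
--shm-size=1g --ulimit memlock=-1 \
-v ${WORKSPACE}:/workspace \
--name ft_backend_builder \
${TRITON_DOCKER_IMAGE} bash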
reference: https://github.com/NVIDIA/FasterTransformer/blob/main/docs/gpt_guide.md#prepare
Thank you for the feedback. There are many ways to expose GPUs to Docker, and we treat this as an environment setting on the customer's side, since not everyone has nvidia-docker installed.