fastertransformer_backend
Hi, I am wondering why in FasterTransformer intra-node GPUs are bound at the process level, while in fastertransformer_backend they are bound at the thread level? Since the source code is the same, why does the binding differ...
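For context, a minimal sketch of the two launch styles the question contrasts; the GPU count and paths are hypothetical, `multi_gpu_gpt_example` is FasterTransformer's GPT example binary, and `${TRITON_MODELS_STORE}` is a placeholder model-repository path:

```shell
# Process-level binding (standalone FasterTransformer): one MPI rank per GPU,
# so four GPUs means four OS processes coordinating over MPI/NCCL.
mpirun -n 4 --allow-run-as-root ./bin/multi_gpu_gpt_example

# Thread-level binding (fastertransformer_backend): a single tritonserver
# process; the backend drives each visible GPU from its own worker thread.
CUDA_VISIBLE_DEVICES=0,1,2,3 tritonserver --model-repository=${TRITON_MODELS_STORE}
```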
I am very interested in your work! Do you have any plans to support a [VisionEncoderDecoderModel](https://huggingface.co/docs/transformers/model_doc/vision-encoder-decoder), such as a transformer-based OCR model?
Hi, I'm following the [setup guide](https://github.com/triton-inference-server/fastertransformer_backend#setup). I found a bug and solved it.

```
docker run -it \
    --shm-size=1g --ulimit memlock=-1 \
    -v ${WORKSPACE}:/workspace \
    --name ft_backend_builder \
    ${TRITON_DOCKER_IMAGE}...
```
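One plausible candidate for the bug (a guess, not confirmed by the poster): the guide mounts the host `${WORKSPACE}` at `/workspace`, so later `cd ${WORKSPACE}` steps fail inside the container. Mounting at the same path, as other reports in this tracker do, avoids that:

```
# Hypothetical fix: mount the workspace at the same path inside the container
# so that subsequent `cd ${WORKSPACE}` steps from the guide still resolve
docker run -it \
    --shm-size=1g --ulimit memlock=-1 \
    -v ${WORKSPACE}:${WORKSPACE} -w ${WORKSPACE} \
    --name ft_backend_builder \
    ${TRITON_DOCKER_IMAGE} bash
```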
### Description

```shell
Host: linux amd64
GPU: RTX 3060
container version: 22.12
GPT model converted from Megatron (model files and configs are from the GPT guide)
dockerfile:
----
ARG TRITON_SERVER_VERSION
FROM nvcr.io/nvidia/tritonserver:${TRITON_SERVER_VERSION}-py3...
```
### Description

```shell
Branch: main
Base Docker Image: nvcr.io/nvidia/tritonserver:23.01-py3 (the image is likely irrelevant here)
System: AGX Orin w/ JetPack 5.1
```

### Reproduced Steps

```shell
/workspace/fastertransformer_backend/build# cmake -D SM=87 -D...
```
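For reference, a configure-and-build sequence of the general shape the README describes; `SM=87` targets Orin's compute capability 8.7, and the remaining flags are assumptions rather than the repo's exact set:

```shell
# Sketch of the backend build for AGX Orin (sm_87); treat the flag set as a
# placeholder, since the exact CMake options vary by branch
cd /workspace/fastertransformer_backend/build
cmake -D SM=87 -D CMAKE_BUILD_TYPE=Release ..
make -j$(nproc)
```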
### Description

```shell
branch: dev/t5_gptj_blog
triton version: 22.03
GPU: A100-40G
```

### Reproduced Steps

```shell
I am following https://github.com/triton-inference-server/fastertransformer_backend/blob/dev/t5_gptj_blog/notebooks/GPT-J_and_T5_inference.ipynb. The Triton version I am using is 22.03, because...
```
### Description

```shell
branch: main
docker_version: 22.12
gpu: A5000
```

### Reproduced Steps

```shell
Created the Docker image and installed the GPT-J model. The model loads and runs, and the server is running at...
```
### Description

```shell
main branch, V100

Deployed Docker pods crash and restart every few minutes. They seem stable when QPS is low. Below is the error log from before the pods crash, which...
```
### Description

```shell
branch: main
fastertransformer docker: 22.12
```

### Reproduced Steps

```shell
docker run -it --rm --gpus=all --shm-size=1g --ulimit memlock=-1 \
    -v ${WORKSPACE}:${WORKSPACE} -w ${WORKSPACE} ${TRITON_DOCKER_IMAGE} bash
# now in...
```
I followed the [tutorial to deploy NeMo Megatron on Triton](https://developer.nvidia.com/blog/deploying-a-1-3b-gpt-3-model-with-nvidia-nemo-megatron/) and it was working well. But I wanted to add ragged batching, so I just added `allow_ragged_batch: true` to the...
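Worth noting for anyone reproducing this: in Triton, `allow_ragged_batch` is a per-input option in `config.pbtxt`, not a top-level model setting. A hedged fragment (the input name, type, and dims below are illustrative, not taken from the tutorial):

```
input [
  {
    name: "input_ids"          # illustrative; the NeMo/FT model's input names may differ
    data_type: TYPE_UINT32
    dims: [ -1 ]
    allow_ragged_batch: true   # mark this input as ragged across the batch
  }
]
```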