fastertransformer_backend
Hi, I am wondering why in FasterTransformer intra-node GPUs are bound at the process level, while in fastertransformer_backend they are bound at the thread level? Since the source code is the same, why does the binding differ...
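For context, a minimal sketch of the two launch styles the question contrasts; the GPU count and paths are hypothetical, `multi_gpu_gpt_example` is FasterTransformer's GPT example binary, and `${TRITON_MODELS_STORE}` is a placeholder model-repository path:

```shell
# Process-level binding (standalone FasterTransformer): one MPI rank per GPU,
# so four GPUs means four OS processes coordinating over MPI/NCCL.
mpirun -n 4 --allow-run-as-root ./bin/multi_gpu_gpt_example

# Thread-level binding (fastertransformer_backend): a single tritonserver
# process; the backend drives each visible GPU from its own worker thread.
CUDA_VISIBLE_DEVICES=0,1,2,3 tritonserver --model-repository=${TRITON_MODELS_STORE}
```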
I am very interested in your work! Do you have any plans to support a [VisionEncoderDecoderModel](https://huggingface.co/docs/transformers/model_doc/vision-encoder-decoder), such as a transformer-based OCR model?
Hi, I'm following the [setup guide](https://github.com/triton-inference-server/fastertransformer_backend#setup). I found a bug and solved it.

```
docker run -it \
    --shm-size=1g --ulimit memlock=-1 \
    -v ${WORKSPACE}:/workspace \
    --name ft_backend_builder \
    ${TRITON_DOCKER_IMAGE}...
```
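One plausible candidate for the bug (a guess, not confirmed by the poster): the guide mounts the host `${WORKSPACE}` at `/workspace`, so later `cd ${WORKSPACE}` steps fail inside the container. Mounting at the same path, as other reports in this tracker do, avoids that:

```
# Hypothetical fix: mount the workspace at the same path inside the container
# so that subsequent `cd ${WORKSPACE}` steps from the guide still resolve
docker run -it \
    --shm-size=1g --ulimit memlock=-1 \
    -v ${WORKSPACE}:${WORKSPACE} -w ${WORKSPACE} \
    --name ft_backend_builder \
    ${TRITON_DOCKER_IMAGE} bash
```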
### Description

```shell
Host: linux amd64
GPU: RTX 3060
container version: 22.12
GPT model converted from Megatron (model files and configs are from the GPT guide)
dockerfile:
----
ARG TRITON_SERVER_VERSION
FROM nvcr.io/nvidia/tritonserver:${TRITON_SERVER_VERSION}-py3...
```
### Description

```shell
Branch: main
Base Docker Image: nvcr.io/nvidia/tritonserver:23.01-py3 (the image is likely irrelevant here)
System: AGX Orin w/ JetPack 5.1
```

### Reproduced Steps

```shell
/workspace/fastertransformer_backend/build# cmake -D SM=87 -D...
```
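For reference, a configure-and-build sequence of the general shape the README describes; `SM=87` targets Orin's compute capability 8.7, and the remaining flags are assumptions rather than the repo's exact set:

```shell
# Sketch of the backend build for AGX Orin (sm_87); treat the flag set as a
# placeholder, since the exact CMake options vary by branch
cd /workspace/fastertransformer_backend/build
cmake -D SM=87 -D CMAKE_BUILD_TYPE=Release ..
make -j$(nproc)
```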
### Description

```shell
branch: dev/t5_gptj_blog
triton version: 22.03
GPU: A100-40G
```

### Reproduced Steps

```shell
I am following https://github.com/triton-inference-server/fastertransformer_backend/blob/dev/t5_gptj_blog/notebooks/GPT-J_and_T5_inference.ipynb. The Triton version I am using is 22.03, because...
```
### Description

```shell
branch: main
docker_version: 22.12
gpu: A5000
```

### Reproduced Steps

```shell
Created the Docker image and installed the GPT-J model. The model loads and runs, and the server is running at...
```
### Description

```shell
main branch, V100

Deployed Docker pods crash and restart every few minutes. They seem stable when QPS is low. Below is the error log from before the pods crash, which...
```
### Description

```shell
branch: main
fastertransformer docker: 22.12
```

### Reproduced Steps

```shell
docker run -it --rm --gpus=all --shm-size=1g --ulimit memlock=-1 \
    -v ${WORKSPACE}:${WORKSPACE} -w ${WORKSPACE} ${TRITON_DOCKER_IMAGE} bash
# now in...
```
I followed the [tutorial to deploy NeMo Megatron on Triton](https://developer.nvidia.com/blog/deploying-a-1-3b-gpt-3-model-with-nvidia-nemo-megatron/) and it was working well. But I wanted to add ragged batching, so I just added `allow_ragged_batch: true` to the...
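Worth noting for anyone reproducing this: in Triton, `allow_ragged_batch` is a per-input option in `config.pbtxt`, not a top-level model setting. A hedged fragment (the input name, type, and dims below are illustrative, not taken from the tutorial):

```
input [
  {
    name: "input_ids"          # illustrative; the NeMo/FT model's input names may differ
    data_type: TYPE_UINT32
    dims: [ -1 ]
    allow_ragged_batch: true   # mark this input as ragged across the batch
  }
]
```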