
Results: 73 fastertransformer_backend issues

### Description ```shell Model: GPT-NeoX GPU: A100 Tritonserver version: 22.12 ``` Hello, I'm not sure whether this is a FasterTransformer issue or a backend issue, but I'm reporting it here anyway. As...

bug

If I don't use the Docker container method and instead compile against tritonserver myself, using server v2.33.0 (the release corresponding to 23.04), are there any version requirements I need to follow?
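
Not an official answer, but the key correspondence is that tritonserver v2.33.0 is the 23.04 release, so an out-of-container build should pin every Triton repo tag to the matching r23.04 branch. A minimal sketch, assuming the backend follows the standard Triton backend CMake layout (option names and paths may differ in your checkout):

```shell
# Sketch of an out-of-container build; all repo tags are pinned to
# r23.04, the branch matching tritonserver v2.33.0. Verify these cmake
# options against the CMakeLists.txt in your checkout.
git clone https://github.com/triton-inference-server/fastertransformer_backend.git
cd fastertransformer_backend
mkdir build && cd build
cmake .. \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_INSTALL_PREFIX="$(pwd)/install" \
  -DTRITON_COMMON_REPO_TAG=r23.04 \
  -DTRITON_CORE_REPO_TAG=r23.04 \
  -DTRITON_BACKEND_REPO_TAG=r23.04
make -j"$(nproc)" install
```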

### Description ```shell Hi everyone! I tried to reproduce the code from https://github.com/triton-inference-server/fastertransformer_backend/blob/dev/t5_gptj_blog/notebooks/GPT-J_and_T5_inference.ipynb. I couldn't use any of the flan-t5 models. ``` ### Reproduced Steps ```shell I used the main branch for...

bug

### Description ```shell main branch V100 my model type is GPTNeoX ``` ### Reproduced Steps ```shell https://github.com/triton-inference-server/fastertransformer_backend/blob/main/docs/gptneox_guide.md#decoupled-mode I ran tritonserver with my model after reading this guide. I changed request_output_len...

bug
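
For context on the decoupled setup that guide describes: streaming is switched on through Triton's standard model transaction policy in the model's config.pbtxt. A minimal sketch, with an illustrative model-repository path:

```shell
# Illustrative: append the standard Triton transaction-policy block to
# the model's config.pbtxt to enable decoupled (streaming) responses.
cat >> triton-model-store/fastertransformer/config.pbtxt <<'EOF'
model_transaction_policy {
  decoupled: True
}
EOF
```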

### Description ```shell I fail to start the preprocessing and postprocessing models, so I start fastertransformer only. It works fine, but the model performance is bad, so I wonder if the reason...

bug

Can anyone tell me where I can find 'compose.py', or tell me the actual steps to build? Thank you ![Screenshot from 2023-09-22 13-42-56](https://github.com/triton-inference-server/fastertransformer_backend/assets/69427071/675f5ba5-7497-40a8-a87e-171697863ec1)
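
For what it's worth, compose.py is not part of this repository; it sits at the top level of the triton-inference-server/server repo and assembles a custom Triton image from released containers. A rough sketch, assuming the r23.04 branch (flag names vary between releases, so check the built-in help first):

```shell
# compose.py lives in the server repo, not in fastertransformer_backend.
git clone -b r23.04 https://github.com/triton-inference-server/server.git
cd server
python3 compose.py --help   # confirm the flags your version supports
```

Note that the fastertransformer backend itself is normally built into an image with this repository's docker/Dockerfile rather than composed from a released container.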

The same setup works fine on A100x8, but on H100x8 I see the errors below. ``` Caught signal 7 (Bus error: nonexistent physical address) ==== backtrace (tid: 30) ==== 0 0x0000000000042520 __sigaction()...
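
Not a confirmed diagnosis, but a bus error during multi-GPU communication is often a shared-memory exhaustion symptom inside the container rather than a code bug. A speculative first check, assuming the server runs under Docker (the image name and sizes below are illustrative):

```shell
# Speculative mitigation: enlarge the container's shared memory and
# pinned-memory limits, which the NCCL/MPI transports depend on.
docker run --gpus all \
  --shm-size=8g \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  --net=host \
  <your_triton_ft_image> \
  tritonserver --model-repository=/models
```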

### Description ```shell Suppose I have 5 GPT models, each with TP=2, and I want to deploy them on a machine with 8 GPUs. Is it possible? If so, how...

bug
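
One workable pattern, sketched here rather than taken from official docs: run a separate tritonserver process per model and pin each to a pair of GPUs with CUDA_VISIBLE_DEVICES, giving each instance its own ports. Since 5 models at TP=2 would need 10 exclusive GPUs, at least one pair has to be shared on an 8-GPU machine (m5 shares GPUs 6,7 with m4 below; all paths and ports are illustrative):

```shell
# Sketch: one Triton instance per TP=2 model, pinned to a GPU pair.
CUDA_VISIBLE_DEVICES=0,1 tritonserver --model-repository=/models/m1 \
  --http-port 8000 --grpc-port 8001 --metrics-port 8002 &
CUDA_VISIBLE_DEVICES=2,3 tritonserver --model-repository=/models/m2 \
  --http-port 8010 --grpc-port 8011 --metrics-port 8012 &
CUDA_VISIBLE_DEVICES=4,5 tritonserver --model-repository=/models/m3 \
  --http-port 8020 --grpc-port 8021 --metrics-port 8022 &
CUDA_VISIBLE_DEVICES=6,7 tritonserver --model-repository=/models/m4 \
  --http-port 8030 --grpc-port 8031 --metrics-port 8032 &
CUDA_VISIBLE_DEVICES=6,7 tritonserver --model-repository=/models/m5 \
  --http-port 8040 --grpc-port 8041 --metrics-port 8042 &
```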

### Description ```shell Docker: nvcr.io/nvidia/tritonserver:23.04-py3 GPU: A100 How can I stop bidirectional streaming (decoupled mode)? - I want to stop model inference (the streaming response) when the user disconnects or according to certain...

bug