
Results: 73 fastertransformer_backend issues

### Description ```shell Model: GPT-NeoX GPU: A100 Tritonserver version: 22.12 ``` Hello, I'm not sure whether this is a FasterTransformer issue or a backend issue, but I'm reporting it here anyway. As...

bug

If I don't use the Docker container method and instead compile against tritonserver myself, using server v2.33.0 (the release corresponding to 23.04), are there any version requirements I need to follow?
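
Not an official answer, but the key correspondence is that tritonserver v2.33.0 is the 23.04 release, so an out-of-container build should pin every Triton repo tag to the matching r23.04 branch. A minimal sketch, assuming the backend follows the standard Triton backend CMake layout (option names and paths may differ in your checkout):

```shell
# Sketch of an out-of-container build; all repo tags are pinned to
# r23.04, the branch matching tritonserver v2.33.0. Verify these cmake
# options against the CMakeLists.txt in your checkout.
git clone https://github.com/triton-inference-server/fastertransformer_backend.git
cd fastertransformer_backend
mkdir build && cd build
cmake .. \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_INSTALL_PREFIX="$(pwd)/install" \
  -DTRITON_COMMON_REPO_TAG=r23.04 \
  -DTRITON_CORE_REPO_TAG=r23.04 \
  -DTRITON_BACKEND_REPO_TAG=r23.04
make -j"$(nproc)" install
```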

### Description ```shell Hi everyone! I tried to reproduce the code from https://github.com/triton-inference-server/fastertransformer_backend/blob/dev/t5_gptj_blog/notebooks/GPT-J_and_T5_inference.ipynb. I couldn't use any of the flan-t5 models. ``` ### Reproduced Steps ```shell I used the main branch for...

bug

### Description ```shell main branch V100 my model type is GPTNeoX ``` ### Reproduced Steps ```shell https://github.com/triton-inference-server/fastertransformer_backend/blob/main/docs/gptneox_guide.md#decoupled-mode I ran tritonserver with my model after reading this guide. I changed request_output_len...

bug
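
For context on the decoupled setup that guide describes: streaming is switched on through Triton's standard model transaction policy in the model's config.pbtxt. A minimal sketch, with an illustrative model-repository path:

```shell
# Illustrative: append the standard Triton transaction-policy block to
# the model's config.pbtxt to enable decoupled (streaming) responses.
cat >> triton-model-store/fastertransformer/config.pbtxt <<'EOF'
model_transaction_policy {
  decoupled: True
}
EOF
```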

### Description ```shell I fail to start the preprocessing and postprocessing models, so I start fastertransformer only. It works fine, but the model performance is bad, so I wonder if the reason...

bug

Can anyone tell me where I can find 'compose.py', or tell me the actual steps to build? Thank you ![Screenshot from 2023-09-22 13-42-56](https://github.com/triton-inference-server/fastertransformer_backend/assets/69427071/675f5ba5-7497-40a8-a87e-171697863ec1)
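
For what it's worth, compose.py is not part of this repository; it sits at the top level of the triton-inference-server/server repo and assembles a custom Triton image from released containers. A rough sketch, assuming the r23.04 branch (flag names vary between releases, so check the built-in help first):

```shell
# compose.py lives in the server repo, not in fastertransformer_backend.
git clone -b r23.04 https://github.com/triton-inference-server/server.git
cd server
python3 compose.py --help   # confirm the flags your version supports
```

Note that the fastertransformer backend itself is normally built into an image with this repository's docker/Dockerfile rather than composed from a released container.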

The same setup works fine on A100x8, but on H100x8 I see the errors below. ``` Caught signal 7 (Bus error: nonexistent physical address) ==== backtrace (tid: 30) ==== 0 0x0000000000042520 __sigaction()...
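
Not a confirmed diagnosis, but a bus error during multi-GPU communication is often a shared-memory exhaustion symptom inside the container rather than a code bug. A speculative first check, assuming the server runs under Docker (the image name and sizes below are illustrative):

```shell
# Speculative mitigation: enlarge the container's shared memory and
# pinned-memory limits, which the NCCL/MPI transports depend on.
docker run --gpus all \
  --shm-size=8g \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  --net=host \
  <your_triton_ft_image> \
  tritonserver --model-repository=/models
```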

### Description ```shell Suppose I have 5 GPT models, each with TP=2, and I want to deploy them on a machine with 8 GPUs. Is it possible? If so, how...

bug
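
One workable pattern, sketched here rather than taken from official docs: run a separate tritonserver process per model and pin each to a pair of GPUs with CUDA_VISIBLE_DEVICES, giving each instance its own ports. Since 5 models at TP=2 would need 10 exclusive GPUs, at least one pair has to be shared on an 8-GPU machine (m5 shares GPUs 6,7 with m4 below; all paths and ports are illustrative):

```shell
# Sketch: one Triton instance per TP=2 model, pinned to a GPU pair.
CUDA_VISIBLE_DEVICES=0,1 tritonserver --model-repository=/models/m1 \
  --http-port 8000 --grpc-port 8001 --metrics-port 8002 &
CUDA_VISIBLE_DEVICES=2,3 tritonserver --model-repository=/models/m2 \
  --http-port 8010 --grpc-port 8011 --metrics-port 8012 &
CUDA_VISIBLE_DEVICES=4,5 tritonserver --model-repository=/models/m3 \
  --http-port 8020 --grpc-port 8021 --metrics-port 8022 &
CUDA_VISIBLE_DEVICES=6,7 tritonserver --model-repository=/models/m4 \
  --http-port 8030 --grpc-port 8031 --metrics-port 8032 &
CUDA_VISIBLE_DEVICES=6,7 tritonserver --model-repository=/models/m5 \
  --http-port 8040 --grpc-port 8041 --metrics-port 8042 &
```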

### Description ```shell Docker: nvcr.io/nvidia/tritonserver:23.04-py3 GPU: A100 How can I stop bidirectional streaming (decoupled mode)? - I want to stop model inference (the streaming response) when the user disconnects or according to certain...

bug