fastertransformer_backend
In the ensemble model example for [gpt](https://github.com/triton-inference-server/fastertransformer_backend/tree/main/all_models/gpt), can I change the `fastertransformer` model to a `decoupled` model and enable streaming on the client side?
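For reference, a minimal sketch of what the client side could look like once the `fastertransformer` model is switched to decoupled mode and the request is sent over Triton's gRPC streaming API. The endpoint, model name (`ensemble`), input names (`INPUT_0`, `INPUT_1`), shapes, and output name (`OUTPUT_0`) are assumptions based on the gpt ensemble example and should be checked against your `config.pbtxt`:

```python
import numpy as np
import tritonclient.grpc as grpcclient

# Assumed gRPC endpoint; adjust to your deployment.
client = grpcclient.InferenceServerClient("localhost:8001")

def callback(result, error):
    # With a decoupled model, each generated chunk arrives as a separate response.
    if error is not None:
        print("error:", error)
    else:
        print("partial result:", result.as_numpy("OUTPUT_0"))  # assumed output name

# Assumed ensemble inputs: INPUT_0 = prompt strings, INPUT_1 = requested output length.
prompts = np.array([["Hello, my name is"]], dtype=object)
output_len = np.array([[32]], dtype=np.uint32)

inputs = [
    grpcclient.InferInput("INPUT_0", prompts.shape, "BYTES"),
    grpcclient.InferInput("INPUT_1", output_len.shape, "UINT32"),
]
inputs[0].set_data_from_numpy(prompts)
inputs[1].set_data_from_numpy(output_len)

client.start_stream(callback=callback)
client.async_stream_infer(model_name="ensemble", inputs=inputs)
client.stop_stream()  # waits for the stream to drain before closing
```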
Really appreciate the awesome work by the team - have managed to get almost a 100x speedup so far with the `fastertransformer_backend` on Triton compared to plain PyTorch with a...
### Description

```shell
Dockerfile: faster_transformer (v1.2)
Model: GPT-J
```

### Reproduced Steps

The streaming example in issue_requests.py throws the following error when passing in a request:

```shell
Traceback (most recent call...
```
I think `ARG SM=80` is required if I am to build the FasterTransformer library, but what about this FasterTransformer backend?
It looks like that if `is_return_log_probs` is set to `False`, the decoupled model does not return anything.
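If that is the behaviour being observed, one way to confirm it is to send the flag explicitly with the request. A minimal sketch, assuming `is_return_log_probs` is exposed as an optional `TYPE_BOOL` input of shape [1] on the `fastertransformer` model (check the model's `config.pbtxt`):

```python
import numpy as np
import tritonclient.grpc as grpcclient

# Inputs already prepared for the request (token ids, lengths, ...) would go here.
inputs = []

# Assumption: is_return_log_probs is an optional TYPE_BOOL input of shape [1].
flag = np.array([[True]], dtype=bool)
log_probs_input = grpcclient.InferInput("is_return_log_probs", flag.shape, "BOOL")
log_probs_input.set_data_from_numpy(flag)
inputs.append(log_probs_input)
```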
Running `python3 tools/end_to_end_test_llama.py` fails with the error: `[400] HTTP end point doesn't support models with decoupled transaction policy`.
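That error is expected behaviour: Triton's HTTP/REST endpoint cannot serve models with a decoupled transaction policy, so a decoupled model has to be queried over the gRPC streaming API instead. A small sketch for checking whether the model the test script targets is actually decoupled (the model name `fastertransformer` is a placeholder):

```python
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient("localhost:8001")

# "fastertransformer" is a placeholder; use the model name from the test script.
config = client.get_model_config("fastertransformer").config
print(config.model_transaction_policy.decoupled)  # True => gRPC streaming is required
```

If this prints `True`, the test needs to use `tritonclient.grpc` with `start_stream`/`async_stream_infer` (as in the streaming sketch above) rather than the HTTP client.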
This is a very basic change to the README but I think an important one if users are going to realize they can use 23.05.
In a production environment like ChatGPT, early termination of a conversation based on client-side commands can be a major requirement. I'm wondering whether a gRPC streaming request can be terminated...
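For what it's worth, the Python gRPC client lets you close the stream from the client side at any point; whether requests still running on the server are actually cancelled depends on the Triton and client versions, so the sketch below is an assumption to verify (in particular, the `cancel_requests` argument to `stop_stream` only exists in newer `tritonclient` releases):

```python
import threading
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient("localhost:8001")
stop_event = threading.Event()  # set from the UI / user-side "stop" command

def callback(result, error):
    if error is not None:
        print(error)
    # Partial results stream in here; the user side can call stop_event.set() at any time.

client.start_stream(callback=callback)
# client.async_stream_infer(model_name="ensemble", inputs=inputs) would go here.

stop_event.wait(timeout=30)  # returns early if the user requested termination

# Tears down the stream from the client side. Whether requests still running on the
# server are cancelled depends on the server/client version; newer tritonclient
# releases expose stop_stream(cancel_requests=True) for that (assumption: verify
# against your installed version).
client.stop_stream()
```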
Hello. Thank you for your work and framework! My goal is to host N instances of GPTJ-6B on N graphics cards. I want to have N instances with one model...
The existing FT backend will throw an error for the LLaMA model.