
Results: 73 fastertransformer_backend issues, sorted by recently updated

I ran a performance test of a 3.5B BLOOM 1-GPU model using perf_analyzer; the results are |batch size | avg latency | | ----------- | ----------- | | 1 |...
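For anyone reproducing this, a minimal perf_analyzer invocation that sweeps batch sizes against a deployed FasterTransformer model might look like the sketch below; the model name `fastertransformer`, the endpoint, and the input-data file are assumptions, not details from the issue.

```shell
# Hypothetical sketch: sweep batch sizes against a running Triton server.
# Model name, URL, and input JSON are placeholders -- adjust to your deployment.
for BATCH in 1 2 4 8; do
  perf_analyzer -m fastertransformer \
      -u localhost:8000 \
      -b ${BATCH} \
      --input-data input_data.json \
      --concurrency-range 1
done
```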

How can we run Triton with the FasterTransformer backend for flan-ul2-alpaca-lora? Please share the steps.
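No full steps have been posted in this thread; a minimal launch sketch, assuming the checkpoint has already been converted to FasterTransformer format and placed under the model repository with a matching config.pbtxt (the command pattern mirrors the GPT-J example further down this list):

```shell
# Hypothetical layout: converted weights under all_models/flan-ul2/fastertransformer/1/
# with a config.pbtxt alongside them. Paths and GPU selection are placeholders.
CUDA_VISIBLE_DEVICES=0 mpirun -n 1 --allow-run-as-root \
    /opt/tritonserver/bin/tritonserver \
    --model-repository=${WORKSPACE}/all_models/flan-ul2/
```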

Can anyone share a working config file for flan-ul2-alpaca-lora for Triton?

Can anyone share a working config file for flan-ul2 for Triton?
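No verified config has been shared in these threads. Below is a hedged sketch based on the T5 example config in this repo's docs, on the assumption that flan-ul2 is served as a T5-family model (`model_type` "T5"). The checkpoint path, parallelism sizes, and data type are placeholders, and the required input/output tensor sections from the T5 example are omitted for brevity.

```
# Hypothetical config.pbtxt sketch for a T5-family model such as flan-ul2.
name: "fastertransformer"
backend: "fastertransformer"
default_model_filename: "t5"
max_batch_size: 128
parameters {
  key: "model_type"
  value: { string_value: "T5" }
}
parameters {
  key: "model_checkpoint_path"
  value: { string_value: "/ft_workspace/all_models/flan-ul2/fastertransformer/1/1-gpu" }
}
parameters {
  key: "tensor_para_size"
  value: { string_value: "1" }
}
parameters {
  key: "pipeline_para_size"
  value: { string_value: "1" }
}
parameters {
  key: "data_type"
  value: { string_value: "fp16" }
}
```

For the LoRA variant, the adapter weights would presumably need to be merged into the base model before running the FasterTransformer checkpoint converter, since the backend loads a single converted checkpoint.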

Is it possible to integrate converter scripts for the GPTBigCodeForCausalLM architecture from the transformers library? This would enable integration of models like StarCoder / SantaCoder. With this, community projects like...

### Description ```shell main, A100 ``` ### Reproduced Steps ```shell Hi, I'm experimenting with GPT models using Triton + fastertransformer_backend. I installed it according to docs/gpt_guide.md in the docs...

bug

### Description ```shell I start the Triton server with '--model-control-mode poll'. A segmentation fault occurs when modifying the model directory. ``` ### Reproduced Steps ```shell 1. CUDA_VISIBLE_DEVICES=3,4,5,6 /opt/tritonserver/bin/tritonserver --model-repository=/ft_workspace/all_models/t5/ --http-port 8008 --model-control-mode poll...

bug

### Description ```shell UNAVAILABLE: Not found: unable to load shared library: /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so: undefined symbol: _ZN22ParallelGptTritonModelI6__halfE8toStringB5c I didn't change any ParallelGptTritonModel-related code, but when starting the Triton server it always fails. ```...
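A hedged way to triage an undefined-symbol failure like this is to check whether the backend library exports or merely requires the symbol, and what it demangles to:

```shell
# Inspect dynamic symbols in the backend library from the error message.
# 'U' marks symbols the library requires but does not define.
nm -D /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so \
  | grep toString | c++filt
```

The truncated suffix in the mangled name looks like a C++11 ABI tag, so a `_GLIBCXX_USE_CXX11_ABI` mismatch between the backend build and its dependencies is one plausible cause; that is an inference from the symbol name, not something confirmed in the issue.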

bug

### Description ```shell Branch: main Docker Version: 20.10.21 GPU Type: A100 40GB Triton Docker Image: triton_with_ft:22.12 ``` ### Reproduced Steps I'm following the instructions by @byshiue to test Flan-T5 with...
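For anyone following along, launching the container for this kind of test typically looks like the sketch below; only the image tag comes from the report, while the mount path and shared-memory size are assumptions.

```shell
# Hypothetical container launch for the triton_with_ft:22.12 image from the report.
docker run -it --rm --gpus=all --shm-size=4g \
    -v $(pwd)/ft_workspace:/ft_workspace \
    triton_with_ft:22.12 bash
```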

bug

### Description Hi, I'm trying to run triton:22.03 / FasterTransformer within a Kubernetes pod. Running ``` CUDA_VISIBLE_DEVICES=0 mpirun -n 1 --allow-run-as-root /opt/tritonserver/bin/tritonserver --model-repository=${WORKSPACE}/all_models/gptj/ ``` gives me this error: ``` what():...

bug