73 fastertransformer_backend issues

We are trying to run Triton with the FasterTransformer backend on a GKE cluster with A100 GPUs to serve models such as T5 and UL2, which are hosted on Google Cloud Storage...
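
For triage, it can help to separate "server didn't start" from "model failed to load from GCS". A minimal readiness probe, assuming the HTTP endpoint is at the default localhost:8000 and the model is named `fastertransformer` (both assumptions, not taken from the report):

```python
import tritonclient.http as httpclient

# Assumed endpoint and model name; adjust to the actual deployment.
client = httpclient.InferenceServerClient(url="localhost:8000")

# Distinguishes "server not up" from "server up but the model
# failed to load from the gs:// model repository".
print("server live: ", client.is_server_live())
print("server ready:", client.is_server_ready())
print("model ready: ", client.is_model_ready("fastertransformer"))
```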

### Description
It appears that Triton Server with the FasterTransformer backend doesn't work as expected when loading the model repository from S3 (containing both configuration and model weights). Release:...

bug

### Description
The latest FasterTransformer v5.1.1, which is used by the latest fastertransformer_backend release, prescribes that the T5 decoder outputs (output_ids and sequence_length) should be int32...

bug
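
If the dtype mismatch surfaces client-side, an explicit cast is one possible stopgap. A minimal sketch, assuming `result` is a tritonclient `InferResult` from the T5 model and that the output tensor names match the FasterTransformer T5 examples:

```python
import numpy as np

def outputs_as_int32(result):
    """Cast FT T5 outputs to int32.

    `result` is a tritonclient InferResult; the tensor names follow
    the FasterTransformer T5 examples and are an assumption here.
    """
    output_ids = result.as_numpy("output_ids").astype(np.int32, copy=False)
    sequence_length = result.as_numpy("sequence_length").astype(np.int32, copy=False)
    return output_ids, sequence_length
```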

### Description
As defined in the [FasterTransformer T5 guide](https://github.com/NVIDIA/FasterTransformer/blob/main/docs/t5_guide.md), there is an output value for `cross_attentions`. I cannot find any way of returning `cross_attentions` from the FasterTransformer Triton backend for T5...

bug
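
For reference, extra outputs are normally requested through tritonclient as below. Whether the backend's `config.pbtxt` actually exposes a `cross_attentions` tensor is exactly what this issue asks, so that name (and the minimal input set) is an assumption for illustration only:

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Token ids and lengths are placeholders; real T5 requests also need
# generation parameters such as max_output_len.
ids = np.array([[37, 423, 1]], dtype=np.uint32)
lens = np.array([[3]], dtype=np.uint32)

inputs = [
    httpclient.InferInput("input_ids", list(ids.shape), "UINT32"),
    httpclient.InferInput("sequence_length", list(lens.shape), "UINT32"),
]
inputs[0].set_data_from_numpy(ids)
inputs[1].set_data_from_numpy(lens)

# This only succeeds if the model's config.pbtxt declares such an output.
outputs = [httpclient.InferRequestedOutput("cross_attentions")]

result = client.infer("fastertransformer", inputs, outputs=outputs)
print(result.as_numpy("cross_attentions"))
```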

### Description
The problem: with "dynamic_batching" enabled, Triton Inference Server sometimes doesn't respond properly, logs "response is nullptr" several times, and sometimes crashes. The model is a pretty standard...

bug
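
For context, dynamic batching is enabled per model in `config.pbtxt`. A typical stanza looks like the following (values are placeholders, not the reporter's actual configuration):

```
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
  max_queue_delay_microseconds: 100
}
```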

Hey, thanks for providing such a great tool! I noticed that gpt_guide.md mentions a parameter, `output_log_probs`, which records the log probability of logits at each step of sampling. `output_log_probs`...
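
A sketch of how that tensor would be requested, assuming a running server, GPT-style input names as in the FT examples, and that the model's `config.pbtxt` declares `output_log_probs` as an output (all assumptions):

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# GPT-style inputs per the FT examples; token ids are placeholders.
ids = np.array([[818, 262, 3726]], dtype=np.uint32)
lens = np.array([[3]], dtype=np.uint32)
out_len = np.array([[16]], dtype=np.uint32)

inputs = []
for name, arr in [("input_ids", ids), ("input_lengths", lens),
                  ("request_output_len", out_len)]:
    t = httpclient.InferInput(name, list(arr.shape), "UINT32")
    t.set_data_from_numpy(arr)
    inputs.append(t)

# Ask for per-step log probabilities alongside the generated ids; this
# requires "output_log_probs" to be declared in the model's config.pbtxt.
outputs = [
    httpclient.InferRequestedOutput("output_ids"),
    httpclient.InferRequestedOutput("output_log_probs"),
]
result = client.infer("fastertransformer", inputs, outputs=outputs)
print(result.as_numpy("output_log_probs"))
```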

### Description
Expected behavior:
```shell
>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
>>> tokenizer.encode('')
[50256]
```
### Reproduced Steps
Actual behavior:
```shell
$ cd all_models/gptj/preprocessing/1
$ python
>>>...
```

bug

Hi, thanks for supporting the BLOOM model in the latest release of the fastertransformer backend. I tried the latest code on my 8x A6000 GPU server with 48 GB of RAM per GPU...

### Description
I am trying to optimize T5-small inference using FasterTransformer. I am running on a single V100; I followed all the steps in `t5_guide.md` exactly and got a...

bug

Hi, I am using this backend for inference with the GPT-J model (a [Codegen](https://github.com/salesforce/CodeGen) checkpoint converted to GPT-J format, to be precise). I'm trying to load more than one model instance to...
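
In stock Triton, the number of model instances is raised with an `instance_group` block in `config.pbtxt`; a generic example follows (values are placeholders). Note that the FasterTransformer backend's shipped configs handle GPU assignment internally, so whether simply raising `count` carries over is essentially what this issue is asking:

```
instance_group [
  {
    count: 2
    kind: KIND_GPU
  }
]
```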