fastertransformer_backend
### Description

System: H100 and L40, driver 530.30.02, CUDA 12.1. The build is failing with the following error no matter which branch is used (main, v1.4, fix/multi_instance, etc.):

```shell
[ 54%] Built...
```
### Description

Branch: `v1.4`, Docker version: `22.12`. `huggingface_bert_convert.py` can't convert some keys:

```shell
python3 FasterTransformer/examples/pytorch/bert/utils/huggingface_bert_convert.py \
    -in_file bert-base-uncased/ \
    -saved_dir ${WORKSPACE}/all_models/bert/fastertransformer/1/ \
    -infer_tensor_para_size 1
```

Response:

```
=============== Argument ===============...
```
### Description

On the main branch as of 02/13/2023, the build crashes at 57% with no additional information. I was able to successfully build using 22.09 today to validate that nothing on...
Looks like the answer is no, and it will fail if we load a DeBERTa model: https://github.com/triton-inference-server/fastertransformer_backend/blob/main/src/libfastertransformer.cc#L333. Is there a future plan to support DeBERTa?
### Description

The Docker image built fine using the older version mentioned in the README (22.12), but building with the latest Docker image (23.05) fails. See this log...
I have 4 GPUs and 3 models, called small, medium, and large. I want to deploy the small model on GPU 0, the medium model on GPU 1, and the large model on...
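One common way to pin a model to a specific GPU in Triton is the `instance_group` setting in each model's `config.pbtxt`. A minimal sketch for the `small` model is below; the model name, batch size, and backend are assumptions taken from the question, not a verified configuration:

```protobuf
# config.pbtxt for the "small" model (hypothetical layout)
name: "small"
backend: "fastertransformer"
max_batch_size: 8
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]   # pin this model's instances to GPU 0
  }
]
```

The `medium` and `large` models would carry the same block with `gpus: [ 1 ]` and `gpus: [ 2 ]` respectively, leaving the remaining GPU free.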
fix typo in docker run script https://github.com/triton-inference-server/fastertransformer_backend#rebuilding-fastertransformer-backend-optional
Poll failed for model directory 'ensemble': output 'OUTPUT_0' for ensemble 'ensemble' is not written
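This error usually indicates that no step in the ensemble's `ensemble_scheduling` block routes a tensor to the ensemble-level output `OUTPUT_0` via an `output_map`. A hedged sketch of the relevant config is below; every model name, tensor name, and data type here is an assumption for illustration:

```protobuf
# ensemble config.pbtxt sketch (all names are hypothetical)
name: "ensemble"
platform: "ensemble"
max_batch_size: 8
input [ { name: "INPUT_0" data_type: TYPE_STRING dims: [ 1 ] } ]
output [ { name: "OUTPUT_0" data_type: TYPE_STRING dims: [ 1 ] } ]
ensemble_scheduling {
  step [
    {
      model_name: "fastertransformer"
      model_version: -1
      input_map { key: "input_ids" value: "INPUT_0" }
      # "OUTPUT_0" must appear as a value in some step's output_map;
      # otherwise Triton reports that the ensemble output is not written.
      output_map { key: "output_ids" value: "OUTPUT_0" }
    }
  ]
}
```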
Hi, when I ensemble a fastertransformer_backend GPT model, loading the ensemble model fails with the following error when starting the server. Could you please give some advice? Thanks.

```
CUDA_VISIBLE_DEVICES="0,1"...
```
Hi there, I'm new to the FasterTransformer backend, and I'm curious why we need to set max_batch_size to 1 when interactive mode is enabled. The documentation says that...