fastertransformer_backend
I followed the command below to compile the debug version and encountered the following error. How can I resolve it?

```shell
cmake -D CMAKE_EXPORT_COMPILE_COMMANDS=1 -D CMAKE_BUILD_TYPE=Debug -D ENABLE_FP8=OFF -D CMAKE_INSTALL_PREFIX=/opt/tritonserver -D...
```
### Description
```shell
branch: main
fastertransformer docker: 22.03
!tar -axf step_383500_slim.tar.zstd -C ./models/
I0501 19:22:32.031682 3840 libfastertransformer.cc:321] After Loading Model:
I0501 19:22:32.032232 3840 libfastertransformer.cc:537] Model instance is created on...
```
### Description
```shell
master
```
### Reproduced Steps
```shell
end_to_end_test.py
```
Dear Developers: I'm deploying a GPT model with triton-inference-server and fastertransformer_backend, following this tutorial: https://github.com/triton-inference-server/fastertransformer_backend/blob/main/docs/gpt_guide.md#run-triton-server-on-multiple-nodes. I have successfully implemented the single-node deployment and conducted identity testing. However, as I moved...
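A minimal sketch of the multi-node launch the gpt_guide describes, assuming two hosts named node1/node2 with one MPI rank each (hostnames, rank counts, and the model path are placeholders; the tensor/pipeline parallel sizes in config.pbtxt must match the total number of ranks):

```shell
# Illustrative only: one rank per node, both running the same tritonserver binary
mpirun -n 2 -H node1:1,node2:1 \
    --allow-run-as-root \
    /opt/tritonserver/bin/tritonserver \
    --model-repository=${WORKSPACE}/all_models/gpt/
```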
Hi, I'm trying to deploy a Faster Transformer based LLM using Triton on a GCP instance. I was wondering if there's a way to provide the file path to the...
How should I use FasterTransformer Triton to deploy my custom model, such as adding other structures after BERT? Assuming my model structure is defined like this:

```python
class HfClassModel():
    def ...
```
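One hedged approach, since the FasterTransformer backend only serves the architectures FasterTransformer itself implements: keep BERT in the FT backend and serve the extra structure as a separate step (e.g. a Triton Python-backend model in an ensemble). A minimal sketch of such a head, with a stand-in linear layer in place of the HF encoder (all names and sizes here are illustrative, not from the original post):

```python
import torch
import torch.nn as nn

class HfClassModel(nn.Module):
    """Hypothetical sketch: encoder output -> classification head.

    In an ensemble deployment the FT backend would produce the hidden
    states, and only the head below would run in a Python backend.
    """
    def __init__(self, hidden_size=768, num_labels=2):
        super().__init__()
        # stand-in for the BERT encoder (served by the FT backend in practice)
        self.encoder = nn.Linear(hidden_size, hidden_size)
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, x):
        h = self.encoder(x)                 # (batch, seq, hidden)
        return self.classifier(h[:, 0, :])  # CLS-token pooling -> (batch, num_labels)
```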
I implemented it step by step according to the tutorial at https://github.com/triton-inference-server/fastertransformer_backend/blob/main/docs/bert_guide.md

```shell
git clone https://github.com/triton-inference-server/fastertransformer_backend.git
cd fastertransformer_backend
export WORKSPACE=$(pwd)
export CONTAINER_VERSION=22.12
export TRITON_DOCKER_IMAGE=triton_with_ft:${CONTAINER_VERSION}
python3 docker/create_dockerfile_and_build.py --triton-version 22.12
docker run...
```
### Description
```shell
E0412 07:52:03.832683 14841 model_repository_manager.cc:1155] failed to load 'fastertransformer' version 1: Not found: unable to load shared library: /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so: undefined symbol: _ZN22ParallelGptTritonModelI6__halfE13createNcclIdsEjb
fastertransformer | 1 | UNAVAILABLE: Not...
```
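One quick diagnostic for this kind of undefined-symbol failure is to demangle the name, which shows exactly which C++ function the backend expects; the symptom usually suggests the backend .so was built against a different FasterTransformer revision than the one it links at runtime, so rebuilding both from the same commit is the typical fix (that diagnosis is an inference from the symptom, not from the original thread):

```shell
# Demangle the missing symbol with c++filt (part of binutils)
echo '_ZN22ParallelGptTritonModelI6__halfE13createNcclIdsEjb' | c++filt
# → ParallelGptTritonModel<__half>::createNcclIds(unsigned int, bool)
```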
### Description
Following the tutorial, running

```shell
CUDA_VISIBLE_DEVICES=1,2 mpirun -n 1 --allow-run-as-root /opt/tritonserver/bin/tritonserver --model-repository=${WORKSPACE}/all_models/bert/
```

fails with:

```shell
E0412 06:53:22.368746 15182 model_repository_manager.cc:1927] Poll failed for model directory 'fastertransformer': model output must specify 'data_type' for fastertransformer
```

###...
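The "model output must specify 'data_type'" poll error means every entry in the `output` section of the model's config.pbtxt needs an explicit `data_type` field. A hedged fragment showing the shape of such an entry (the output name and dims here are illustrative; use the ones from the bert_guide config):

```
output [
  {
    name: "output_hidden_state"
    data_type: TYPE_FP32
    dims: [ -1, -1 ]
  }
]
```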
Hi, I am trying to use `perf_analyzer` on the predefined models in fastertransformer, such as GPT, GPT-J, etc. I am very confused about how to properly set the `--shape`...
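In general, `perf_analyzer` needs one `--shape` flag per variable-length input, and the input names must match the model's config.pbtxt exactly. An illustrative invocation against a running server (the input names and dimensions below are assumptions for a GPT-style model, not taken from a specific FT config):

```shell
# Sketch only: pass one --shape flag per dynamic input listed in config.pbtxt
perf_analyzer -m fastertransformer \
    --shape input_ids:512 \
    --shape input_lengths:1 \
    --shape request_output_len:1 \
    --concurrency-range 1:4
```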