Please help me /(ㄒoㄒ)/~~ ! failed to load 'fastertransformer' version 1: Unsupported: 1.
I followed the tutorial at https://github.com/triton-inference-server/fastertransformer_backend/blob/main/docs/bert_guide.md step by step:
```bash
git clone https://github.com/triton-inference-server/fastertransformer_backend.git
cd fastertransformer_backend
export WORKSPACE=$(pwd)
export CONTAINER_VERSION=22.12
export TRITON_DOCKER_IMAGE=triton_with_ft:${CONTAINER_VERSION}
python3 docker/create_dockerfile_and_build.py --triton-version 22.12
docker run -it --gpus=all --shm-size=4G -v $(pwd):/data -p 8888:8888 tritonserver_with_ft:latest bash
```
Inside the container, I converted the HuggingFace BERT checkpoint:

```bash
python3 FasterTransformer/examples/pytorch/bert/utils/huggingface_bert_convert.py \
        -in_file bert-base-uncased/ \
        -saved_dir ${WORKSPACE}/all_models/bert/fastertransformer/1/ \
        -infer_tensor_para_size 2
```
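If I read the converter correctly, `-infer_tensor_para_size 2` makes it write the split checkpoint into a `2-gpu/` subdirectory, which is exactly what `model_checkpoint_path` points at in the config below. A quick way to confirm the conversion landed where the config expects (the per-rank `*.0.bin`/`*.1.bin` naming is my assumption about the TP split, not verified output):

```bash
# 2-gpu/ comes from -infer_tensor_para_size 2 and must match
# model_checkpoint_path in config.pbtxt; the weight files should be
# split per tensor-parallel rank (assumed naming: *.0.bin and *.1.bin).
ls ${WORKSPACE}/all_models/bert/fastertransformer/1/2-gpu/
```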
The `all_models` folder looks like:

```
all_models/bert/fastertransformer
├── 1
└── config.pbtxt
```
Here is `all_models/bert/fastertransformer/config.pbtxt`:

```
name: "fastertransformer"
backend: "fastertransformer"
default_model_filename: "bert"
max_batch_size: 128
input [
  {
    name: "input_hidden_state"
    data_type: TYPE_FP32
    dims: [ -1 ]
  },
  {
    name: "sequence_lengths"
    data_type: TYPE_FP32
    dims: [ 1 ]
    reshape: { shape: [ ] }
  }
]
output [
  {
    name: "output_hidden_state"
    data_type: TYPE_FP32
    dims: [ -1, -1 ]
  }
]
instance_group [
  {
    count: 1
    kind: KIND_CPU
  }
]
parameters {
  key: "tensor_para_size"
  value: {
    string_value: "2"
  }
}
parameters {
  key: "pipeline_para_size"
  value: {
    string_value: "2"
  }
}
parameters {
  key: "data_type"
  value: {
    string_value: "fp32"
  }
}
parameters {
  key: "enable_custom_all_reduce"
  value: {
    string_value: "0"
  }
}
parameters {
  key: "model_type"
  value: {
    string_value: "bert"
  }
}
parameters {
  key: "model_checkpoint_path"
  value: {
    string_value: "all_models/bert/fastertransformer/1/2-gpu/"
  }
}
parameters {
  key: "int8_mode"
  value: {
    string_value: "0"
  }
}
parameters {
  key: "is_sparse"
  value: {
    string_value: "0"
  }
}
parameters {
  key: "is_remove_padding"
  value: {
    string_value: "1"
  }
}
```
Then I ran the GEMM tuning step and launched the server:

```bash
/workspace/build/fastertransformer_backend/build/bin/bert_gemm 32 32 12 64 1 0 2
CUDA_VISIBLE_DEVICES=0,1 mpirun -n 1 --allow-run-as-root /opt/tritonserver/bin/tritonserver --model-repository=${WORKSPACE}/all_models/bert/ &
```
Then I got these errors:
```
I0412 10:02:39.230120 352 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0412 10:02:39.230132 352 cuda_memory_manager.cc:105] CUDA memory pool is created on device 1 with size 67108864
I0412 10:02:39.362061 352 model_lifecycle.cc:459] loading: fastertransformer:1
I0412 10:02:39.497906 352 libfastertransformer.cc:1828] TRITONBACKEND_Initialize: fastertransformer
I0412 10:02:39.497935 352 libfastertransformer.cc:1838] Triton TRITONBACKEND API version: 1.10
I0412 10:02:39.497942 352 libfastertransformer.cc:1844] 'fastertransformer' TRITONBACKEND API version: 1.10
I0412 10:02:39.498984 352 libfastertransformer.cc:1876] TRITONBACKEND_ModelInitialize: fastertransformer (version 1)
I0412 10:02:39.499868 352 libfastertransformer.cc:372] Instance group type: KIND_CPU count: 1
I0412 10:02:39.499993 352 libfastertransformer.cc:1899] TRITONBACKEND_ModelFinalize: delete model state
I0412 10:02:39.500008 352 libfastertransformer.cc:1904] TRITONBACKEND_ModelFinalize: MPI Finalize
E0412 10:02:39.545955 352 model_lifecycle.cc:597] failed to load 'fastertransformer' version 1: Unsupported: 1. Number of visible GPUs must be evenly divisble by TP * PP
2. Number of visible GPUs must be <= instance count * TP * PP
3. Multi-Node Inference only support one model instance
I0412 10:02:39.546100 352 server.cc:563]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I0412 10:02:39.546185 352 server.cc:590]
+-------------------+-----------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------+
| Backend           | Path                                                                        | Config                                                                                                  |
+-------------------+-----------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------+
| fastertransformer | /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so | {"cmdline":{"auto-complete-config":"true","min-compute-capability":"6.000000","backend-directory":"/opt |
|                   |                                                                             | /tritonserver/backends","default-max-batch-size":"4"}}                                                  |
+-------------------+-----------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------+
I0412 10:02:39.546239 352 server.cc:633]
+-------------------+---------+-----------------------------------------------------------------------------------------+
| Model             | Version | Status                                                                                  |
+-------------------+---------+-----------------------------------------------------------------------------------------+
| fastertransformer | 1       | UNAVAILABLE: Unsupported: 1. Number of visible GPUs must be evenly divisble by TP * PP  |
|                   |         | 2. Number of visible GPUs must be <= instance count * TP * PP                           |
|                   |         | 3. Multi-Node Inference only support one model instance                                 |
+-------------------+---------+-----------------------------------------------------------------------------------------+
I0412 10:02:39.577786 352 metrics.cc:864] Collecting metrics for GPU 0: Tesla V100-PCIE-32GB
I0412 10:02:39.577819 352 metrics.cc:864] Collecting metrics for GPU 1: Tesla V100-PCIE-32GB
I0412 10:02:39.578574 352 metrics.cc:757] Collecting CPU metrics
```
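The three numbered conditions in the UNAVAILABLE message can be checked by hand. A paraphrase in shell, using the values from the setup above (this is my reading of the error text, not the backend's actual code):

```bash
# visible GPUs from CUDA_VISIBLE_DEVICES=0,1; TP/PP/instance count from config.pbtxt
visible=2; tp=2; pp=2; instances=1
(( visible % (tp * pp) == 0 ))       || echo "check 1 fails: 2 GPUs, TP*PP=4"  # 2 % 4 != 0
(( visible <= instances * tp * pp )) || echo "check 2 fails"                   # 2 <= 4, passes
# Check 3 (one model instance for multi-node inference) is moot for this single-node run.
```

Check 1 fails, which is why the model goes UNAVAILABLE.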
Did you build the project inside the container, or on the ECS?
You set TP=2 and PP=2 in your config, which requires 4 GPUs, but you only have 2 GPUs: the number of visible GPUs has to be evenly divisible by TP × PP, and 2 is not divisible by 4.
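In case it helps others with two GPUs: one combination that passes both checks is TP=2, PP=1, since 2 % (2 × 1) = 0 and 2 ≤ 1 × 2 × 1. Only `pipeline_para_size` needs to change in the config above; the checkpoint already matches TP=2 because it was converted with `-infer_tensor_para_size 2`. A sketch of the edited blocks:

```
parameters {
  key: "tensor_para_size"
  value: {
    string_value: "2"   # unchanged; matches -infer_tensor_para_size 2
  }
}
parameters {
  key: "pipeline_para_size"
  value: {
    string_value: "1"   # was "2"; TP * PP = 2 now matches the 2 visible GPUs
  }
}
```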
Thank you so much for your help! I understand now that I need to adjust the TP and PP settings to match the number of GPUs I have. My issue is resolved. I appreciate your guidance and support! Wishing you all the best!
Thank you for your suggestion! Although I didn't end up resolving my issue with the approach you mentioned, I appreciate your input and your willingness to help. All the best to you!