Triton ensemble not working as expected to support reshape

Open wzhongyuan opened this issue 1 month ago • 1 comments

Description

Hi Team,

I tried to configure my ensemble model with reshape (https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/user_guide/model_configuration.html#reshape), but it is not working as expected.

The ensemble is composed of two models: a Python preprocessor and an ONNX model. Below is the generated config file for each model, including the ensemble itself:

Python preprocessor

name: "pre"
backend: "python"
max_batch_size: 8
input {
  name: "text"
  data_type: TYPE_STRING
  dims: 1
  reshape {
  }
}
output {
  name: "input_ids"
  data_type: TYPE_INT64
  dims: -1
}
output {
  name: "attention_mask"
  data_type: TYPE_INT64
  dims: -1
}
output {
  name: "token_type_ids"
  data_type: TYPE_INT64
  dims: -1
}
dynamic_batching {
  max_queue_delay_microseconds: 2000
}
instance_group {
  count: 4
}

Model

name: "main_app"
platform: "onnxruntime_onnx"
backend: "onnxruntime"
max_batch_size: 8
input {
  name: "input_ids"
  data_type: TYPE_INT64
  dims: -1
}
input {
  name: "attention_mask"
  data_type: TYPE_INT64
  dims: -1
}
input {
  name: "token_type_ids"
  data_type: TYPE_INT64
  dims: -1
}
output {
  name: "embedding"
  data_type: TYPE_FP32
  dims: 768
}
dynamic_batching {
  max_queue_delay_microseconds: 2000
}
instance_group {
  count: 4
}

The ensemble

name: "ensemble"
platform: "ensemble"
max_batch_size: 8
input {
  name: "text"
  data_type: TYPE_STRING
  dims: 1
  reshape {
  }
}
output {
  name: "embedding"
  data_type: TYPE_FP32
  dims: 768
}
ensemble_scheduling {
  step {
    model_name: "pre"
    model_version: -1
    input_map {
      key: "text"
      value: "text"
    }
    output_map {
      key: "token_type_ids"
      value: "token_type_ids"
    }
    output_map {
      key: "input_ids"
      value: "input_ids"
    }
    output_map {
      key: "attention_mask"
      value: "attention_mask"
    }
  }
  step {
    model_name: "main_app"
    model_version: -1
    input_map {
      key: "token_type_ids"
      value: "token_type_ids"
    }
    input_map {
      key: "input_ids"
      value: "input_ids"
    }
    input_map {
      key: "attention_mask"
      value: "attention_mask"
    }
    output_map {
      key: "embedding"
      value: "embedding"
    }
  }
}
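
For readers following the data flow, the mapping in the ensemble config above can be traced with a short sketch. This is only an illustration: the dictionaries are a hand-copied summary of the `ensemble_scheduling` block, and `trace_ensemble` is a hypothetical helper, not part of Triton. It checks that every ensemble tensor a step consumes, or that the ensemble exposes as an output, is produced either by an ensemble input or by some step's `output_map`.

```python
# Hand-copied summary of the ensemble config above (illustrative only).
# Each map is {composing-model tensor name: ensemble tensor name}.
ENSEMBLE_INPUTS = {"text"}
ENSEMBLE_OUTPUTS = {"embedding"}
STEPS = [
    {
        "model_name": "pre",
        "input_map": {"text": "text"},
        "output_map": {
            "token_type_ids": "token_type_ids",
            "input_ids": "input_ids",
            "attention_mask": "attention_mask",
        },
    },
    {
        "model_name": "main_app",
        "input_map": {
            "token_type_ids": "token_type_ids",
            "input_ids": "input_ids",
            "attention_mask": "attention_mask",
        },
        "output_map": {"embedding": "embedding"},
    },
]


def trace_ensemble(inputs, outputs, steps):
    """Return (consumer, tensor) pairs for ensemble tensors nobody produces."""
    # A tensor is "produced" if it is an ensemble input or a step output.
    produced = set(inputs)
    for step in steps:
        produced.update(step["output_map"].values())

    missing = []
    for step in steps:
        for ens_name in step["input_map"].values():
            if ens_name not in produced:
                missing.append((step["model_name"], ens_name))
    for ens_name in sorted(outputs):
        if ens_name not in produced:
            missing.append(("<ensemble output>", ens_name))
    return missing


print(trace_ensemble(ENSEMBLE_INPUTS, ENSEMBLE_OUTPUTS, STEPS))  # prints []
```

The empty result suggests the tensor names are wired consistently, so the failure reported here concerns the shape of `text` rather than the mapping itself.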

We can see from the configs that both the ensemble and the preprocessor have reshape set. However, when I started the Triton server, I got the error below:

E0617 02:43:08.069270 106905 model_repository_manager.cc:563] Invalid argument: in ensemble ensemble, ensemble tensor text: inconsistent shape: [-1] is inferred from model ensemble while [-1,1] is inferred from model pre
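
One possible reading of this message can be sketched as follows. The sketch assumes, based only on the two shapes in the log line, that the check compares the ensemble input after its reshape is applied against the composing model's declared `dims`, with a variable batch dimension prepended because `max_batch_size` is greater than 0. That is an assumption about the validation, not a description of Triton's actual code, and `batched_shape` is a hypothetical helper.

```python
def batched_shape(dims, max_batch_size):
    """Prepend a variable batch dimension (-1) when batching is enabled."""
    return ([-1] if max_batch_size > 0 else []) + list(dims)


# Values copied from the configs above. The empty `reshape { }` in the
# generated config is read here as a reshape to the empty shape [].
ensemble_text = {"dims": [1], "reshape": [], "max_batch_size": 8}
pre_text = {"dims": [1], "reshape": [], "max_batch_size": 8}

# ASSUMPTION: the ensemble side is evaluated with its reshape applied,
# while the composing model "pre" is evaluated with its declared dims.
ensemble_shape = batched_shape(ensemble_text["reshape"], ensemble_text["max_batch_size"])
pre_shape = batched_shape(pre_text["dims"], pre_text["max_batch_size"])

print(ensemble_shape, pre_shape, ensemble_shape == pre_shape)
# prints: [-1] [-1, 1] False
```

Under that assumption the two sides disagree exactly as the log reports: `[-1]` for the ensemble and `[-1,1]` for `pre`. Whether this is really how Triton 23.01 evaluates the shapes has not been verified here.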

Could you please help check what the issue is and advise how we can address it? Thanks.

Triton Information

What version of Triton are you using? 23.01

Are you using the Triton container or did you build it yourself? Triton container

To Reproduce

The configs above reproduce the issue.


Jun 17 '24 03:06 wzhongyuan
