Dynamic Batching with Python Backend

Open srinivasaraov opened this issue 1 year ago • 0 comments

I have a Python backend model with the following config.pbtxt, which does not use dynamic batching.

name: "sample"
backend: "python"
max_batch_size: 0

input [
  {
    name: "text" # Stringified JSON Array
    data_type: TYPE_STRING
    dims: [ 1 ] # Dynamic Batching
  },
  {
    name: "config" # Stringified JSON Array
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]
output [
  {
    name: "results"
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]

instance_group [
  {
    count: 1
    kind: KIND_CPU
  }
]
response_cache {
    enable: True
}

Here's my model.py execute method.

def execute(self, requests):
    responses_req = []  # one response per request, in request order
    for request in requests:
        try:
            # Each element of the "text" tensor is a stringified JSON value
            input_text_bytes = pb_utils.get_input_tensor_by_name(request, "text")
            input_text = [json.loads(text.decode()) for text in input_text_bytes.as_numpy()]

            config_tensor = pb_utils.get_input_tensor_by_name(request, "config")
            config = [json.loads(text.decode()) for text in config_tensor.as_numpy()]

            responses = []
            for idx, ind_input_text in enumerate(input_text):
                # Attach the document to its matching config before running the model
                ind_input_config = config[idx]
                ind_input_config["doc"] = ind_input_text
                results = self.sample_model.run(ind_input_config)["results"]
                responses.append(results)
            responses_req.append(self.create_inference_response(responses))
        except Exception:
            # On any failure, return an empty result for this request
            responses = [{}]
            responses_req.append(self.create_inference_response(responses))
    return responses_req

I would like to enable dynamic batching without any client-side changes (that is, keeping the REST API input format as it is). Here is the modified config.pbtxt with dynamic batching enabled:

name: "sample"
backend: "python"
max_batch_size: 8

dynamic_batching {}

input [
  {
    name: "text"
    data_type: TYPE_STRING
    dims: [ -1 ] # Dynamic Batching
  }
]
input [
  {
    name: "config"
    data_type: TYPE_STRING
    dims: [ -1 ]
  }
]
output [
  {
    name: "results"
    data_type: TYPE_STRING
    dims: [ -1 ]
  }
]

instance_group [
  {
    count: 1
    kind: KIND_CPU
  }
]
response_cache {
    enable: True
}
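
A note on the client side, offered as a sketch rather than a definitive answer: when max_batch_size is greater than 0, Triton forms the full tensor shape as [ -1 ] + dims, so each input in the config above is expected to be 2-D ([batch, n]), whereas the original config accepted shape [1]. The example below builds a v2 inference protocol request body under that assumption. The helper name build_infer_request and the "lang" key are hypothetical, and whether an existing client has to change depends on the shape it currently sends.

```python
import json
from typing import Dict, List


def build_infer_request(texts: List[str], configs: List[str]) -> Dict:
    """Build a v2 inference request body with an explicit batch dimension.

    Hypothetical helper: each batch item carries one stringified JSON value,
    so every input has shape [batch, 1].
    """
    if len(texts) != len(configs):
        raise ValueError("texts and configs must have the same length")
    batch = len(texts)
    return {
        "inputs": [
            {"name": "text", "shape": [batch, 1], "datatype": "BYTES", "data": texts},
            {"name": "config", "shape": [batch, 1], "datatype": "BYTES", "data": configs},
        ]
    }


# One batch item: a stringified JSON array for "text", a stringified object for "config".
body = build_infer_request(
    [json.dumps(["some document"])],
    [json.dumps({"lang": "en"})],  # "lang" is a made-up config key
)
print(json.dumps(body, indent=2))
```

If the client previously sent shape [1], the only difference in this sketch is the extra leading batch dimension.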

Is this possible? If so, what changes are needed in model.py?
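
One possible direction, sketched under assumptions: with dynamic batching the Python backend still calls execute with a list of requests and expects one response per request, so the model can gather the items from all requests, run them together, and split the results back per request. The pure-Python sketch below shows only that gather/split bookkeeping. The names execute_batched, run_batch, and echo_model are hypothetical; in a real model.py the per-request items would likely come from pb_utils.get_input_tensor_by_name(request, "text").as_numpy().reshape(-1), since batched tensors are 2-D and iterating them directly yields rows rather than bytes.

```python
import json
from typing import Callable, Dict, List, Sequence


def execute_batched(
    requests: Sequence[Dict[str, Sequence[bytes]]],
    run_batch: Callable[[List[dict]], List[dict]],
) -> List[List[bytes]]:
    """Gather items from all requests, run once, split results per request.

    Each request is modelled as a dict of flattened byte strings, standing in
    for the flattened "text" and "config" tensors of one Triton request.
    """
    payloads: List[dict] = []
    sizes: List[int] = []
    for request in requests:
        texts, configs = request["text"], request["config"]
        if len(texts) != len(configs):
            raise ValueError("text and config must have the same number of items")
        sizes.append(len(texts))
        for text, config in zip(texts, configs):
            payload = json.loads(config.decode())
            payload["doc"] = json.loads(text.decode())
            payloads.append(payload)

    # Single call for every item across all requests (hypothetical batched model).
    results = run_batch(payloads)
    if len(results) != len(payloads):
        raise ValueError("run_batch must return one result per payload")

    # Slice the flat result list back into one chunk per request, in order.
    responses: List[List[bytes]] = []
    start = 0
    for size in sizes:
        chunk = results[start:start + size]
        responses.append([json.dumps(item).encode() for item in chunk])
        start += size
    return responses


def echo_model(payloads: List[dict]) -> List[dict]:
    """Stand-in model that reports the length of each document."""
    return [{"length": len(p["doc"])} for p in payloads]


out = execute_batched(
    [
        {"text": [b'"ab"', b'"abc"'], "config": [b"{}", b"{}"]},
        {"text": [b'"a"'], "config": [b"{}"]},
    ],
    echo_model,
)
```

Here run_batch stands in for a batched version of self.sample_model.run. If the underlying model only accepts one item at a time, run_batch can loop internally, though batching then brings little benefit.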

Aug 01 '24 06:08 srinivasaraov