MLServer
Adaptive batching leads to parameters being cut off
Hi, I observed some weird behavior when using the REST API with adaptive batching enabled.
When sending a single request to the v2 REST endpoint /v2/models/<MODEL>/infer, the parameters within the ResponseOutput are cut off. If a parameter value is not iterable, a TypeError is raised instead, e.g. TypeError: 'int' object is not iterable
Note that this only happens when:
- Adaptive batching is enabled
- A single request is sent within the max_batch_time window
How to Reproduce:
# model.py
from mlserver import MLModel
from mlserver.types import InferenceRequest, InferenceResponse, ResponseOutput


class EchoModel(MLModel):
    async def load(self) -> bool:
        return True

    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        request_input = payload.inputs[0]
        # return the payload input as output
        output = ResponseOutput(**request_input.dict())
        return InferenceResponse(model_name=self.name, outputs=[output])
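As a side note, the echo mapping itself is valid: RequestInput and ResponseOutput share the same field names, so constructing one from the other is lossless when batching is disabled. A quick offline check (hypothetical snippet, not part of the repro):

from mlserver.types import RequestInput, ResponseOutput

# Build the same input as in the request body further down
request_input = RequestInput(
    name="docs",
    shape=[2],
    datatype="INT32",
    data=[10, 11],
    parameters={"id": "123"},
)

# The field names overlap one-to-one, so the round-trip preserves everything
output = ResponseOutput(**request_input.dict())
print(output.parameters)  # id='123' -- intact without adaptive batching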
// model-settings.json
{
    "name": "echoModel",
    "max_batch_time": 2,
    "max_batch_size": 32,
    "implementation": "model.EchoModel"
}
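With both files in a directory (mlserver_example/, matching the layout shown later in the thread), the server can be started via the MLServer CLI:

mlserver start mlserver_example/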
Request Body:
// POST to localhost:8080/v2/models/echoModel/infer
{
    "inputs": [{
        "name": "docs",
        "shape": [2],
        "datatype": "INT32",
        "parameters": {
            "id": "123"
        },
        "data": [10, 11]
    }]
}
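For completeness, one way to send this payload from Python (hypothetical client snippet using the requests library; assumes the default HTTP port 8080 as above):

import requests

payload = {
    "inputs": [{
        "name": "docs",
        "shape": [2],
        "datatype": "INT32",
        "parameters": {"id": "123"},
        "data": [10, 11],
    }]
}

response = requests.post(
    "http://localhost:8080/v2/models/echoModel/infer", json=payload
)
print(response.json()["outputs"][0].get("parameters"))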
Expected behavior: EchoModel returns the RequestInput unchanged as the output.
Actual behavior: parameters in the output are cut off, or a TypeError is raised.
Examples:
- input parameters {"custom-param": "123"} --> output parameters {"custom-param": "1"}
- input parameters {"custom-params": ["123", "456"]} --> output parameters {"custom-param": "123"}
- input parameters {"custom-param": 123} --> TypeError: 'int' object is not iterable
It seems like the parameters are unbatched even though they were never batched in the first place.
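The three examples above are consistent with the unbatching step simply iterating over each parameter value and taking the element for the batch index. A hypothetical sketch of that behaviour (not MLServer's actual code):

# Iterating a string yields characters, a list yields elements,
# and an int is not iterable at all:
def unbatch_first(params: dict) -> dict:
    return {key: next(iter(value)) for key, value in params.items()}

print(unbatch_first({"custom-param": "123"}))           # {'custom-param': '1'}
print(unbatch_first({"custom-param": ["123", "456"]}))  # {'custom-param': '123'}
print(unbatch_first({"custom-param": 123}))             # TypeError: 'int' object is not iterable

This would explain why the parameters look unbatched even though they were never batched.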
Hi @tobbber Can you share the Dockerfile you used? I tried to wrap my code in a similar way and set up the batch settings, but then I hit the prometheus_client error below:
File "/opt/conda/lib/python3.8/site-packages/prometheus_client/metrics.py", line 121, in __init__ registry.register(self) File "/opt/conda/lib/python3.8/site-packages/prometheus_client/registry.py", line 29, in register raise ValueError( ValueError: Duplicated timeseries in CollectorRegistry: {'batch_request_queue_count', 'batch_request_queue_bucket', 'batch_request_queue_created', 'batch_request_queue_sum'}
I used mlserver build, and the generated Dockerfile uses seldonio/mlserver:1.3.5-slim.
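For reference, that ValueError is the generic prometheus_client failure for registering the same metric name twice in one CollectorRegistry; it can be reproduced standalone (hypothetical snippet, outside MLServer):

from prometheus_client import Histogram

# A Histogram named "batch_request_queue" creates the _count, _sum,
# _bucket and _created timeseries listed in the traceback above.
Histogram("batch_request_queue", "queue size per batch")
Histogram("batch_request_queue", "queue size per batch")  # raises ValueError: Duplicated timeseries

So the error suggests the metric is being registered twice somewhere in the containerised setup, rather than being specific to the batching bug itself.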
Hi @yaliqin, I used the MLServer CLI directly with mlserver start mlserver_example/, with the following structure:
mlserver_example/
├── model-settings.json
└── model.py
To install MLServer I used pip install mlserver==1.3.5
Thank you very much! Which Python version are you using?
I am using Python 3.11.6 on an arm64 machine (M1 Mac).
Thanks @tobbber. mlserver start . worked, but the Docker run failed. Will check the difference.