MLServer
Adaptive batching leads to parameters being cut off
Hi, I observed some weird behavior when using the REST API with adaptive batching enabled.
When sending a single request to the v2 REST endpoint /v2/models/<MODEL>/infer, the parameters within the ResponseOutput are cut off. If a parameter value is not iterable, a TypeError is raised instead, e.g. TypeError: 'int' object is not iterable
Note that this only happens when:
- Adaptive batching is enabled
- A single request is sent within the max_batch_time window
How to Reproduce:
# model.py
from mlserver import MLModel
from mlserver.types import InferenceRequest, InferenceResponse, ResponseOutput


class EchoModel(MLModel):
    async def load(self) -> bool:
        return True

    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        request_input = payload.inputs[0]
        # return the payload input as output
        output = ResponseOutput(**request_input.dict())
        return InferenceResponse(model_name=self.name, outputs=[output])
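As a side note, the echo mapping itself is valid: RequestInput and ResponseOutput share the same field names, so constructing one from the other is lossless when batching is disabled. A quick offline check (hypothetical snippet, not part of the repro):

from mlserver.types import RequestInput, ResponseOutput

# Build the same input as in the request body further down
request_input = RequestInput(
    name="docs",
    shape=[2],
    datatype="INT32",
    data=[10, 11],
    parameters={"id": "123"},
)

# The field names overlap one-to-one, so the round-trip preserves everything
output = ResponseOutput(**request_input.dict())
print(output.parameters)  # id='123' -- intact without adaptive batching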
// model-settings.json
{
    "name": "echoModel",
    "max_batch_time": 2,
    "max_batch_size": 32,
    "implementation": "model.EchoModel"
}
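With both files in a directory (mlserver_example/, matching the layout shown later in the thread), the server can be started via the MLServer CLI:

mlserver start mlserver_example/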
Request Body:
// POST to localhost:8080/v2/models/echoModel/infer
{
    "inputs": [{
        "name": "docs",
        "shape": [2],
        "datatype": "INT32",
        "parameters": {
            "id": "123"
        },
        "data": [10, 11]
    }]
}
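For completeness, one way to send this payload from Python (hypothetical client snippet using the requests library; assumes the default HTTP port 8080 as above):

import requests

payload = {
    "inputs": [{
        "name": "docs",
        "shape": [2],
        "datatype": "INT32",
        "parameters": {"id": "123"},
        "data": [10, 11],
    }]
}

response = requests.post(
    "http://localhost:8080/v2/models/echoModel/infer", json=payload
)
print(response.json()["outputs"][0].get("parameters"))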
Expected behavior: EchoModel returns the RequestInput unchanged as the output.
Actual behavior: parameters in the output are cut off, or a TypeError is raised.
Examples:
- input parameters {"custom-param": "123"} --> output parameters {"custom-param": "1"}
- input parameters {"custom-params": ["123", "456"]} --> output parameters {"custom-param": "123"}
- input parameters {"custom-param": 123} --> TypeError: 'int' object is not iterable
It seems like the parameters are unbatched even though they were never batched in the first place.
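The three examples above are consistent with the unbatching step simply iterating over each parameter value and taking the element for the batch index. A hypothetical sketch of that behaviour (not MLServer's actual code):

# Iterating a string yields characters, a list yields elements,
# and an int is not iterable at all:
def unbatch_first(params: dict) -> dict:
    return {key: next(iter(value)) for key, value in params.items()}

print(unbatch_first({"custom-param": "123"}))           # {'custom-param': '1'}
print(unbatch_first({"custom-param": ["123", "456"]}))  # {'custom-param': '123'}
print(unbatch_first({"custom-param": 123}))             # TypeError: 'int' object is not iterable

This would explain why the parameters look unbatched even though they were never batched.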
Hi @tobbber Can you share the Dockerfile you used? I tried to wrap my code in a similar way and set up the batch settings, but then I hit the prometheus_client error below:
File "/opt/conda/lib/python3.8/site-packages/prometheus_client/metrics.py", line 121, in __init__ registry.register(self) File "/opt/conda/lib/python3.8/site-packages/prometheus_client/registry.py", line 29, in register raise ValueError( ValueError: Duplicated timeseries in CollectorRegistry: {'batch_request_queue_count', 'batch_request_queue_bucket', 'batch_request_queue_created', 'batch_request_queue_sum'}
I used mlserver build, and the generated Dockerfile uses seldonio/mlserver:1.3.5-slim.
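For reference, that ValueError is the generic prometheus_client failure for registering the same metric name twice in one CollectorRegistry; it can be reproduced standalone (hypothetical snippet, outside MLServer):

from prometheus_client import Histogram

# A Histogram named "batch_request_queue" creates the _count, _sum,
# _bucket and _created timeseries listed in the traceback above.
Histogram("batch_request_queue", "queue size per batch")
Histogram("batch_request_queue", "queue size per batch")  # raises ValueError: Duplicated timeseries

So the error suggests the metric is being registered twice somewhere in the containerised setup, rather than being specific to the batching bug itself.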
Hi @yaliqin, I used the MLServer CLI directly with mlserver start mlserver_example/, with the following structure:
mlserver_example/
├── model-settings.json
└── model.py
To install MLServer I used pip install mlserver==1.3.5
Thank you very much! Which Python version are you using?
I am using Python 3.11.6 on an arm64 machine (M1 Mac).
Thanks @tobbber. mlserver start . worked, but the Docker run failed. Will check the difference.