Asyncio Key Error Under Load
When load testing an MLServer deployment (on AWS EKS with SC-V2) with the setup below, I get the following error whenever the size of the batches in my load tests exceeds ~2/3:
mlserver 2023-07-24 10:28:25,275 [mlserver.parallel] ERROR - Response processing loop crashed. Restarting the loop...
mlserver Traceback (most recent call last):
mlserver File "/opt/conda/lib/python3.8/site-packages/mlserver/parallel/dispatcher.py", line 55, in _process_responses_cb
mlserver process_responses.result()
mlserver File "/opt/conda/lib/python3.8/site-packages/mlserver/parallel/dispatcher.py", line 76, in _process_responses
mlserver await self._process_response(response)
mlserver File "/opt/conda/lib/python3.8/site-packages/mlserver/parallel/dispatcher.py", line 81, in _process_response
mlserver async_response = self._async_responses[internal_id]
mlserver KeyError: '93821e47-8589-48d2-a1c1-79a145b5ccf2'
mlserver 2023-07-24 10:28:25,276 [mlserver.parallel] DEBUG - Starting response processing loop...
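My reading of the traceback is that the response-processing loop receives a worker response whose internal id is no longer in the dispatcher's map of in-flight requests. As a toy sketch of that general pattern only (the names below are made up, this is not MLServer's actual dispatcher code):

from __future__ import annotations

import asyncio
import uuid

# Toy sketch only -- not MLServer's code. Pending futures are keyed by an
# internal id, and the response loop resolves them as worker responses come
# back. If a response arrives for an id that has already been removed (e.g.
# because the entry was cleared when a worker restarted or a request was
# cancelled), the plain dict lookup raises KeyError and the loop crashes.
class ToyDispatcher:
    def __init__(self) -> None:
        self._async_responses: dict[str, asyncio.Future] = {}

    async def dispatch(self) -> tuple[str, asyncio.Future]:
        internal_id = str(uuid.uuid4())
        future = asyncio.get_running_loop().create_future()
        self._async_responses[internal_id] = future
        return internal_id, future

    async def process_response(self, internal_id: str, payload: object) -> None:
        # A defensive `.pop(internal_id, None)` here would avoid the crash,
        # but the original caller would still be left waiting on its future.
        future = self._async_responses[internal_id]
        future.set_result(payload)
        del self._async_responses[internal_id]

async def main() -> None:
    dispatcher = ToyDispatcher()
    internal_id, _ = await dispatcher.dispatch()
    dispatcher._async_responses.clear()  # simulate the entry being dropped
    await dispatcher.process_response(internal_id, "response")  # raises KeyError

asyncio.run(main())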
For context, I'm using a server configured as follows:
apiVersion: mlops.seldon.io/v1alpha1
kind: Server
metadata:
  name: mlserver-test
  namespace: seldon-v2
spec:
  serverConfig: mlserver
  replicas: 5
  podSpec:
    containers:
      - name: mlserver
        env:
          - name: MLSERVER_PARALLEL_WORKERS
            value: "1"
          - name: SELDON_LOG_LEVEL
            value: DEBUG
        resources:
          requests:
            memory: "1000Mi"
            cpu: "500m"
          limits:
            memory: "4000Mi"
            cpu: "1000m"
And I do have adaptive batching enabled:
{
  "name": "test-model",
  "implementation": "wrapper.Model",
  "parameters": {
    "uri": "./model.onnx",
    "environment_tarball": "./environment.tar.gz"
  },
  "max_batch_size": 10,
  "max_batch_time": 0.5
}
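As an aside, my understanding is that these two settings can also be overridden via environment variables on the server container (using MLServer's MLSERVER_MODEL_ prefix for model settings), which is handy for tweaking batch parameters between load-test runs, e.g. by adding to the container env above:

# Assumed alternative based on MLServer's MLSERVER_MODEL_ settings prefix;
# the model-settings.json values above are what I'm actually using.
- name: MLSERVER_MODEL_MAX_BATCH_SIZE
  value: "10"
- name: MLSERVER_MODEL_MAX_BATCH_TIME
  value: "0.5"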
The payloads are not particularly large (float32, shape [1, 192, 256]), and monitoring the pods' memory consumption shows they stay well within the resource limits specified above. I've also tried setting MLSERVER_PARALLEL_WORKERS to 0, which does make the error go away, but only by virtue of disabling the parallel workers entirely.
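For reference, each request in the load test is a plain Open Inference Protocol (V2) REST call roughly along these lines (the host, input name and timeout are placeholders, but the tensor matches the shape above):

import numpy as np
import requests

# Illustrative request only -- host, input name and timeout are placeholders,
# but the tensor matches the float32 [1, 192, 256] payload described above
# (Open Inference Protocol / V2 REST endpoint).
data = np.random.rand(1, 192, 256).astype(np.float32)

payload = {
    "inputs": [
        {
            "name": "input-0",
            "shape": list(data.shape),
            "datatype": "FP32",
            "data": data.flatten().tolist(),
        }
    ]
}

response = requests.post(
    "http://<ingress-host>/v2/models/test-model/infer",
    json=payload,
    timeout=10,
)
response.raise_for_status()
print(response.json())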
Hey @edfincham,
It could be that the parallel workers are crashing for some unknown reason. Is there any other stacktrace that you can see from the logs?
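It would also help to rule out the worker processes being killed by the kernel (e.g. OOM). For example (the pod name below is a placeholder), something like this should show whether the container restarted or was OOM-killed, and surface any logs from the previous run:

kubectl -n seldon-v2 describe pod <mlserver-test-pod>                  # look for restarts / OOMKilled
kubectl -n seldon-v2 logs <mlserver-test-pod> -c mlserver --previous   # logs from the crashed container, if any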