BentoML
bug: aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected
Describe the bug
I'm getting aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected with one of my models when I post 3 requests at the same time. Once the error happens, all other requests to the same service fail with the same problem.
This is not related to any cloud infrastructure, because I can reproduce it in a local Docker container.
Raising the amount of memory (above 4 GiB) seems to help, though the error message does not indicate any memory issue.
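For reference, this is roughly how I trigger it; the host, route, and payload below are placeholders for the real request (a minimal sketch, not the exact production client):

# Minimal sketch of the repro: three simultaneous POSTs to the service.
# Host, route, and payload are placeholders for the real request.
from concurrent.futures import ThreadPoolExecutor

import requests

# dummy body using the service's 'data' encoding; the real payload encodes the input image
payload = [{"data": [[0.0, 0.0, 0.0]], "dtype": "float32"}]

def call(_):
    return requests.post(
        "http://localhost:3000/logo_image_classifier/predict",
        json=payload,
        timeout=60,
    )

with ThreadPoolExecutor(max_workers=3) as pool:
    print([r.status_code for r in pool.map(call, range(3))])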
This is the error log:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/bentoml/_internal/server/http_app.py", line 336, in api_func
    output = await run_in_threadpool(api.func, input_data)
  File "/usr/local/lib/python3.7/site-packages/starlette/concurrency.py", line 41, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
  File "/usr/local/lib/python3.7/site-packages/anyio/to_thread.py", line 32, in run_sync
    func, *args, cancellable=cancellable, limiter=limiter
  File "/usr/local/lib/python3.7/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.7/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/home/bentoml/bento/src/service.py", line 86, in logo_image_classifier_predict
    return _invoke_runner(models["logo_image_classifier"], "run", input)
  File "/home/bentoml/bento/src/service.py", line 66, in _invoke_runner
    result = getattr(runner, name).run(*input_npack)
  File "/usr/local/lib/python3.7/site-packages/bentoml/_internal/runner/runner.py", line 52, in run
    return self.runner._runner_handle.run_method(self, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/bentoml/_internal/runner/runner_handle/remote.py", line 266, in run_method
    *args,
  File "/usr/local/lib/python3.7/site-packages/anyio/from_thread.py", line 49, in run
    return asynclib.run_async_from_thread(func, *args)
  File "/usr/local/lib/python3.7/site-packages/anyio/_backends/_asyncio.py", line 970, in run_async_from_thread
    return f.result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 435, in result
    return self.__get_result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.7/site-packages/bentoml/_internal/runner/runner_handle/remote.py", line 186, in async_run_method
    "Yatai-Bento-Deployment-Namespace": component_context.yatai_bento_deployment_namespace,
  File "/usr/local/lib/python3.7/site-packages/aiohttp/client.py", line 1141, in __aenter__
    self._resp = await self._coro
  File "/usr/local/lib/python3.7/site-packages/aiohttp/client.py", line 560, in _request
    await resp.start(conn)
  File "/usr/local/lib/python3.7/site-packages/aiohttp/client_reqrep.py", line 899, in start
    message, payload = await protocol.read()  # type: ignore[union-attr]
  File "/usr/local/lib/python3.7/site-packages/aiohttp/streams.py", line 616, in read
    await self._waiter
aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected
Expected behavior
The service does not crash.
Environment
bentoml: 1.0.15
python: 3.7.12
platform: Linux-5.15.0-1031-aws-x86_64-with-debian-11.2
I notice that you are on a relatively old version of BentoML. IIRC, we have shipped a fix for this issue since that version.
Is there any specific reason why you are pinned to this version?
Actually, this is happening with 1.0.15 as well. I had switched to an old container when creating the issue; I'll update the report.
Can you provide your service definition here? You can strip out anything sensitive.
bentofile.yaml:
service: "service:onnx_models" # Same as the argument passed to `bentoml serve`
labels:
owner: ds-team-shipamax
stage: dev
include:
- "service.py" # A pattern for matching which files to include in the bento
- "configuration.yml"
python:
packages: # Additional pip packages required by the service
- scipy==1.7.3
- pandas==1.3.5
- onnxruntime==1.13.1
- onnx
save_models_for_bento.py
(snippets:)
import keras2onnx
import bentoml

onnx_model = keras2onnx.convert_keras(model, model.name)  # `model` is the trained Keras model
bentoml.onnx.save_model("logo_image_classifier", onnx_model)
Oh sorry, I meant your service.py.
service.py:
import base64
import logging

import numpy as np
import pandas as pd
from scipy.sparse import isspmatrix_csr

import bentoml
from bentoml.io import JSON

bentoml_logger = logging.getLogger("bentoml")  # logger used below; exact name assumed


def fix_arg(arg):
    fixed_arg = arg
    if isspmatrix_csr(arg):
        fixed_arg = arg.toarray()
    elif isinstance(arg, list):
        fixed_arg = np.array(arg)
    return fixed_arg


def json_to_ndarray(x: dict) -> np.ndarray:
    if 'buffer' in x:
        buffer = base64.b64decode(x['buffer'].encode('ascii'))
        return np.frombuffer(buffer, x['dtype']).reshape(*x['shape'])
    if 'pandas' in x:
        return pd.read_json(x["pandas"])
    return np.array(x['data'], dtype=x['dtype'])


def ndarray_to_json(x: np.ndarray, binary: bool = True) -> dict:
    if binary:
        return {
            'buffer': base64.b64encode(x.tobytes()).decode('ascii'),
            'dtype': x.dtype.str,
            'shape': list(x.shape)
        }
    return {
        'data': x.tolist(),
        'dtype': x.dtype.str,
    }


model_names = [
    ("logo_image_classifier", bentoml.onnx),
]

# one runner per saved model
models = {m.split("/")[-1]: loader.get(f"{m.split('/')[-1]}:latest").to_runner() for m, loader in model_names}

onnx_models = bentoml.Service(
    name="onnx_models",
    runners=models.values()
)


def _invoke_runner(runner, name, input: str) -> str:
    bentoml_logger.debug(f"INPUT: type {type(input)}")
    input_npack = [json_to_ndarray(x) for x in input]
    result = getattr(runner, name).run(*input_npack)
    bentoml_logger.debug(f"Result: size: {len(result)} ({type(result)})")
    return ndarray_to_json(fix_arg(result))


@onnx_models.api(
    input=JSON(),
    output=JSON(),
    route="logo_image_classifier/predict"
)
def logo_image_classifier_predict(input):
    return _invoke_runner(models["logo_image_classifier"], "run", input)
This service used to serve more models, which is why the runners are generated from a list, but we have narrowed the problem down to this ONNX model.
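In case it helps with reproduction, this is roughly how a client encodes its numpy input for the endpoint above; the array shape and host are illustrative, not the real input:

# Client-side encoding mirroring json_to_ndarray/ndarray_to_json above.
# The array shape and host are illustrative placeholders.
import base64

import numpy as np
import requests

arr = np.zeros((1, 64, 64, 3), dtype=np.float32)  # stand-in for the real image tensor
payload = [{
    "buffer": base64.b64encode(arr.tobytes()).decode("ascii"),
    "dtype": arr.dtype.str,
    "shape": list(arr.shape),
}]
resp = requests.post("http://localhost:3000/logo_image_classifier/predict", json=payload)
print(resp.status_code, resp.json())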
We are seeing this on our end as well, with PyTorch models, and it started happening under higher load. The load is generated with Locust: the issue does not appear with ~200 spawned users, but above that we start getting this error. Worse, there is no recovery: once it happens, every subsequent request fails until the service is redeployed. Some details on our setup: we run many models in parallel and in sequence on the same input, following the inference graph approach; all of them are PyTorch models loaded as custom runners, each with an independent runner. Our infrastructure is AWS, with the BentoML service deployed on AWS ECS.
Are you using Yatai or just deploying plain BentoML?
Plain BentoML
I wonder if circus is dying or failing to restart the runners. Do either of you have runner logs available? Maybe run with --debug?
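Something along these lines should surface the runner logs; the exact flag placement and env var name may differ across versions, so treat this as a sketch:

bentoml serve service:onnx_models --debug
# or, when running the container, something like:
docker run --rm -p 3000:3000 -e BENTOML_DEBUG=true <service:version>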
@sauyon any update on this issue? It seems like a memory issue: adding more memory to the container appears to resolve it, but the error message doesn't indicate a memory problem, and continually adding memory may not be sustainable.
Just bumping this issue, as we are also experiencing it in production; it leads to silent restarts which are very difficult to detect. Is there a way to force the application to stop completely when this happens instead?
I am also facing this issue, mainly with transformer models. Has anyone had a breakthrough?
We suspect there is a memory leak in some versions of the aiohttp client. @nicjac @nadimintikrish Could you help us by providing a scenario to reproduce it?
- your system setups
- source code
- bentoml, python, and aiohttp versions (e.g. via the snippet below)
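For convenience, something like this collects the relevant versions when run inside the affected environment:

# Quick way to collect the relevant versions.
import sys

import aiohttp
import bentoml

print("python :", sys.version)
print("bentoml:", bentoml.__version__)
print("aiohttp:", aiohttp.__version__)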
Hi @jianshen92! I hope the issue I created earlier helps:
https://github.com/bentoml/BentoML/issues/4238
bentoml serve bento:xx works fine, but containerizing and running the container causes this issue.

bentoml serve
works as expected.

docker run --rm -p 3000:3000 <service:version>
--> the error happens as soon as the server receives a request.
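Given the earlier observation that raising memory above 4 GiB seems to help, one way to check whether processes inside the container are being OOM-killed (container name is arbitrary, image tag is a placeholder):

# run with an explicit memory limit, without --rm so the container can be inspected afterwards
docker run -p 3000:3000 -m 4g --name bento-oom-test <service:version>
# after a failure, check whether the kernel OOM killer fired inside the container
docker inspect bento-oom-test --format '{{.State.OOMKilled}}'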