BentoML
bug: aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected
Describe the bug
I'm getting aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected with one of my models when I post 3 requests at the same time. Once the error happens, all other requests to the same service fail with the same problem.
This is not related to any cloud infrastructure, because I can reproduce it in a local Docker container.
Raising the amount of memory (above 4 GiB) seems to help, though the error message does not indicate any memory issue.
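For reference, this is roughly how I trigger it; the host, route, and payload below are placeholders for the real request (a minimal sketch, not the exact production client):

# Minimal sketch of the repro: three simultaneous POSTs to the service.
# Host, route, and payload are placeholders for the real request.
from concurrent.futures import ThreadPoolExecutor

import requests

# dummy body using the service's 'data' encoding; the real payload encodes the input image
payload = [{"data": [[0.0, 0.0, 0.0]], "dtype": "float32"}]

def call(_):
    return requests.post(
        "http://localhost:3000/logo_image_classifier/predict",
        json=payload,
        timeout=60,
    )

with ThreadPoolExecutor(max_workers=3) as pool:
    print([r.status_code for r in pool.map(call, range(3))])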
This is the error log:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/bentoml/_internal/server/http_app.py", line 336, in api_func
    output = await run_in_threadpool(api.func, input_data)
  File "/usr/local/lib/python3.7/site-packages/starlette/concurrency.py", line 41, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
  File "/usr/local/lib/python3.7/site-packages/anyio/to_thread.py", line 32, in run_sync
    func, *args, cancellable=cancellable, limiter=limiter
  File "/usr/local/lib/python3.7/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.7/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/home/bentoml/bento/src/service.py", line 86, in logo_image_classifier_predict
    return _invoke_runner(models["logo_image_classifier"], "run", input)
  File "/home/bentoml/bento/src/service.py", line 66, in _invoke_runner
    result = getattr(runner, name).run(*input_npack)
  File "/usr/local/lib/python3.7/site-packages/bentoml/_internal/runner/runner.py", line 52, in run
    return self.runner._runner_handle.run_method(self, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/bentoml/_internal/runner/runner_handle/remote.py", line 266, in run_method
    *args,
  File "/usr/local/lib/python3.7/site-packages/anyio/from_thread.py", line 49, in run
    return asynclib.run_async_from_thread(func, *args)
  File "/usr/local/lib/python3.7/site-packages/anyio/_backends/_asyncio.py", line 970, in run_async_from_thread
    return f.result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 435, in result
    return self.__get_result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.7/site-packages/bentoml/_internal/runner/runner_handle/remote.py", line 186, in async_run_method
    "Yatai-Bento-Deployment-Namespace": component_context.yatai_bento_deployment_namespace,
  File "/usr/local/lib/python3.7/site-packages/aiohttp/client.py", line 1141, in __aenter__
    self._resp = await self._coro
  File "/usr/local/lib/python3.7/site-packages/aiohttp/client.py", line 560, in _request
    await resp.start(conn)
  File "/usr/local/lib/python3.7/site-packages/aiohttp/client_reqrep.py", line 899, in start
    message, payload = await protocol.read()  # type: ignore[union-attr]
  File "/usr/local/lib/python3.7/site-packages/aiohttp/streams.py", line 616, in read
    await self._waiter
aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected
Expected behavior
The service does not crash.
Environment
bentoml: 1.0.15
python: 3.7.12
platform: Linux-5.15.0-1031-aws-x86_64-with-debian-11.2
I notice that you are on a relatively old version of BentoML. IIRC, we have shipped a fix for this issue since that version.
Is there any specific reason why you are pinned to this version?
Actually, this is happening with 1.0.15 as well. I had switched to an old container when creating the issue; I'll update the report.
Can you provide your service definition here? You can strip out anything sensitive.
bentofile.yaml:
service: "service:onnx_models" # Same as the argument passed to `bentoml serve`
labels:
owner: ds-team-shipamax
stage: dev
include:
- "service.py" # A pattern for matching which files to include in the bento
- "configuration.yml"
python:
packages: # Additional pip packages required by the service
- scipy==1.7.3
- pandas==1.3.5
- onnxruntime==1.13.1
- onnx
save_models_for_bento.py
(snippets:)
import keras2onnx
import bentoml

onnx_model = keras2onnx.convert_keras(model, model.name)  # `model` is the trained Keras model
bentoml.onnx.save_model("logo_image_classifier", onnx_model)
Oh sorry, I meant your service.py.
service.py:
import base64
import logging

import numpy as np
import pandas as pd
from scipy.sparse import isspmatrix_csr

import bentoml
from bentoml.io import JSON

bentoml_logger = logging.getLogger("bentoml")  # logger used below; exact name assumed


def fix_arg(arg):
    fixed_arg = arg
    if isspmatrix_csr(arg):
        fixed_arg = arg.toarray()
    elif isinstance(arg, list):
        fixed_arg = np.array(arg)
    return fixed_arg


def json_to_ndarray(x: dict) -> np.ndarray:
    if 'buffer' in x:
        buffer = base64.b64decode(x['buffer'].encode('ascii'))
        return np.frombuffer(buffer, x['dtype']).reshape(*x['shape'])
    if 'pandas' in x:
        return pd.read_json(x["pandas"])
    return np.array(x['data'], dtype=x['dtype'])


def ndarray_to_json(x: np.ndarray, binary: bool = True) -> dict:
    if binary:
        return {
            'buffer': base64.b64encode(x.tobytes()).decode('ascii'),
            'dtype': x.dtype.str,
            'shape': list(x.shape)
        }
    return {
        'data': x.tolist(),
        'dtype': x.dtype.str,
    }


model_names = [
    ("logo_image_classifier", bentoml.onnx),
]

# one runner per saved model
models = {m.split("/")[-1]: loader.get(f"{m.split('/')[-1]}:latest").to_runner() for m, loader in model_names}

onnx_models = bentoml.Service(
    name="onnx_models",
    runners=models.values()
)


def _invoke_runner(runner, name, input: str) -> str:
    bentoml_logger.debug(f"INPUT: type {type(input)}")
    input_npack = [json_to_ndarray(x) for x in input]
    result = getattr(runner, name).run(*input_npack)
    bentoml_logger.debug(f"Result: size: {len(result)} ({type(result)})")
    return ndarray_to_json(fix_arg(result))


@onnx_models.api(
    input=JSON(),
    output=JSON(),
    route="logo_image_classifier/predict"
)
def logo_image_classifier_predict(input):
    return _invoke_runner(models["logo_image_classifier"], "run", input)
This service used to serve more models, which is why the runners are generated from a list, but we have narrowed the problem down to this ONNX model.
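In case it helps with reproduction, this is roughly how a client encodes its numpy input for the endpoint above; the array shape and host are illustrative, not the real input:

# Client-side encoding mirroring json_to_ndarray/ndarray_to_json above.
# The array shape and host are illustrative placeholders.
import base64

import numpy as np
import requests

arr = np.zeros((1, 64, 64, 3), dtype=np.float32)  # stand-in for the real image tensor
payload = [{
    "buffer": base64.b64encode(arr.tobytes()).decode("ascii"),
    "dtype": arr.dtype.str,
    "shape": list(arr.shape),
}]
resp = requests.post("http://localhost:3000/logo_image_classifier/predict", json=payload)
print(resp.status_code, resp.json())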
We are seeing this on our end as well, with PyTorch models, and it started happening under higher load. The load is generated with Locust: the issue does not appear with ~200 spawned users, but above that we start getting this error. Worse, there is no recovery: once it happens, every subsequent request fails until the service is redeployed. Some details on our setup: we run many models in parallel and in sequence on the same input, following the inference graph approach; all of them are PyTorch models loaded as custom runners, each with an independent runner. Our infrastructure is AWS, with the BentoML service deployed on AWS ECS.
Are you using Yatai or just deploying plain BentoML?
Plain BentoML
I wonder if circus is dying or failing to restart the runners. Do either of you have runner logs available? Maybe run with --debug?
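Something along these lines should surface the runner logs; the exact flag placement and env var name may differ across versions, so treat this as a sketch:

bentoml serve service:onnx_models --debug
# or, when running the container, something like:
docker run --rm -p 3000:3000 -e BENTOML_DEBUG=true <service:version>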
@sauyon any update on this issue? It seems like a memory issue: adding more memory to the container appears to resolve it, but the error message doesn't indicate a memory problem, and continually adding memory may not be sustainable.
Just bumping this issue, as we are also experiencing it in production; it leads to silent restarts which are very difficult to detect. Is there a way to force the application to stop completely when this happens instead?
I am also facing this issue, mainly with transformer models. Has anyone had a breakthrough?
We suspect there is a memory leak in some versions of the aiohttp client. @nicjac @nadimintikrish Could you help us by providing a scenario to reproduce it?
- your system setups
- source code
- bentoml, python, and aiohttp versions (e.g. via the snippet below)
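For convenience, something like this collects the relevant versions when run inside the affected environment:

# Quick way to collect the relevant versions.
import sys

import aiohttp
import bentoml

print("python :", sys.version)
print("bentoml:", bentoml.__version__)
print("aiohttp:", aiohttp.__version__)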
Hi @jianshen92! I hope the issue I created earlier helps:
https://github.com/bentoml/BentoML/issues/4238
bentoml serve bento:xx works fine, but containerizing and running the container causes this issue.

bentoml serve
works as expected.

docker run --rm -p 3000:3000 <service:version>
--> the error happens as soon as the server receives a request.
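Given the earlier observation that raising memory above 4 GiB seems to help, one way to check whether processes inside the container are being OOM-killed (container name is arbitrary, image tag is a placeholder):

# run with an explicit memory limit, without --rm so the container can be inspected afterwards
docker run -p 3000:3000 -m 4g --name bento-oom-test <service:version>
# after a failure, check whether the kernel OOM killer fired inside the container
docker inspect bento-oom-test --format '{{.State.OOMKilled}}'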