
Python InferenceServerClient (http) should not call close() from __del__

Open damonmaria opened this issue 2 years ago • 13 comments

Description The python http InferenceServerClient implements the __del__ finalizer and calls self.close(). The issue is that this uses gevent which must always be called from the same thread. Python can call __del__ from any thread. So when the client is finalized you'll get errors like this:

Exception ignored in: <function InferenceServerClient.__del__ at 0x7fdf0d0c7a60>
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/tritonclient/http/__init__.py", line 226, in __del__
    self.close()
  File "/opt/conda/lib/python3.8/site-packages/tritonclient/http/__init__.py", line 233, in close
    self._pool.join()
  File "/opt/conda/lib/python3.8/site-packages/gevent/pool.py", line 430, in join
    result = self._empty_event.wait(timeout=timeout)
  File "src/gevent/event.py", line 163, in gevent._gevent_cevent.Event.wait
  File "src/gevent/_abstract_linkable.py", line 509, in gevent._gevent_c_abstract_linkable.AbstractLinkable._wait
  File "src/gevent/_abstract_linkable.py", line 206, in gevent._gevent_c_abstract_linkable.AbstractLinkable._capture_hub
gevent.exceptions.InvalidThreadUseError: (<Hub '' at 0x7fde6e5f6580 epoll default pending=0 ref=0 fileno=5 resolver=<gevent.resolver.thread.Resolver at 0x7fdf44ba2190 pool=<ThreadPool at 0x7fde6e552be0 tasks=0 size=1 maxsize=10 hub=<Hub at 0x7fde6e5f6580 thread_ident=0x7fdf4688c740>>> threadpool=<ThreadPool at 0x7fde6e552be0 tasks=0 size=1 maxsize=10 hub=<Hub at 0x7fde6e5f6580 thread_ident=0x7fdf4688c740>> thread_ident=0x7fdf4688c740>, <Hub '' at 0x7fde6e4fc7c0 epoll pending=0 ref=0 fileno=19 thread_ident=0x7fddae7ed700>, <greenlet.greenlet object at 0x7fdf3f1f8b40 (otid=0x7fdf3f20f680) current active started main>)

I am already explicitly closing the client on the correct thread, but InferenceServerClient does not realize it is already closed and so attempts to close it again.

Triton Information tritonclient[http] 2.19.0

To Reproduce Construct an InferenceServerClient in a multithreaded python app.

But reproducing it does require Python to call __del__ from a different thread, which can't be controlled.
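The point about finalizer threads can be shown without tritonclient at all. This is a minimal, hypothetical demonstration (not from the issue) that in CPython the thread which creates an object need not be the thread that runs its __del__:

```python
# Plain-Python sketch: an object created on a worker thread can have its
# __del__ executed on the main thread, because the finalizer runs wherever
# the last reference happens to be dropped.
import threading

finalizer_thread = []

class Tracked:
    def __del__(self):
        # Record which thread the finalizer actually ran on.
        finalizer_thread.append(threading.current_thread().name)

holder = {}

def create():
    holder["obj"] = Tracked()   # object is created on the worker thread

t = threading.Thread(target=create, name="worker")
t.start()
t.join()

del holder["obj"]               # last reference dropped on the main thread
print(finalizer_thread)         # -> ['MainThread'], not 'worker'
```

If the finalizer calls into something thread-affine (like a gevent pool bound to the worker thread), this mismatch is exactly what produces the InvalidThreadUseError above.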

Expected behavior To not get these spurious warnings in our logs.

damonmaria avatar Apr 15 '22 00:04 damonmaria

@damonmaria thank you for filing this issue. Can you provide a detailed repro with a minimal example so that we can add additional handling for such cases?

CoderHam avatar Apr 15 '22 16:04 CoderHam

As I said in the To Reproduce section above, there is AFAIK no way to control which thread __del__ will be called from. This is explicit in the Python docs on __del__:

__del__() can be invoked when arbitrary code is being executed, including from any arbitrary thread.

Given that, if InferenceServerClient's __del__ calls into gevent then it may not be on the correct thread, as gevent requires.

Since the documentation is clear this situation can occur, I'm not sure it's worth trying to create a repro given the vagaries of which thread will call __del__.

damonmaria avatar Apr 15 '22 20:04 damonmaria

@jbkyang-nvi can you have a look at this?

CoderHam avatar Apr 18 '22 16:04 CoderHam

@damonmaria why are you manually calling close()? Also, just to make sure I understand, are you sharing the InferenceServerClient among your threads or do you have one client per thread?

jbkyang-nvi avatar Apr 18 '22 22:04 jbkyang-nvi

I create a client in the main thread at startup, get the model config and perform a sanity check on it. Because of gevent that client instance is tied to the main thread so I manually close it. When we start processing requests in a dedicated thread then I create a new client.

By calling close manually I'm ensuring it's called from the correct thread. If I leave close to be called through __del__ then there is no guarantee it will be called from the correct thread.
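The lifetime pattern described above can be sketched with a context manager that guarantees close() runs on the thread that created the client. This is a hypothetical illustration; DummyClient stands in for InferenceServerClient so it runs without a Triton server:

```python
# Sketch (assumption: DummyClient is a stand-in for the real client) of
# scoping a client so close() always runs on the creating thread.
from contextlib import contextmanager

class DummyClient:
    """Stand-in for tritonclient.http.InferenceServerClient."""
    def __init__(self):
        self.closed = False
    def close(self):
        self.closed = True

@contextmanager
def client_scope():
    client = DummyClient()
    try:
        yield client
    finally:
        client.close()   # always runs here, on the thread that created it

# Startup sanity check on the main thread:
with client_scope() as client:
    assert not client.closed   # usable inside the scope
assert client.closed           # closed deterministically on exit

# During normal running, a fresh client would be created inside the
# worker thread rather than reusing this one.
```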

damonmaria avatar Apr 19 '22 01:04 damonmaria

Sorry, I realize I did not answer both your questions.

When the InferenceServerClient object is garbage collected (finalized) by Python, its __del__ is called, which in turn calls close. So close is called no matter what. Even if I have already called close manually, this second call to close from __del__ will still cause this gevent thread warning.

damonmaria avatar Apr 19 '22 02:04 damonmaria

@damonmaria sorry for the delay in response. I was meaning to ask: where are you seeing:

The issue is that this uses gevent which must always be called from the same thread.

we have multithreaded instances of InferenceServerClient and they don't need to be `close()`'d manually

jbkyang-nvi avatar Apr 21 '22 20:04 jbkyang-nvi

@jbkyang-nvi

The issue is that this uses gevent which must always be called from the same thread.

The InferenceServerClient (HTTP version only) uses gevent. My understanding, and my experience using it, is that everything in a greenlet has to happen on a single thread. Calling the client from different threads fails.

we have multithreaded instances of InferenceServerClient and they don't need to be `close()`'d manually

As above. During initialization we create an InferenceServerClient, get the model config from the Triton server, and perform a test / warmup inference. This is a sanity check so we fail early if there is an issue. The client used for this initialization cannot be reused during normal running, because normal running calls it from a different thread, and calls to the client from a different thread will fail. This initial client is only a local variable and goes out of scope. Python then garbage collects it and therefore calls the InferenceServerClient's __del__ method, which calls self.close(). But this call through __del__ happens on an arbitrary thread and causes the issue I posted about at the very top.

I imagine you have no need to call close because the clients you create are in use for the lifetime of your app. But that is not the case for us.

So, I am calling close() manually so that close runs on the correct thread. But this makes no difference, because __del__ calls close no matter what, even if the client is already closed, and so this error shows up anyway.

damonmaria avatar Apr 22 '22 00:04 damonmaria

Just for reference, I see the same message Exception ignored in: <function InferenceServerClient.__del__ at 0x7fdf0d0c7a60> although with a slightly different traceback

Exception ignored in: <function InferenceServerClient.__del__ at 0x7f7a9ff9edd0>
Traceback (most recent call last):
  File "/home/ivergara/mambaforge/envs/demo/lib/python3.10/site-packages/tritonclient/grpc/__init__.py", line 273, in __del__
  File "/home/ivergara/mambaforge/envs/demo/lib/python3.10/site-packages/tritonclient/grpc/__init__.py", line 281, in close
  File "/home/ivergara/mambaforge/envs/demo/lib/python3.10/site-packages/grpc/_channel.py", line 1564, in close
  File "/home/ivergara/mambaforge/envs/demo/lib/python3.10/site-packages/grpc/_channel.py", line 1548, in _close
AttributeError: 'NoneType' object has no attribute 'StatusCode'

The context where I get this is during test execution via pytest of a FastAPI application where a grpcclient.InferenceServerClient is attached to the application. I haven't seen this message when using the web application by itself, only when executing the test suite.

ivergara avatar Sep 02 '22 08:09 ivergara

@ivergara are you also calling close()? Can you share your client so we can reproduce the issue?

jbkyang-nvi avatar Sep 02 '22 19:09 jbkyang-nvi

Also for future reference. As stated here:

None of the methods of InferenceServerHttpClient are thread safe. The class is intended to be used by a single thread and simultaneously calling different methods with different threads is not supported and will cause undefined behavior.

The gRPC client is more thread-safe, as can be seen here:

Most of the methods are thread-safe except Infer, AsyncInfer, StartStream StopStream and AsyncStreamInfer. Calling these functions from different threads will cause undefined behavior.

That being said, we could make close() remember that it has already been called and not execute a second time.
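That suggested fix amounts to making close() idempotent. A minimal sketch, assuming a simplified stand-in class rather than the actual tritonclient implementation:

```python
# Hypothetical sketch of an idempotent close(): the second call (e.g. the
# one made by __del__ on an arbitrary thread) becomes a harmless no-op, so
# the thread-affine cleanup only ever runs once, on the owning thread.
class SafeCloseClient:
    def __init__(self):
        self._closed = False

    def close(self):
        if self._closed:
            return          # already closed: skip thread-affine cleanup
        self._closed = True
        # ... the real client would join its gevent pool here ...

    def __del__(self):
        # May run on any thread; safe because close() guards itself.
        self.close()

c = SafeCloseClient()
c.close()   # explicit close on the owning thread
c.close()   # second call (as __del__ would make) does nothing
```

The guard must be set before the actual cleanup begins so that a finalizer racing with an explicit close cannot both enter the cleanup path.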

jbkyang-nvi avatar Sep 02 '22 23:09 jbkyang-nvi

@jbkyang-nvi As I've said above. The http client implements the __del__() method and calls close() from it. __del__ is called by Python itself when the object is garbage collected, not by user code. The Python docs state __del__ can be called from any thread.

So the http client is not thread safe, but it implements __del__ which can be called from any thread. Hence the issue.

@ivergara In the end I gave up on the HTTP client and switched to gRPC. Not only was my code cleaner in the end but I had much better performance with gRPC.

damonmaria avatar Sep 03 '22 00:09 damonmaria

@ivergara are you also calling close()? Can you share your client so we can reproduce the issue?

I'm not calling close() explicitly. I will try to set up a small example where this shows up.

@ivergara In the end I gave up on the HTTP client and switched to gRPC. Not only was my code cleaner in the end but I had much better performance with gRPC.

I'm using the gRPC client and still seeing the issue. As I said, this seems only to happen in testing where the test runner starts the webserver in a separate thread.
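For the test-suite case, one workaround is to create and close the client symmetrically in the test lifecycle, so the real close always runs before the finalizer can fire elsewhere. A hedged sketch using stdlib unittest (DummyClient stands in for grpcclient.InferenceServerClient; the real setUp would pass a url):

```python
# Sketch: setUp/tearDown run on the same thread, so the client is created
# and closed on one thread and __del__ later finds nothing left to do
# (assuming close() is, or is wrapped to be, idempotent).
import unittest

class DummyClient:
    """Stand-in for grpcclient.InferenceServerClient."""
    def __init__(self):
        self.closed = False
    def close(self):
        self.closed = True

class InferenceTest(unittest.TestCase):
    def setUp(self):
        # Real code: grpcclient.InferenceServerClient(url="localhost:8001")
        self.client = DummyClient()

    def tearDown(self):
        self.client.close()   # same thread that created it

    def test_client_usable(self):
        self.assertFalse(self.client.closed)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(InferenceTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The same shape works as a pytest fixture with a `yield` followed by `client.close()` in the teardown half.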

ivergara avatar Sep 03 '22 07:09 ivergara

I solved this problem by declaring CLIENT inside the function e.g.

def predict_batchsize(inputs, model_name='building', batchsize=64,
                      inp_desc=("INPUT__0", "FP32"), otp_desc=("OUTPUT__0", "FP32")):
    CLIENT = grpc_client.InferenceServerClient(url="192.168.128.29:8001")
    ...
    preds = CLIENT.infer(model_name=model_name, inputs=[inp],
                         outputs=[otp]).as_numpy(otp_desc[0])

Davidleeeeee avatar Nov 03 '22 07:11 Davidleeeeee

@Davidleeeeee

I solved this problem by declaring CLIENT inside the function e.g.

def predict_batchsize(inputs, model_name='building', batchsize=64,
                      inp_desc=("INPUT__0", "FP32"), otp_desc=("OUTPUT__0", "FP32")):
    CLIENT = grpc_client.InferenceServerClient(url="192.168.128.29:8001")
    ...
    preds = CLIENT.infer(model_name=model_name, inputs=[inp],
                         outputs=[otp]).as_numpy(otp_desc[0])

Where is the threading part of your client?

@ivergara

Would it be possible to add a small example for your grpc/http client?

jbkyang-nvi avatar Nov 04 '22 21:11 jbkyang-nvi

Please make sure you are on a supported Python version: Programming Language :: Python :: 3 / 3.6 / 3.7 / 3.8

Tuanlase02874 avatar Mar 29 '23 13:03 Tuanlase02874

Closing issue due to lack of activity. Please re-open the issue if you would like to follow up.

jbkyang-nvi avatar Jul 11 '23 22:07 jbkyang-nvi

Just for reference, I see the same message Exception ignored in: <function InferenceServerClient.__del__ at 0x7fdf0d0c7a60> although with a slightly different traceback

Exception ignored in: <function InferenceServerClient.__del__ at 0x7f7a9ff9edd0>
Traceback (most recent call last):
  File "/home/ivergara/mambaforge/envs/demo/lib/python3.10/site-packages/tritonclient/grpc/__init__.py", line 273, in __del__
  File "/home/ivergara/mambaforge/envs/demo/lib/python3.10/site-packages/tritonclient/grpc/__init__.py", line 281, in close
  File "/home/ivergara/mambaforge/envs/demo/lib/python3.10/site-packages/grpc/_channel.py", line 1564, in close
  File "/home/ivergara/mambaforge/envs/demo/lib/python3.10/site-packages/grpc/_channel.py", line 1548, in _close
AttributeError: 'NoneType' object has no attribute 'StatusCode'

The context where I get this is during test execution via pytest of a FastAPI application where a grpcclient.InferenceServerClient is attached to the application. I haven't seen this message when using the web application by itself, only when executing the test suite.

@ivergara how did you resolve this issue ?

danudeep90 avatar Nov 27 '23 06:11 danudeep90

Sorry for not reacting to this; I haven't worked actively with Triton lately. As far as I am aware, we haven't solved the issue, because it showed up only in tests and we had to move to HTTP requests due to infrastructure constraints (deploying on a K8s cluster without a data mesh).

ivergara avatar Nov 27 '23 09:11 ivergara

Sorry for not reacting to this; I haven't worked actively with Triton lately. As far as I am aware, we haven't solved the issue, because it showed up only in tests and we had to move to HTTP requests due to infrastructure constraints (deploying on a K8s cluster without a data mesh).

@ivergara : no worries. thank you !!

danudeep90 avatar Nov 28 '23 07:11 danudeep90