Python InferenceServerClient (http) should not call close() from __del__
Description
The Python HTTP `InferenceServerClient` implements the `__del__` finalizer and calls `self.close()` from it. The issue is that this uses gevent, which must always be called from the same thread, but Python can call `__del__` from any thread. So when the client is finalized you'll get errors like this:
```
Exception ignored in: <function InferenceServerClient.__del__ at 0x7fdf0d0c7a60>
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/tritonclient/http/__init__.py", line 226, in __del__
    self.close()
  File "/opt/conda/lib/python3.8/site-packages/tritonclient/http/__init__.py", line 233, in close
    self._pool.join()
  File "/opt/conda/lib/python3.8/site-packages/gevent/pool.py", line 430, in join
    result = self._empty_event.wait(timeout=timeout)
  File "src/gevent/event.py", line 163, in gevent._gevent_cevent.Event.wait
  File "src/gevent/_abstract_linkable.py", line 509, in gevent._gevent_c_abstract_linkable.AbstractLinkable._wait
  File "src/gevent/_abstract_linkable.py", line 206, in gevent._gevent_c_abstract_linkable.AbstractLinkable._capture_hub
gevent.exceptions.InvalidThreadUseError: (<Hub '' at 0x7fde6e5f6580 epoll default pending=0 ref=0 fileno=5 resolver=<gevent.resolver.thread.Resolver at 0x7fdf44ba2190 pool=<ThreadPool at 0x7fde6e552be0 tasks=0 size=1 maxsize=10 hub=<Hub at 0x7fde6e5f6580 thread_ident=0x7fdf4688c740>>> threadpool=<ThreadPool at 0x7fde6e552be0 tasks=0 size=1 maxsize=10 hub=<Hub at 0x7fde6e5f6580 thread_ident=0x7fdf4688c740>> thread_ident=0x7fdf4688c740>, <Hub '' at 0x7fde6e4fc7c0 epoll pending=0 ref=0 fileno=19 thread_ident=0x7fddae7ed700>, <greenlet.greenlet object at 0x7fdf3f1f8b40 (otid=0x7fdf3f20f680) current active started main>)
```
I am already explicitly `close()`-ing the client in the correct thread, but `InferenceServerClient` does not realize it's already closed and so attempts to close it again.
Triton Information
`tritonclient[http]` 2.19.0
To Reproduce
Construct an `InferenceServerClient` in a multithreaded Python app. Note that reproducing does require Python to call `__del__` from a different thread, which can't be controlled.
Expected behavior
To not get these spurious warnings in our logs.
@damonmaria thank you for filing this issue. Can you provide a detailed repro with a minimal example so that we can add additional handling for such cases?
As I said in the To Reproduce section above, there is no way AFAIK to control which thread `__del__` will be called from. This is explicit in the Python docs on `__del__`:

> `__del__()` can be invoked when arbitrary code is being executed, including from any arbitrary thread.

Given that, if `InferenceServerClient`'s `__del__` calls into gevent, then obviously it may not be on the correct thread as required by gevent.

Since the documentation is clear this situation can occur, I'm not sure it's worth trying to create a repro given the vagaries of which thread will call `__del__`.
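To illustrate the point about finalizer thread affinity, here is a generic CPython sketch (not tritonclient code, and all names are illustrative): an object that is only reachable through a reference cycle is freed by the cyclic garbage collector, so its `__del__` runs on whichever thread happens to trigger collection, not on the thread that created it.

```python
import gc
import threading

gc.disable()  # make collection explicit so the demo is deterministic

events = {}

class Resource:
    def __init__(self):
        events["created_in"] = threading.get_ident()
        self._cycle = self  # reference cycle: only the cyclic GC can free this

    def __del__(self):
        events["finalized_in"] = threading.get_ident()

def worker():
    Resource()  # unreachable as soon as this statement ends, but the
                # cycle prevents immediate refcount-based destruction

t = threading.Thread(target=worker)
t.start()
t.join()

gc.collect()  # the MAIN thread runs the collection, so __del__ runs here

print(events["created_in"] != events["finalized_in"])  # True on CPython
```

If `__del__` then calls into something thread-sensitive like a gevent pool, it runs on the collecting thread, which is exactly the failure mode reported above.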
@jbkyang-nvi can you have a look at this?
@damonmaria why are you manually calling `close()`? Also, just to make sure I understand, are you sharing the `InferenceServerClient` among your threads or do you have one client per thread?
I create a client in the main thread at startup, get the model config and perform a sanity check on it. Because of gevent that client instance is tied to the main thread, so I manually `close` it. When we start processing requests in a dedicated thread I create a new client.

By calling `close` manually I'm ensuring it's called from the correct thread. If I leave `close` to be called through `__del__` then it can't be guaranteed it will be called from the correct thread.
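The lifecycle described above can be sketched like this. This is a hedged illustration using a stand-in class rather than the real `tritonclient.http.InferenceServerClient`; the gevent thread-affinity rule is simulated with an assertion, and all names (`FakeClient`, `startup_check`, `worker`) are invented for the example.

```python
import threading

class FakeClient:
    """Stand-in for the HTTP client: usable only on its creating thread."""

    def __init__(self):
        self.owner = threading.get_ident()
        self.closed = False

    def infer(self):
        assert threading.get_ident() == self.owner, "wrong thread"
        return "ok"

    def close(self):
        assert threading.get_ident() == self.owner, "wrong thread"
        self.closed = True

def startup_check():
    client = FakeClient()       # main-thread client for the sanity check
    try:
        return client.infer()
    finally:
        client.close()          # closed on the thread that created it

def worker(results):
    client = FakeClient()       # a fresh client for the worker thread
    results.append(client.infer())
    client.close()

results = [startup_check()]
t = threading.Thread(target=worker, args=(results,))
t.start()
t.join()
print(results)  # ['ok', 'ok']
```

Each thread constructs and closes its own client, so no gevent state ever crosses a thread boundary; the only remaining problem is the extra `close()` from `__del__`.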
Sorry, I realize I did not answer both your questions.
When the `InferenceServerClient` object is garbage collected (finalized) by Python, its `__del__` will be called, and that will call `close`. So `close` is called no matter what. Even if I have manually called `close` already, this second call to `close` from `__del__` will still cause this gevent thread warning.
@damonmaria sorry for the delay in response. I was meaning to ask: where are you seeing:

> The issue is that this uses gevent which must always be called from the same thread.

We have multithreaded instances of `InferenceServerClient` and they don't need to be `close()`'ed manually.
@jbkyang-nvi

> The issue is that this uses gevent which must always be called from the same thread.

The `InferenceServerClient` (HTTP version only) uses gevent. My understanding, and my experience using it, is that everything in a 'greenlet' has to happen on a single thread. Calling the client from different threads fails.

> we have multithreaded instances of InferenceServerClient and they don't need to be `close()`'ed manually

As above. During initialization we create an `InferenceServerClient`, get the model config from the Triton server and perform a test / warmup inference. This is a sanity check so we fail early if there is an issue. The client used for this initialization cannot be reused during normal running, because during normal running it is called from a different thread, and calls to the client from a different thread will fail. This initial client is only a local variable and goes out of scope. Python then garbage collects it and therefore calls the `InferenceServerClient`'s `__del__` method, which calls `self.close()`. But this call through `__del__` happens on a random thread and causes the issue I posted about at the very top.
I imagine you have no need to call `close` because the clients you create are in use for the lifetime of your app. But that is not the case for us.

So, I am calling `close()` manually so that close happens on the correct thread. But this makes no difference, because `__del__` calls `close` no matter what, even if the client is already closed. And so this error shows up anyway.
Just for reference, I see the same message `Exception ignored in: <function InferenceServerClient.__del__ at 0x7fdf0d0c7a60>`, although with a slightly different traceback:

```
Exception ignored in: <function InferenceServerClient.__del__ at 0x7f7a9ff9edd0>
Traceback (most recent call last):
  File "/home/ivergara/mambaforge/envs/demo/lib/python3.10/site-packages/tritonclient/grpc/__init__.py", line 273, in __del__
  File "/home/ivergara/mambaforge/envs/demo/lib/python3.10/site-packages/tritonclient/grpc/__init__.py", line 281, in close
  File "/home/ivergara/mambaforge/envs/demo/lib/python3.10/site-packages/grpc/_channel.py", line 1564, in close
  File "/home/ivergara/mambaforge/envs/demo/lib/python3.10/site-packages/grpc/_channel.py", line 1548, in _close
AttributeError: 'NoneType' object has no attribute 'StatusCode'
```

The context where I get this is during test execution via pytest of a FastAPI application where a `grpcclient.InferenceServerClient` is attached to the application. I haven't seen this message when using the web application by itself, only when executing the test suite.
@ivergara are you also calling `close()`? Can you share your client so we can reproduce the issue?
Also for future reference. As stated here:

> None of the methods of InferenceServerHttpClient are thread safe. The class is intended to be used by a single thread and simultaneously calling different methods with different threads is not supported and will cause undefined behavior.

The gRPC client is more thread-safe, as can be seen here:

> Most of the methods are thread-safe except Infer, AsyncInfer, StartStream, StopStream and AsyncStreamInfer. Calling these functions from different threads will cause undefined behavior.

That being said, we could make `close()` remember when it has been called and not execute the second time.
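A minimal sketch of that idea (illustrative names only, not the actual tritonclient internals): guard `close()` with a flag so a second call, e.g. the one coming from `__del__`, becomes a no-op.

```python
class Client:
    """Stand-in for an HTTP client whose close() is made idempotent."""

    def __init__(self):
        self._closed = False
        self.teardown_calls = 0  # visible counter standing in for real teardown

    def close(self):
        if self._closed:
            return  # already closed (e.g. by the user); skip gevent teardown
        self._closed = True
        self.teardown_calls += 1  # ... join the gevent pool, etc., here ...

    def __del__(self):
        # Now harmless even if the GC runs this on an arbitrary thread after
        # a manual close(): nothing thread-sensitive executes a second time.
        self.close()

c = Client()
c.close()
c.close()                # explicit double-close: no second teardown
print(c.teardown_calls)  # 1
```

This only suppresses the duplicate close; a `__del__` firing on a foreign thread for a client that was never closed would still hit the gevent thread-affinity problem.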
@jbkyang-nvi As I've said above: the HTTP client implements the `__del__()` method and calls `close()` from it. `__del__` is called by Python itself when the object is garbage collected, not by user code, and the Python docs state `__del__` can be called from any thread.

So the HTTP client is not thread safe, yet it implements `__del__`, which can be called from any thread. Hence the issue.
@ivergara In the end I gave up on the HTTP client and switched to gRPC. Not only was my code cleaner in the end, but I also got much better performance with gRPC.
> @ivergara are you also calling close()? Can you share your client so we can reproduce the issue?

I'm not calling `close()` explicitly. I will try to set up a small example where this shows up.
> @ivergara In the end I gave up on the http client and switched to gRPC. Not only was my code cleaner in the end but I had much better performance with gRPC.

I'm using the gRPC client and still seeing the issue. As I said, this seems to happen only in testing, where the test runner starts the webserver in a separate thread.
I solved this problem by declaring the client inside the function, e.g.

```python
def predict_batchsize(inputs, model_name='building', batchsize=64,
                      inp_desc=("INPUT__0", "FP32"), otp_desc=("OUTPUT__0", "FP32")):
    CLIENT = grpc_client.InferenceServerClient(url="192.168.128.29:8001")
    ...
    preds = CLIENT.infer(model_name=model_name, inputs=[inp],
                         outputs=[otp]).as_numpy(otp_desc[0])
```
@Davidleeeeee

> I solved this problem by declaring CLIENT inside the function

Where is the threading part of your client?
@ivergara

Would it be possible to add a small example for your gRPC/HTTP client? Please also make sure you are on a supported Python version (3.6, 3.7 or 3.8).
Closing issue due to lack of activity. Please re-open the issue if you would like to follow up with this issue.
> Just for reference, I see the same message `Exception ignored in: <function InferenceServerClient.__del__ at 0x7fdf0d0c7a60>`, although with a slightly different traceback [...] The context where I get this is during test execution via pytest of a FastAPI application where a `grpcclient.InferenceServerClient` is attached to the application. I haven't seen this message when using the web application by itself, only when executing the test suite.

@ivergara how did you resolve this issue?
Sorry for not reacting to this, I haven't worked actively with Triton lately. As far as I am aware, we haven't solved the issue, because it showed up only in tests and we had to move to HTTP requests due to infrastructure constraints (deploying on a K8s cluster without a data mesh).
@ivergara : no worries. thank you !!