Katherine Yang

100 comments of Katherine Yang

@damonmaria why are you manually calling `close()`? Also, just to make sure I understand, are you sharing the `InferenceServerClient` among your threads or do you have one client per thread?

@damonmaria sorry for the delay in responding. I meant to ask: where are you seeing the following?

> The issue is that this uses gevent which must always be called from...

@ivergara are you also calling `close()`? Can you share your client code so we can reproduce the issue?

Also, for future reference, as stated [here](https://github.com/triton-inference-server/client/blob/main/src/c%2B%2B/library/http_client.h#L90-L95):

> None of the methods of InferenceServerHttpClient are thread safe. The class is intended to be used by a single thread and simultaneously...
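
A minimal sketch of the one-client-per-thread pattern this implies (the `localhost:8000` address, the `my_model` model name, and the tensor names are placeholders, not taken from the original thread):

```python
import threading

import numpy as np
import tritonclient.http as httpclient

def worker(data: np.ndarray):
    # Each thread creates and owns its own client, since
    # InferenceServerHttpClient is not thread safe.
    client = httpclient.InferenceServerClient(url="localhost:8000")
    inp = httpclient.InferInput("INPUT__0", list(data.shape), "FP32")
    inp.set_data_from_numpy(data)
    result = client.infer(model_name="my_model", inputs=[inp])
    print(result.as_numpy("OUTPUT__0"))
    client.close()

threads = [
    threading.Thread(target=worker, args=(np.ones((1, 4), dtype=np.float32),))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Keeping each client thread-local should also sidestep the gevent constraint quoted earlier, since each client's event machinery is only ever touched by the thread that created it.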

@Davidleeeeee

> I solved this problem by declaring CLIENT inside the function, e.g.
>
> ```python
> def predict_batchsize(inputs, model_name='building', batchsize=64, inp_desc=("INPUT__0", "FP32"), otp_desc=("OUTPUT__0", "FP32")):
>     CLIENT = grpc_client.InferenceServerClient(url="192.168.128.29:8001")
>     ...
>     preds = CLIENT.infer(model_name=model_name, ...
> ```
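
For reference, a self-contained sketch of that per-call-client pattern with the gRPC client; the address, model name, and tensor descriptors are carried over from the quoted snippet, and the batch-splitting logic elided there is also omitted here:

```python
import numpy as np
import tritonclient.grpc as grpc_client

def predict(inputs: np.ndarray, model_name="building",
            inp_desc=("INPUT__0", "FP32"), otp_desc=("OUTPUT__0", "FP32")):
    # Creating the client inside the function means every call
    # (and therefore every thread) gets its own connection.
    client = grpc_client.InferenceServerClient(url="192.168.128.29:8001")
    infer_input = grpc_client.InferInput(inp_desc[0], list(inputs.shape), inp_desc[1])
    infer_input.set_data_from_numpy(inputs)
    result = client.infer(model_name=model_name, inputs=[infer_input])
    return result.as_numpy(otp_desc[0])
```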

Hi @narolski, sorry for the late response. I think it is true that the Triton client does not deallocate memory while a request has not yet completed. If that is the problem, we...

You can also read about this in the [architecture documentation](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/architecture.md#concurrent-model-execution)
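
On the client side, one way to take advantage of concurrent model execution is to keep several requests in flight at once, for example with the gRPC client's `async_infer`. A sketch, assuming a server at `localhost:8001` and a hypothetical `my_model` with one FP32 input:

```python
from functools import partial
import queue

import numpy as np
import tritonclient.grpc as grpcclient

results = queue.Queue()

def callback(request_id, result, error):
    # Invoked by the client library when a request completes.
    results.put((request_id, result, error))

client = grpcclient.InferenceServerClient(url="localhost:8001")

# Issue all requests without waiting for each one to finish, so the
# server is free to schedule them onto multiple model instances.
num_requests = 8
for i in range(num_requests):
    data = np.random.rand(1, 4).astype(np.float32)
    inp = grpcclient.InferInput("INPUT__0", list(data.shape), "FP32")
    inp.set_data_from_numpy(data)
    client.async_infer(model_name="my_model", inputs=[inp],
                       callback=partial(callback, i))

# Collect the responses as they arrive.
for _ in range(num_requests):
    request_id, result, error = results.get()
    if error is not None:
        print(f"request {request_id} failed: {error}")
    else:
        print(f"request {request_id}:", result.as_numpy("OUTPUT__0"))

client.close()
```

Whether these requests actually execute concurrently depends on the model's instance configuration described in that document.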

@wdongdongde can you provide a small reproducible example for the client?

Closing this ticket due to inactivity. @wdongdongde, please reopen with more information if you would like us to look into it.

It looks like the name of the model is `${MODEL_NAME}`. Is that correct?