
Is python backend going to support asyncio?

Open ZhuYuJin opened this issue 4 years ago • 13 comments

Is your feature request related to a problem? Please describe.
Is the Python backend going to support asyncio?

Describe the solution you'd like
Coroutines perform better than thread-based concurrency for network I/O workloads.

Describe alternatives you've considered
Referring to grpc-python, we can start a Python thread to run an asyncio loop. We can also start a C++ thread to sniff and forward packets. The Python thread and the C++ thread communicate through a queue.

Additional context
Our business relies heavily on asyncio. We can assign an engineer to work on this.

ZhuYuJin avatar Oct 19 '21 02:10 ZhuYuJin

Can you elaborate on where you want asyncio support? The Python backend already supports asyncio in BLS: https://github.com/triton-inference-server/python_backend#business-logic-scripting-beta. There is also a ticket on our roadmap to support asyncio for sending responses back to the server.
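For reference, the async BLS pattern looks roughly like the sketch below (the model and tensor names are placeholders; see the python_backend README for the full example). Each request in the batch gets its own coroutine, so the BLS calls stay in flight concurrently:

```python
import asyncio

import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    async def _infer(self, request):
        input_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT0")
        infer_request = pb_utils.InferenceRequest(
            model_name="downstream_model",
            requested_output_names=["OUTPUT0"],
            inputs=[input_tensor],
        )
        # async_exec keeps the BLS request in flight without blocking
        # the other coroutines created below.
        infer_response = await infer_request.async_exec()
        output = pb_utils.get_output_tensor_by_name(infer_response, "OUTPUT0")
        return pb_utils.InferenceResponse(output_tensors=[output])

    async def execute(self, requests):
        # One coroutine per request in the batch; all BLS calls overlap.
        return await asyncio.gather(*(self._infer(r) for r in requests))
```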

Tabrizian avatar Oct 19 '21 14:10 Tabrizian

https://github.com/triton-inference-server/python_backend/blob/main/src/pb_stub.cc#L482


In the current Python backend implementation, the coroutine is executed synchronously with asyncio.run, which should perform poorly in real workloads.

We need a more efficient way to send RPC or HTTP requests from the Python backend. @Tabrizian
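A small standalone illustration of what I mean (not the actual pb_stub.cc code, just the asyncio behavior): calling asyncio.run once per coroutine serializes the work, while gathering the coroutines on one loop overlaps the waits.

```python
import asyncio
import time


async def download(i):
    await asyncio.sleep(1)  # stand-in for an RPC/HTTP call
    return i


# One asyncio.run per request (what the stub effectively does today):
start = time.time()
for i in range(2):
    asyncio.run(download(i))
print(f"serial: {time.time() - start:.1f}s")      # ~2.0s


# The same two downloads gathered on a single running loop:
async def main():
    await asyncio.gather(download(0), download(1))


start = time.time()
asyncio.run(main())
print(f"concurrent: {time.time() - start:.1f}s")  # ~1.0s
```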

ZhuYuJin avatar Oct 20 '21 11:10 ZhuYuJin

Can you elaborate more on the use case that you want asyncio support for? The current asyncio support in the Python backend is only for async BLS requests; the goal was to let the user have multiple in-flight BLS requests in their Python model. There is another feature on our roadmap to allow sending the responses of requests from the Python backend asynchronously. Do you think that feature is suitable for your use case, or are you describing a separate feature?

Tabrizian avatar Oct 21 '21 02:10 Tabrizian

The feature I need is to use async to send RPC/HTTP requests (downloading images, etc.) without blocking the Python thread. @Tabrizian

ZhuYuJin avatar Oct 21 '21 02:10 ZhuYuJin

I see. If you do not want to block until the execute function returns, when do you need the results of your RPC/HTTP requests, and how are you going to use them? Is it something that you want to run in the background regardless of the requests being executed on your model?

Tabrizian avatar Oct 21 '21 02:10 Tabrizian

For example, an ensemble backend sends 10 requests to a Python backend, and each request should download an image using async. The Python thread should download the 10 images concurrently rather than one after another.

ZhuYuJin avatar Oct 21 '21 02:10 ZhuYuJin

If your requests are batched, you can use asyncio with the current version of the Python backend to create a coroutine for each request and download the images asynchronously (see the sketch below). You can also create 10 instances of your Python model so that each request is executed independently and the image downloads happen in parallel: https://github.com/triton-inference-server/server/blob/main/docs/model_configuration.md#instance-groups
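A rough sketch of the first option, assuming an async `execute` and using aiohttp for the downloads (the tensor names `URL` and `IMAGE` are made up; adapt them to your model configuration):

```python
import asyncio

import aiohttp
import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    async def _download(self, session, url):
        # Fetch one image; awaiting here lets the other downloads proceed.
        async with session.get(url) as resp:
            return await resp.read()

    async def execute(self, requests):
        # One URL per request in the batch (BYTES input named "URL").
        urls = [
            pb_utils.get_input_tensor_by_name(r, "URL").as_numpy()[0].decode()
            for r in requests
        ]
        async with aiohttp.ClientSession() as session:
            images = await asyncio.gather(
                *(self._download(session, u) for u in urls)
            )
        # Return the raw bytes of each image as a UINT8 tensor named "IMAGE".
        return [
            pb_utils.InferenceResponse(
                output_tensors=[
                    pb_utils.Tensor("IMAGE", np.frombuffer(img, dtype=np.uint8))
                ]
            )
            for img in images
        ]
```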

Tabrizian avatar Oct 21 '21 03:10 Tabrizian

That seems like a temporary solution. We want a single model instance with an asyncio loop to do the job: coroutines perform better than thread- or instance-based concurrency for network I/O. Making the Python backend capable of running multiple requests concurrently would be a valuable feature for us. @Tabrizian

ZhuYuJin avatar Oct 21 '21 03:10 ZhuYuJin

Let me explain more explicitly.

[Background] Each request is a user request. The ensemble backend can push all requests into the Python backend's queue. Currently, the Python backend takes a request from the queue, executes it, and sends back the response; each request is executed synchronously.

[Problem] My problem is that the Python backend cannot process multiple requests concurrently within a single instance. Asyncio should enable concurrent processing with coroutines, but the Python backend uses asyncio.run to execute the async function, which causes each request to be executed synchronously.

[Solution]

  1. Merging multiple requests into a batch solves the problem in some cases. However, if only one request arrives in the first batching window, only that request is processed, and the following requests must wait until it is done. If our Python function downloads a video, the latency of a user request becomes unpredictable.
  2. Starting multiple instances is also a solution. However, suppose we have a concurrency of 100 QPS and each request downloads a video; starting 100 instances is not practical.

[Suggestion] My suggestion is to start a new repo, an async Python backend. We can start a Python thread to run an asyncio loop; a C++ thread gets data from the Triton queue, sends it to the Python thread, and waits for the Python thread's response. (This is the architecture of grpc-python.) I'm wondering whether you are interested in this suggestion.
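On the Python side, the idea would look roughly like this hypothetical sketch: a dedicated thread owns the event loop, and the other thread submits work with asyncio.run_coroutine_threadsafe, similar to how grpc-python bridges its C++ core and asyncio.

```python
import asyncio
import threading


class AsyncWorker:
    """A background thread that owns an asyncio event loop."""

    def __init__(self):
        self.loop = asyncio.new_event_loop()
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        asyncio.set_event_loop(self.loop)
        self.loop.run_forever()

    def submit(self, coro):
        # Called from the other (e.g. C++ stub) thread: schedule the coroutine
        # on the loop thread and return a concurrent.futures.Future.
        return asyncio.run_coroutine_threadsafe(coro, self.loop)


async def handle_request(data):
    await asyncio.sleep(0.1)  # stand-in for the RPC/HTTP work
    return data


worker = AsyncWorker()
futures = [worker.submit(handle_request(i)) for i in range(10)]
print([f.result() for f in futures])  # all 10 requests overlap on the loop
```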

ZhuYuJin avatar Oct 21 '21 08:10 ZhuYuJin

@ZhuYuJin we have made a note of your request for adding such a feature.

CoderHam avatar Nov 30 '21 22:11 CoderHam

Has work on this feature started?

manhtd98 avatar Jul 03 '22 21:07 manhtd98

@manhtd98 We have implemented decoupled API support, which partially addresses the feature requested here: https://github.com/triton-inference-server/python_backend#decoupled-mode

Having said that, full async API support in the Python backend is on our roadmap, but it has not been scheduled yet.
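For reference, the decoupled pattern mentioned above looks roughly like this (the tensor name is a placeholder; the decoupled examples in python_backend show the complete version). Each request gets its own response sender, so slow work for one request does not hold back the others:

```python
import threading

import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        for request in requests:
            sender = request.get_response_sender()
            input_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT0")

            def work(sender=sender, input_tensor=input_tensor):
                # Do the slow (e.g. network) work off the main thread, then
                # send the response and mark it as the final one.
                response = pb_utils.InferenceResponse(output_tensors=[input_tensor])
                sender.send(
                    response, flags=pb_utils.TRITONSERVER_RESPONSE_COMPLETE_FINAL
                )

            threading.Thread(target=work, daemon=True).start()
        # Decoupled models return None from execute; responses are sent above.
        return None
```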

Tabrizian avatar Jul 04 '22 14:07 Tabrizian

I get an error with asyncio, and memory is not automatically freed after predict.

manhtd98 avatar Jul 05 '22 11:07 manhtd98

@Tabrizian Any updates on this?

charlesmelby avatar Jan 11 '23 08:01 charlesmelby

@Tabrizian We have a similar use case (although in our case we're waiting on BLS-type Triton server requests, not downloads). The decoupled model is a viable alternative, but only if it can be called from an ensemble. Is calling decoupled models from ensembles supported right now?

charlesmelby avatar Feb 03 '23 01:02 charlesmelby

Any updates on this? I have the same problem: serial processing of requests causes insufficient performance.

lvmnn avatar Jul 12 '23 09:07 lvmnn

@ZhuYuJin Is there any progress?

lvmnn avatar Aug 04 '23 08:08 lvmnn

Sorry for the delayed response. This is on our roadmap, but it has not been scheduled yet. We'll let you know as soon as there is an update.

Tabrizian avatar Aug 04 '23 15:08 Tabrizian