
Get metrics from the queues in the parallel inference pool

Open · alvarorsant opened this issue on Sep 15, 2022

Hi, we would like to expose the number of elements in the inference pool's request queue as a metric, for performance monitoring. This data would make it possible to tune the system by increasing or decreasing the number of parallel workers and pods in OpenShift. What do you think?

alvarorsant avatar Sep 15 '22 09:09 alvarorsant

Hey @alvarorsant ,

That's a pretty cool idea.

Right now, there are a number of queues which would be valuable to expose through metrics. Off the top of my head, I can think of the following points (the order represents when they are hit in the processing flow):

  1. AsyncIO "task list" within the main process. This is technically not a queue, but it's a point where requests may get held up initially while the main process gets a free slot to process them.
  2. Queue for adaptive batching, waiting for requests to form a full batch. This will only be present if adaptive batching is enabled though.
  3. Queue for inference pool workers, waiting for a worker to pick up a new request.
  4. AsyncIO "task list" within the workers themselves. Again, this is technically not a queue, but it's another place where requests may get held up while waiting for the worker to be free to run inference.

Ideally, it would be useful to instrument all 4 points, so that we know how many requests are currently being held at each one. Additionally, we could also have a 5th overarching count of how many requests are currently being processed within MLServer (i.e. all requests currently between steps 1 and 4).
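
As a rough illustration of what instrumenting those points could look like, here's a minimal sketch using prometheus_client gauges directly. The metric names are hypothetical and this isn't MLServer's actual metrics wiring, just the general shape:

    from prometheus_client import Gauge

    # 1. AsyncIO task list in the main process
    main_loop_tasks = Gauge(
        "mlserver_main_loop_tasks",
        "AsyncIO tasks scheduled on the main process' event loop",
    )

    # 2. Adaptive batching queue (only present if adaptive batching is enabled)
    batch_queue_size = Gauge(
        "mlserver_batch_queue_size",
        "Requests waiting to form a full adaptive batch",
    )

    # 3. Inference pool queue
    pool_queue_size = Gauge(
        "mlserver_pool_queue_size",
        "Requests waiting for an inference pool worker to pick them up",
    )

    # 4. AsyncIO task list within each worker
    worker_loop_tasks = Gauge(
        "mlserver_worker_loop_tasks",
        "AsyncIO tasks scheduled on a worker's event loop",
    )

    # 5. Overarching in-flight count (all requests between points 1 and 4)
    in_flight_requests = Gauge(
        "mlserver_in_flight_requests",
        "Requests currently being processed by MLServer",
    )

    # Queues 2 and 3 would bump their gauge as requests enter and leave, e.g.:
    #   pool_queue_size.inc()  # request enqueued
    #   pool_queue_size.dec()  # request picked up by a worker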

It would be great to hear your thoughts on that approach @alvarorsant .

adriangonz avatar Sep 15 '22 12:09 adriangonz

We have already found points 2 and 3, but could you specify the exact location of points 1 and 4 in the code? We are indeed preparing a PR for you :) and we would need accurate pointers.

Thanks in advance.

alvarorsant avatar Oct 04 '22 09:10 alvarorsant

Hey @alvarorsant ,

That's great to hear! Thanks a lot for taking the lead in this one! :rocket:

For points 1. and 4., this would be within Python's own runtime. When called, AsyncIO coroutines in Python get converted into "tasks" and scheduled for execution onto a list, which the event loop iterates through. The longer this task list becomes, the longer it will take to execute each scheduled task.

I haven't looked much into this one TBH, but there seem to be a few suggestions here on how to approach it:

https://splunktool.com/how-can-i-measure-the-length-of-an-asyncio-event-loop

i.e. mainly doing something like:

len(asyncio.all_tasks(asyncio.get_running_loop()))
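
Building on that snippet, here's a minimal sketch of how the task-list length could be sampled periodically and exposed as a gauge (the metric name, sampling interval and use of prometheus_client are assumptions, not MLServer's actual metrics helpers):

    import asyncio

    from prometheus_client import Gauge

    loop_tasks = Gauge(
        "mlserver_event_loop_tasks",
        "AsyncIO tasks currently scheduled on the event loop",
    )

    async def sample_loop_tasks(interval: float = 1.0) -> None:
        # Periodically record how many tasks are scheduled on the running loop.
        while True:
            loop_tasks.set(len(asyncio.all_tasks(asyncio.get_running_loop())))
            await asyncio.sleep(interval)

    # Started from within the running loop, e.g. at server startup:
    #   asyncio.create_task(sample_loop_tasks())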

adriangonz avatar Oct 04 '22 10:10 adriangonz

@adriangonz We opened this PR: https://github.com/SeldonIO/MLServer/pull/769, in which we added the metrics described in points 2 and 3 (Batch and Pool).

miguelopind avatar Oct 10 '22 08:10 miguelopind

Hi @adriangonz, I'm continuing @miguelopind's task. I opened this PR: https://github.com/SeldonIO/MLServer/pull/826, implementing points 2, 3 and 4. For implementing point 1, when you said 'AsyncIO "task list" within the main process', what exactly do you mean? Where is that metric supposed to be located? Is it possible to monitor several processes from the main one, using labels?

Thanks in advance.

BTW, let me know if the PR is OK for you or if it needs any modifications.

alvarorsant avatar Nov 02 '22 07:11 alvarorsant

Hey @alvarorsant,

Regarding your question, did you have a look at my previous comment (https://github.com/SeldonIO/MLServer/issues/728#issuecomment-1266739392)? Your main questions seem covered there, but please do let me know if there are any extra points you'd like to clarify.

Thanks for that PR BTW :+1: What should happen with the previous one though (https://github.com/SeldonIO/MLServer/pull/769)? If that one is not relevant anymore, could you sync with @miguelopind and remove it?

adriangonz avatar Nov 02 '22 11:11 adriangonz

Hello, @miguelopind left the company a few days ago and handed the task over to me, but I don't have enough permissions to close https://github.com/SeldonIO/MLServer/pull/769. I don't know if you have the rights to do it.

Thanks a lot.

alvarorsant avatar Nov 02 '22 12:11 alvarorsant

Hello @alvarorsant, I can do it if you want, just tell me.

miguelopind avatar Nov 02 '22 12:11 miguelopind

Yes, @miguelopind, can you close it? I have already opened a new PR gathering all the changes.

Thanks!

alvarorsant avatar Nov 02 '22 13:11 alvarorsant

@alvarorsant done

miguelopind avatar Nov 02 '22 15:11 miguelopind

Hi Adrian, I've opened a new PR https://github.com/SeldonIO/MLServer/pull/860 incorporating the changes you suggested (focused only on the batch queue and the request queue, with several tests).

alvarorsant avatar Nov 21 '22 08:11 alvarorsant

Hey @alvarorsant ,

Thanks for making those changes.

Out of curiosity, is there any reason why you didn't update the existing PR instead? Either way, if #860 is now the up-to-date one, could you remove #826?

adriangonz avatar Nov 22 '22 16:11 adriangonz

Fixed by #860

adriangonz avatar Mar 30 '23 16:03 adriangonz