
[Feature]: Compute and log the serving FLOPs

zhuohan123 opened this issue Mar 19, 2024 · 10 comments

🚀 The feature, motivation and pitch

vLLM should compute and log the serving FLOPs. This would be helpful for debugging performance and checking GPU utilization.
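As background on the utilization check: once serving FLOPs are logged, dividing the achieved FLOP rate by the hardware's theoretical peak gives model FLOPs utilization (MFU). A minimal sketch of that arithmetic, assuming an A100's dense BF16 peak of 312 TFLOP/s and a hypothetical measured serving rate:

```python
# Assumed numbers for illustration only: A100 dense BF16 peak,
# and a hypothetical measured serving rate of 40 TFLOP/s.
peak_flops_per_s = 312e12
achieved_flops_per_s = 40e12

mfu = achieved_flops_per_s / peak_flops_per_s
print(f"MFU: {mfu:.1%}")  # -> MFU: 12.8%
```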

Alternatives

No response

Additional context

No response

zhuohan123 avatar Mar 19 '24 07:03 zhuohan123

Happy to take this up!

ayusher avatar Apr 02 '24 03:04 ayusher

@ayusher Just out of curiosity, how would you go about doing this? Let's take the simplest case of a single non-sharded model running on one machine. I assume we want to log the actual FLOPs / MACs (multiply-accumulates) that the hardware is doing, not an estimate derived from the model's modules (e.g. via profiling or theoretical per-module counting), since that doesn't account for the kernel implementation of each module (different CUDA implementations of attention can require different amounts of FLOPs, right?). What would be the right way of doing this without introducing too much overhead?
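For reference, PyTorch's built-in profiler can attach per-operator FLOP estimates to whatever kernels actually get dispatched. The counts are derived from operator shapes (matmuls, convolutions, and a few other ops), so this is still an estimate rather than a hardware counter, but it does reflect the ops a given implementation runs. A minimal sketch; the linear layer is a placeholder standing in for one model forward step:

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Placeholder workload; in vLLM this would be one prefill/decode step.
model = torch.nn.Linear(4096, 4096).cuda()
x = torch.randn(8, 4096, device="cuda")

with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    with_flops=True,  # attach shape-based FLOP estimates to supported ops
) as prof:
    model(x)

# Sum per-operator estimates; ops without a FLOP formula contribute 0.
total_flops = sum(evt.flops or 0 for evt in prof.key_averages())
print(f"~{total_flops / 1e9:.2f} GFLOPs")
```

Profiling every step would add noticeable overhead, so presumably something like this could only run on occasional sampled steps rather than per request.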

gardberg avatar Apr 04 '24 16:04 gardberg

Yeah, after further research this looks like a more involved problem than I initially anticipated. I'm not sure of the best approach or whether this is worth pursuing right now.

ayusher avatar Apr 26 '24 18:04 ayusher

Yeah, I agree. I know that some NVIDIA GPU libraries let you get FLOP counts, but it doesn't seem like a trivial problem (not a good first issue, I guess, lol). I'd love to help if we find a manageable way, though!

gardberg avatar Apr 26 '24 18:04 gardberg

Thank you all!

Do the FLOPs need to be exact, or is a close estimate okay?

As others in this thread have discussed, it's not straightforward to calculate FLOPs without digging into internals, and doing so may add overhead during inference. If a close estimate is acceptable, I can try to implement a basic FLOPs calculator.
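For what it's worth, the standard closed-form estimate for a decoder-only transformer is roughly 2 FLOPs per parameter per token for the weight matmuls, plus an attention term that grows with context length. A sketch of a calculator along those lines; `ModelDims` and the function name are hypothetical, not existing vLLM API, and the constants follow the usual 2N approximation:

```python
from dataclasses import dataclass

@dataclass
class ModelDims:
    # Hypothetical container for the shapes the estimate needs.
    n_params: int    # total parameter count
    n_layers: int
    hidden_size: int

def estimate_flops_per_token(dims: ModelDims, context_len: int) -> float:
    """Analytical estimate, not a measurement.

    Weight matmuls: ~2 FLOPs per parameter per token
    (one multiply plus one add per weight).
    Attention score/value matmuls: ~4 * context_len * hidden_size
    FLOPs per layer per token (2 for QK^T, 2 for weighting V).
    """
    dense = 2 * dims.n_params
    attention = 4 * dims.n_layers * dims.hidden_size * context_len
    return dense + attention

# Example: a 7B-parameter, 32-layer, 4096-hidden model at 2k context.
dims = ModelDims(n_params=7_000_000_000, n_layers=32, hidden_size=4096)
print(f"~{estimate_flops_per_token(dims, 2048) / 1e9:.1f} GFLOPs/token")
```

Since the engine already tracks per-sequence context lengths in the scheduler, an estimator like this could presumably run once per engine step with negligible overhead.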

rakshithvasudev avatar Aug 24 '24 03:08 rakshithvasudev

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

github-actions[bot] avatar Nov 23 '24 02:11 github-actions[bot]

This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!

github-actions[bot] avatar Dec 24 '24 02:12 github-actions[bot]

Can I take this up?

krtkvrm avatar Jan 29 '25 09:01 krtkvrm

Hi @zhuohan123, is this issue still open to work on?

gangula-karthik avatar Apr 17 '25 07:04 gangula-karthik

@zhuohan123 @robertgshaw2-redhat @WoosukKwon Hi, is this issue still open? I'd like to take this if it is still needed.

duhaode520 avatar Apr 30 '25 16:04 duhaode520

@duhaode520 Do you have any ideas as to how to approach this?

plops655 avatar May 19 '25 08:05 plops655

I have tried addressing this in my PR. It would be nice if someone could take a look at it.

sysradium avatar Jun 06 '25 17:06 sysradium

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

github-actions[bot] avatar Sep 05 '25 02:09 github-actions[bot]

This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!

github-actions[bot] avatar Oct 06 '25 02:10 github-actions[bot]