
How to get LLM model performance?

Open · KYUNGSOO-LEE opened this issue 1 year ago · 1 comment

Hi

I would like to measure the performance of the Gemma model on-device (Android) with MediaPipe.

I read the blog post about running LLMs on-device with MediaPipe. (https://developers.googleblog.com/en/large-language-models-on-device-with-mediapipe-and-tensorflow-lite/)

How can I measure LLM model performance, e.g. time to first token (TTFT) and time per output token (TPOT)?

I installed the LLM Inference example, but I cannot find any logs about performance.

KYUNGSOO-LEE · Jun 16 '24

I've been trying to find the same thing. Would love to see something from the devs about how we can measure prefill token speed and decode token speed ourselves.

@KYUNGSOO-LEE as a crude substitute in the meantime, I am using `.sizeInTokens()` to get the input prompt's token count and dividing it by the inference time. I measure the inference time with `timeSource.markNow()` before and after `.generateResponse()`. Maybe this can serve as a rough metric for you too.
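
A minimal sketch of this approach in Kotlin, assuming an already-initialized `LlmInference` instance from the MediaPipe GenAI tasks library (the `llm` parameter and `measureRoughly` function names are illustrative):

```kotlin
import com.google.mediapipe.tasks.genai.llminference.LlmInference
import kotlin.time.TimeSource

// Rough throughput around the blocking generateResponse() call.
// Caveat: the elapsed time covers prefill AND decode together, so this
// yields a single combined rate, not separate TTFT/TPOT numbers.
fun measureRoughly(llm: LlmInference, prompt: String) {
    val promptTokens = llm.sizeInTokens(prompt)     // input token count
    val mark = TimeSource.Monotonic.markNow()
    val response = llm.generateResponse(prompt)     // blocking inference
    val elapsedMs = mark.elapsedNow().inWholeMilliseconds
    val outputTokens = llm.sizeInTokens(response)   // output token count
    val tokPerSec = (promptTokens + outputTokens) * 1000.0 / elapsedMs
    println("prompt=$promptTokens tok, output=$outputTokens tok, " +
            "time=$elapsedMs ms, ~%.1f tok/s".format(tokPerSec))
}
```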

AkulRT · Jul 24 '24

Hi! The MediaPipe LLM Inference task does not have a benchmark-related API at the moment; the performance numbers are evaluated internally with more sophisticated handling.

However, as AkulRT suggested, `.sizeInTokens()` can give you the token count of a given string, and you could use it to make a rough estimate.

Linchenn · Oct 31 '24

I modified the original demo to display prefill and decode speeds. The code is as follows:

https://github.com/wangzhaode/mediapipe-llm-demo/blob/main/android/app/src/main/java/com/google/mediapipe/examples/llminference/InferenceModel.kt
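
For reference, a minimal sketch of the same idea (not the code from the linked demo), assuming the streaming `generateResponseAsync()` API with a partial-result listener; the `SpeedMeter` class and the model path are illustrative:

```kotlin
import android.content.Context
import com.google.mediapipe.tasks.genai.llminference.LlmInference
import kotlin.time.TimeSource

// Approximates the prefill/decode split with the streaming API:
// time to the first partial result ~ TTFT (prefill), and the rate of the
// remaining partial results ~ decode speed. Partial results are not
// guaranteed to be exactly one token each, so treat this as an estimate.
class SpeedMeter {
    private var start = TimeSource.Monotonic.markNow()
    private var firstMs: Long? = null
    private var partials = 0

    fun begin() { start = TimeSource.Monotonic.markNow(); firstMs = null; partials = 0 }

    fun onPartial(done: Boolean) {
        val nowMs = start.elapsedNow().inWholeMilliseconds
        if (firstMs == null) firstMs = nowMs              // ~prefill time
        partials++
        if (done) {
            val decodeMs = (nowMs - (firstMs ?: 0L)).coerceAtLeast(1)
            println("TTFT ~$firstMs ms, decode ~${partials * 1000.0 / decodeMs} partials/s")
        }
    }
}

fun runWithMeter(context: Context, prompt: String) {
    val meter = SpeedMeter()
    val options = LlmInference.LlmInferenceOptions.builder()
        .setModelPath("/data/local/tmp/llm/model.bin")    // hypothetical path
        .setResultListener { _, done -> meter.onPartial(done) }
        .build()
    val llm = LlmInference.createFromOptions(context, options)
    meter.begin()
    llm.generateResponseAsync(prompt)                     // non-blocking
}
```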

wangzhaode · Jan 15 '25