
Is Gemma on device really this slow?

Open MJ1998 opened this issue 1 year ago • 5 comments

I used the llm_inference sample with gemma-2b-it-cpu-int4.bin on a Pixel 8 Pro emulator.

Prefill seems to take minutes.

Pixel 8 Pro configuration: RAM 22 GB, VM heap 512 MB

Reference video https://github.com/googlesamples/mediapipe/assets/22965002/c7730dba-48e8-4eec-ae68-fe847d2778f2

MJ1998 avatar Apr 30 '24 14:04 MJ1998

Oh boy, no definitely not. It's not really intended to be run on the emulator, so your results are going to vary wildly. Here's a presentation I did last week with a slide showing Gemma running on a device in real-time (not sped up or altered, just recorded and turned into a gif) https://docs.google.com/presentation/d/1uetAcmkNWDXHEJaCt6WoBflDM1iMUU1N1ahzQof6PLM/edit#slide=id.g26cd5c56ad9_1_30

PaulTR avatar Apr 30 '24 14:04 PaulTR

I saw a post suggesting that an emulator with increased RAM works similarly. Here it is - link - search for "Creating an Android Emulator with Increased RAM".

What's the difference that makes a physical device so much faster? Is it specifically customized for Gemma?

Thanks for the prompt response!

MJ1998 avatar Apr 30 '24 14:04 MJ1998

No idea on that level of detail. My general experience over the last 10+ years with Android development though has always been "Eh, emulators are OK, but never as good as a real device"

PaulTR avatar Apr 30 '24 14:04 PaulTR

Time to first token is still pretty slow compared to the video you shared. It takes around 15 seconds for both the 4-bit and 8-bit CPU versions of Gemma 2B. The physical device I am using is a Pixel 7 Pro.

MJ1998 avatar May 02 '24 08:05 MJ1998

I am using the recent Gemma 2 as well on my Android Pixel device, and performance is still too slow. Is there anything we can do to increase performance on the Android device? Thanks.

BalajiPolisetty2207 avatar Oct 01 '24 23:10 BalajiPolisetty2207

Echoing Paul's point: our infrastructure is not well tested on the emulator, and there is no performance guarantee there. However, there is a known issue with running Gemma 2 models on real devices that makes time to first token slow. We are actively working on it and hope to have it resolved by the end of this year. Thanks for your patience.

yuhuichen1015 avatar Nov 08 '24 19:11 yuhuichen1015
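
For anyone comparing numbers in this thread: time to first token (roughly prefill plus the first decode step) can be measured by timing the gap between issuing the request and receiving the first streamed token. The following is a minimal, language-agnostic sketch written in Python; `measure_stream` and `fake_model` are hypothetical names used for illustration and are not part of MediaPipe or the llm_inference sample. It assumes the response arrives as a lazy token stream, so the work only starts once the stream is consumed.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, Tuple


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], float, int]:
    """Consume a token stream and return (ttft_s, total_s, token_count).

    ttft_s is the time from the start of consumption to the first token,
    or None if the stream produced no tokens.
    """
    start = clock()
    ttft = None
    count = 0
    for _ in tokens:
        if ttft is None:
            # First token arrived: record time to first token.
            ttft = clock() - start
        count += 1
    total = clock() - start
    return ttft, total, count


def fake_model(prefill_s: float, per_token_s: float, n: int) -> Iterator[str]:
    """Hypothetical stand-in for a streaming LLM call (an assumption,
    not a real API). Sleeps to simulate prefill and per-token decode."""
    time.sleep(prefill_s)
    for i in range(n):
        if i:
            time.sleep(per_token_s)
        yield f"tok{i}"


if __name__ == "__main__":
    ttft, total, count = measure_stream(fake_model(0.05, 0.01, 5))
    print(f"time to first token: {ttft:.3f}s, total: {total:.3f}s, tokens: {count}")
```

If the real inference call blocks before returning its first result, take the start timestamp immediately before that call rather than inside a helper like this one. Reporting time to first token separately from total time makes it easier to tell slow prefill apart from slow decoding.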