GPU support for X86_64 is not available for LLM inference task
MediaPipe Solution (you are using)
LLM inference
Programming language
C++/Java
Are you willing to contribute it
Yes
Describe the feature and the current behaviour/state
For ARM, the tasks-genai plugin auto-downloaded by Gradle (Maven) supports GPU model loading and inference. However, the MediaPipe source code appears to support only CPU inference (LlmInferenceEngine_CreateSession is defined in llm_inference_engine_cpu.cc). How does the framework load and execute inference on the GPU?
Will this change the current API? How?
No response
Who will benefit with this feature?
No response
Please specify the use cases for this feature
LLM Inference can run on X86_64 GPUs
Any Other info
Where can I find an auto-downloaded plugin with a model that can run on X86_64 GPUs?
Hi @vraghavulu,
That's correct. Our C API for the LLM inference task does not support GPU. Currently, the only way to use the GPU is through our Maven package. We have marked this as a feature request and are working on GPU support, but we do not have a timeline for availability. Please follow the issue we are already tracking here: MediaPipe Issue #5305, so we can close this one.
Thank you!!
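For anyone landing here, a minimal sketch of the Maven-package path mentioned above (the tasks-genai artifact, declared in the app's Gradle dependencies as com.google.mediapipe:tasks-genai) used from Java on an arm64 Android device. The model path and token limit below are placeholders for illustration, not files or defaults shipped with MediaPipe; in this release there is no explicit GPU flag in the options builder, so the backend is assumed to follow the converted model variant (e.g. a *-gpu.bin bundle):

```java
import android.content.Context;
import com.google.mediapipe.tasks.genai.llminference.LlmInference;
import com.google.mediapipe.tasks.genai.llminference.LlmInference.LlmInferenceOptions;

public final class LlmDemo {
  public static String runOnce(Context context) {
    // Placeholder path to a GPU-converted model bundle pushed to the device.
    LlmInferenceOptions options =
        LlmInferenceOptions.builder()
            .setModelPath("/data/local/tmp/llm/gemma-2b-it-gpu-int4.bin")
            .setMaxTokens(512)
            .build();

    // Create the task from the options and run a single synchronous generation.
    LlmInference llm = LlmInference.createFromOptions(context, options);
    String response = llm.generateResponse("Write a one-line greeting.");
    llm.close();
    return response;
  }
}
```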
This issue has been marked stale because it has had no recent activity for 7 days. It will be closed if no further activity occurs. Thank you.
Hi, @kuaashish
In the current Android LLM Inference example code, the build is set up to use the Maven repository, and I haven't changed any code, including the Manifest. When using the GPU model (Gemma2B-gpu.bin), I encounter an error like:
MediaPipeException: internal: Failed to initialize session: %s Can not open OpenCL library on this device.
Could this issue be due to the lack of GPU support, or is it related to the Android 14.0 | arm64 emulator I am currently running on?
We are actively working on improving our API and are planning some large improvements over the coming months. For now, however, it is true that:
- We only support CPU inference for models converted via AI Edge Torch (https://github.com/google-ai-edge/ai-edge-torch)
- We have only open sourced our CPU runtime. As such, you cannot yet build our inference engine with GPU support.
- We do not yet support x86 on Android, which unfortunately means that for most users, we do not support emulators.
We are working actively on closing feature gaps and tracking all these issues internally.
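Not an official recommendation, but given the constraints listed above, one practical workaround on devices or emulators where the GPU model fails to initialize (for example the "Can not open OpenCL library on this device" error shown earlier in this thread) is to catch the failure and retry with a CPU-converted model. A minimal sketch assuming the tasks-genai Java API from the Maven package; both model paths are placeholders:

```java
import android.content.Context;
import com.google.mediapipe.tasks.genai.llminference.LlmInference;
import com.google.mediapipe.tasks.genai.llminference.LlmInference.LlmInferenceOptions;

public final class LlmWithFallback {
  public static LlmInference create(Context context) {
    try {
      // Try the GPU-converted model first (arm64 physical devices).
      return LlmInference.createFromOptions(
          context, options("/data/local/tmp/llm/gemma-2b-it-gpu-int4.bin"));
    } catch (RuntimeException e) {
      // MediaPipeException is a RuntimeException, so the OpenCL init failure
      // lands here; retry with a CPU-converted model bundle instead.
      return LlmInference.createFromOptions(
          context, options("/data/local/tmp/llm/gemma-2b-it-cpu-int4.bin"));
    }
  }

  private static LlmInferenceOptions options(String modelPath) {
    return LlmInferenceOptions.builder()
        .setModelPath(modelPath)
        .setMaxTokens(512)
        .build();
  }
}
```

Note that even the CPU fallback may not help in an emulator, since x86 Android is not yet supported per the list above; an arm64 system image or a physical device may still be required.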