GPU support for X86_64 is not available for LLM inference task
MediaPipe Solution (you are using)
LLM inference
Programming language
C++/Java
Are you willing to contribute it
Yes
Describe the feature and the current behaviour/state
For ARM, the tasks-genai plugin auto-downloaded by Gradle (Maven) supports GPU model loading and inference. However, the MediaPipe source code appears to support only CPU inference (LlmInferenceEngine_CreateSession is defined in llm_inference_engine_cpu.cc). How does the framework load and execute inference on the GPU?
Will this change the current API? How?
No response
Who will benefit with this feature?
No response
Please specify the use cases for this feature
LLM Inference can run on X86_64 GPUs
Any Other info
Where can I find an auto-downloaded plugin with a model that can run on X86_64 GPUs?
Hi @vraghavulu,
That's correct. Our C API for the LLM inference task does not support GPU. Currently, the only way to use the GPU is through our Maven package. We have marked this as a feature request and are working on GPU support, but we do not have a timeline for availability. Please follow the issue we are already tracking here: MediaPipe Issue #5305, so we can close this one.
Thank you!!
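For anyone landing here, a minimal sketch of the Maven-package path mentioned above (the tasks-genai artifact, declared in the app's Gradle dependencies as com.google.mediapipe:tasks-genai) used from Java on an arm64 Android device. The model path and token limit below are placeholders for illustration, not files or defaults shipped with MediaPipe; in this release there is no explicit GPU flag in the options builder, so the backend is assumed to follow the converted model variant (e.g. a *-gpu.bin bundle):

```java
import android.content.Context;
import com.google.mediapipe.tasks.genai.llminference.LlmInference;
import com.google.mediapipe.tasks.genai.llminference.LlmInference.LlmInferenceOptions;

public final class LlmDemo {
  public static String runOnce(Context context) {
    // Placeholder path to a GPU-converted model bundle pushed to the device.
    LlmInferenceOptions options =
        LlmInferenceOptions.builder()
            .setModelPath("/data/local/tmp/llm/gemma-2b-it-gpu-int4.bin")
            .setMaxTokens(512)
            .build();

    // Create the task from the options and run a single synchronous generation.
    LlmInference llm = LlmInference.createFromOptions(context, options);
    String response = llm.generateResponse("Write a one-line greeting.");
    llm.close();
    return response;
  }
}
```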
This issue has been marked stale because it has had no recent activity for 7 days. It will be closed if no further activity occurs. Thank you.
Hi, @kuaashish
In the current Android LLM Inference example code, the build is set up to use the Maven repository, and I haven't changed any code, including the Manifest. When using the GPU model (Gemma2B-gpu.bin), I encounter an error like:
MediaPipeException: internal: Failed to initialize session: %s Can not open OpenCL library on this device.
Could this issue be due to the lack of GPU support, or is it related to the Android 14.0 | arm64 emulator I am currently running on?
We are actively working on improving our API and are planning some large improvements over the coming months. For now, however, it is true that:
- We only support CPU inference for models converted via AI Edge Torch (https://github.com/google-ai-edge/ai-edge-torch)
- We have only open sourced our CPU runtime. As such, you cannot yet build our inference engine with GPU support.
- We do not yet support x86 on Android, which unfortunately means that for most users, we do not support emulators.
We are working actively on closing feature gaps and tracking all these issues internally.
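Not an official recommendation, but given the constraints listed above, one practical workaround on devices or emulators where the GPU model fails to initialize (for example the "Can not open OpenCL library on this device" error shown earlier in this thread) is to catch the failure and retry with a CPU-converted model. A minimal sketch assuming the tasks-genai Java API from the Maven package; both model paths are placeholders:

```java
import android.content.Context;
import com.google.mediapipe.tasks.genai.llminference.LlmInference;
import com.google.mediapipe.tasks.genai.llminference.LlmInference.LlmInferenceOptions;

public final class LlmWithFallback {
  public static LlmInference create(Context context) {
    try {
      // Try the GPU-converted model first (arm64 physical devices).
      return LlmInference.createFromOptions(
          context, options("/data/local/tmp/llm/gemma-2b-it-gpu-int4.bin"));
    } catch (RuntimeException e) {
      // MediaPipeException is a RuntimeException, so the OpenCL init failure
      // lands here; retry with a CPU-converted model bundle instead.
      return LlmInference.createFromOptions(
          context, options("/data/local/tmp/llm/gemma-2b-it-cpu-int4.bin"));
    }
  }

  private static LlmInferenceOptions options(String modelPath) {
    return LlmInferenceOptions.builder()
        .setModelPath(modelPath)
        .setMaxTokens(512)
        .build();
  }
}
```

Note that even the CPU fallback may not help in an emulator, since x86 Android is not yet supported per the list above; an arm64 system image or a physical device may still be required.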