
Fast Multimodal LLM on Mobile Devices

Results: 21 mllm issues

Hi, I followed the steps to build and run on a Samsung S24 Android device and hit the error below: mllm/scripts$ ./run_fuyu.sh ../vocab/fuyu_vocab.mllm: 1 file pushed, 0 skipped. 34.1 MB/s (5854575 bytes...

The app reports "Fail To Load Models! Please Check if models exists at /sdcard/Download/model and restart", but I have copied the model to that location and still get the same error message (a quick adb check is sketched below). What would be the...

bug
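
For this kind of load failure, a useful first step is to confirm the file is actually visible on the device at the path the app reports. The sketch below uses only standard adb tooling; /sdcard/Download/model is taken from the error message, and `model.mllm` is a hypothetical filename to be replaced with the actual converted model.

```sh
# List whatever is at the path the app checks (taken from the error message).
adb shell ls -l /sdcard/Download/model

# If it is a directory, verify the pushed model matches the local copy.
# "model.mllm" is a placeholder; substitute the real file name.
adb shell md5sum /sdcard/Download/model/model.mllm
md5sum ./model.mllm
```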

I looked through the code, and my understanding is that prefill does the preprocessing work while the main inference happens in the decode phase. Why does the code run prefill on the NPU, while the important decode phase is executed on the CPU?

Hi, mllm-qnn works on my device, an OPPO Find X7 Ultra (Snapdragon 8 Gen 3, 16 GB RAM). However, the prefill speed for Qwen1.5-1.8B is approximately 4-6 tokens per second, which significantly diverges from the...

When we run main_qwen_npu on a Xiaomi 14, it produces the following crash log: ![WechatIMG16036](https://github.com/user-attachments/assets/ff87f385-867b-4be0-bbcd-b019f780eb2a)

DDR size = 16 GB. Running `./main_qwen_npu -s 64 -c 1 -l 512`; the tail of the log is: Memory Usage: 8910 MB(19036) at: execute graph: 94 chunk:1 execute qnn graph 95 model.layers.23.self_attn.or_split...
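
When memory usage climbs like this during QNN graph execution, one way to tell whether the process is actually running the device out of RAM is to sample its resident memory from the host while the run is in progress. This is generic Android shell tooling, not part of mllm; `main_qwen_npu` is the binary invoked above.

```sh
# Sample the resident set size of the running binary once per second.
# Uses only standard Android shell tools (pidof, grep, /proc).
adb shell 'pid=$(pidof main_qwen_npu); while [ -n "$pid" ]; do
  grep VmRSS /proc/$pid/status
  sleep 1
  pid=$(pidof main_qwen_npu)
done'
```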

Hello, I've executed `main_qwen_npu` following the [guideline](https://github.com/UbiquitousLearning/mllm/tree/main/src/backends/qnn). In fact, there were minor bugs which I fixed manually (e.g., a missing `adb push ../vocab/qwen_merges.txt ...`). When I ran `main_qwen_npu`, Android crashed...
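
The missing step mentioned above is an `adb push` of the merges file; the destination is elided in the report, so the directory below is a placeholder and should match wherever the QNN guideline pushes the other vocab and model assets.

```sh
# Push the tokenizer merges file before running main_qwen_npu.
# DEVICE_DIR is a hypothetical path; use the same on-device directory
# the guideline uses for the vocab and model files.
DEVICE_DIR=/data/local/tmp/mllm
adb push ../vocab/qwen_merges.txt "$DEVICE_DIR/"
```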

Hello, I am doing research on LLM heterogeneous computation. While browsing the code, I noticed that mllm's Net class contains some subgraph-related logic. My question is,...