mllm
Fast Multimodal LLM on Mobile Devices
Hi, I followed the steps to build and run on a Samsung S24 Android device and am facing the error below: mllm/scripts$ ./run_fuyu.sh ../vocab/fuyu_vocab.mllm: 1 file pushed, 0 skipped. 34.1 MB/s (5854575 bytes...
Help me calculate yk/s while running on Android
"Fail To Load Models! Please Check if models exists at /sdcard/Download/model and restart", but I have copied the model to that location. I am still getting the same error message. What would be the...
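I looked at the code. My understanding is that prefill handles the preprocessing work, while the main inference is done in the decode stage. Why does the code run prefill on the NPU, while the important decode stage runs on the CPU?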
Hi, mllm-qnn works on my device, an OPPO Find X7 Ultra (Snapdragon 8 Gen 3, 16 GB RAM). However, the prefill speed for Qwen1.5-1.8B is approximately 4-6 tokens per second, which significantly diverges from the...
When we run main_qwen_npu on a Xiaomi 14, it produces the following crash log:
DDR size = 16GB. Command: ./main_qwen_npu -s 64 -c 1 -l 512. Below are the tail logs: ` Memory Usage: 8910 MB(19036) at: execute graph: 94 chunk:1 execute qnn graph 95 model.layers.23.self_attn.or_split...
Hello, I've executed `main_qwen_npu` following the [guideline](https://github.com/UbiquitousLearning/mllm/tree/main/src/backends/qnn). There were minor bugs, so I fixed them manually (e.g., a missing `adb push ../vocab/qwen_merges.txt ...`). When I ran `main_qwen_npu`, Android crashed...
Hello author, I am doing research on heterogeneous computation for LLMs. While browsing the code, I noticed that MLLM's Net class contains some code related to subgraphs. My question is,...
add_profilling_activation