
Fast Multimodal LLM on Mobile Devices

Results: 21 mllm issues

Hello developers, I have noticed that you already support the QNN backend; what excellent work! Do you have any plans to support the MTK APU via its NeuroPilot SDK?

```
QNN alloc size: 4194304
QNN alloc size: 262144
model.layers.0.self_attn.ires_split-00_view_is QNN INT8 op 0.0ms
[ ERROR ] Tensor name InceptionV3_InceptionV3_Conv2d_1a_3x3_Conv2D_stride already exists in the graph.
[ ERROR ] QnnModel::addTensor() Creating tensor...
```

I fine-tuned Gemma2 2B Instruct with BitsAndBytes (int4). It works when tested with transformers. I then followed the guide to build mllm and quantize the model for Linux....

[Issue while building the APK](https://github.com/lx200916/ChatBotApp/issues/3#issue-2489257920): I get the same kind of error without the QNN build as well.

Hello, thank you for sharing your valuable code with the community. After compiling and copying all the files, I was trying to run the Qwen NPU model on a Galaxy S24...

Hello, I've been asking a lot of questions today. After building the example Android app I created and installing it on a Galaxy S22 model with 12GB...

Can these examples be built as .so files that can be used from Python code on Android?
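If the examples were compiled as shared libraries with C-ABI entry points, Python could load them via `ctypes`. The sketch below illustrates only the general pattern, using the standard C math library as a stand-in: the mllm examples are not known to export such a C API, so the library name and symbol here are placeholders, not part of mllm.

```python
import ctypes
import ctypes.util

# Stand-in for a hypothetical mllm-built .so: locate and load the C math
# library, which is guaranteed to exist on POSIX systems.
libm_path = ctypes.util.find_library("m") or "libm.so.6"
libm = ctypes.CDLL(libm_path)

# Declare the exported symbol's signature before calling it, exactly as
# you would for any function exposed by a custom shared library.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

print(libm.cos(0.0))  # 1.0
```

On Android, the same pattern would apply inside an embedded Python runtime (e.g. Chaquopy), provided the .so is built for the device ABI and exports plain C symbols rather than C++-mangled ones.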

I am trying to use Qwen-2.0 in mllm. I converted the model and vocab using the given tools. However, the output of Qwen-2.0 was garbled. Do I need to do any...

How did you obtain the two model files, qwen-1.5-1.8b-chat-int8.mllm and qwen-1.5-1.8b-chat-q4k.mllm?

Hello, regarding the following function in LLaMAAdd.cpp:

```
int32_t hvx_add_af(float *restrict input, float *restrict input2, float *restrict output, uint32_t size) {
    ...
    sline1 = Q6_V_valign_VVR(sline1c, sline1p, (size_t)input);
    sline2 = Q6_V_valign_VVR(sline2c, sline2p, (size_t)input2);
    ...
}
```
...