mllm
Fast Multimodal LLM on Mobile Devices
Hello developers, I noticed that you already support the QNN backend; excellent work! Do you have any plans to support the MTK APU via its NeuroPilot SDK?
```
QNN alloc size: 4194304
QNN alloc size: 262144
model.layers.0.self_attn.ires_split-00_view_is QNN INT8 op 0.0ms
[ ERROR ] Tensor name InceptionV3_InceptionV3_Conv2d_1a_3x3_Conv2D_stride already exists in the graph.
[ ERROR ] QnnModel::addTensor() Creating tensor...
```
I fine-tuned Gemma 2 2B Instruct with BitsAndBytes (int4). It works when tested with transformers. Then I followed the guide to build mllm and quantize the model for Linux....
[Issue while building the APK](https://github.com/lx200916/ChatBotApp/issues/3#issue-2489257920): I am getting the same kind of error with a non-QNN build as well.
Hello, thank you for sharing your valuable code with the community. After compiling and copying all the files, I tried to run the Qwen NPU model on a Galaxy S24...
Hello, I've been asking a lot of questions today. After building the example Android app and installing it on a Galaxy S22 model with 12GB...
Can these examples be built as .so files that can be called from Python code on Android?
I am trying to use Qwen-2.0 in mllm. I converted the model and vocab using the given tools. However, the output of Qwen-2.0 was garbled. Do I need to do any...
How did you obtain the two model files, qwen-1.5-1.8b-chat-int8.mllm and qwen-1.5-1.8b-chat-q4k.mllm?
Hello, regarding the following function in LLaMAAdd.cpp:

```cpp
int32_t hvx_add_af(
    float *restrict input,
    float *restrict input2,
    float *restrict output,
    uint32_t size)
{
    ...
    sline1 = Q6_V_valign_VVR(sline1c, sline1p, (size_t)input);
    sline2 = Q6_V_valign_VVR(sline2c, sline2p, (size_t)input2);
    ...
}
```
...