Run Qwen1.5 0.5B on ChatBotApp
I'm trying to run the Qwen1.5 0.5B model on ChatBotApp.
To do so, I referred to the model configuration (here) and modified LibHelper.cpp#L75 into the following:

```cpp
qwconfig = QWenConfig(tokens_limit, "0.5B", RoPEType::LLAMAROPE);
```

(I also tried `qwconfig = QWenConfig(tokens_limit, "0.5B");`.)
Then I built libmllm_lib.a and ran the app with it, but the output is abnormal: it produces only weird, meaningless characters.
Is there anything else I missed, or is my modification wrong? Thank you in advance.
Sorry for the confusion.
With `qwconfig = QWenConfig(tokens_limit, "0.5B");`, the app answers some questions well.
So what I ultimately mean is: some prompts get correct answers, while others get the same word repeated over and over; that endless repetition of a single token continues until the token limit is reached.
Please refer to the following screenshot.
It can also give weird answers such as "You 100%" to the question "Can you make any python code?".
Could you explain why this happens, and whether we can resolve it?
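For illustration, an app-side guard against the endless repetition might look like the sketch below. This is my own sketch (the sampler is passed in as a stand-in for whatever ChatBotApp actually uses), not mllm's sampling code:

```cpp
#include <functional>
#include <vector>

// Stop decoding once the same token has repeated kMaxRepeats times in a row,
// instead of running all the way to the token limit. `sampleNext` is a
// stand-in for whatever sampler the app really uses (hypothetical).
std::vector<int> generateWithRepeatGuard(
        int tokens_limit,
        const std::function<int(const std::vector<int> &)> &sampleNext) {
    constexpr int kMaxRepeats = 8;  // heuristic threshold (my assumption)
    std::vector<int> output;
    int last = -1, repeats = 0;
    while ((int)output.size() < tokens_limit) {
        int token = sampleNext(output);
        repeats = (token == last) ? repeats + 1 : 1;
        if (repeats >= kMaxRepeats) break;  // degenerate repetition detected
        last = token;
        output.push_back(token);
    }
    return output;
}
```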
Did you try any of the examples you mentioned above in the terminal?
No, I didn't; my current setup isn't suitable for terminal testing. Are you asking in order to determine whether this is an engine-level or an app-level issue?
@kjh2159 Which quantized model are you using? I will test it later.
Thank you for your enthusiastic efforts.
I'm using the q4_k quantization from this Hugging Face repository.
By the way, is this model fine-tuned (for chat, instruct, etc.) or is it a base model?
I tested your example in the command line on my device and it ran correctly. This should be a bug in the app rather than a bug in the mllm lib. cc @lx200916 Can you help out, if you have time?
In my opinion, the Kotlin code on the Android side only handles rendering answers, and the JNI code is merely a glue layer over the mllm lib, with most of it adapted directly from the command-line example. So perhaps we could try testing the app on our test devices?
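Conceptually, the glue layer looks something like the sketch below; the function and class names here are illustrative, not the actual ChatBotApp symbols:

```cpp
#include <jni.h>
#include <string>

// Hypothetical entry point into libmllm_lib.a; the real symbol differs.
std::string runModel(const std::string &prompt);

// Illustrative JNI bridge: the Kotlin side only receives a finished string
// to render, while all inference happens inside the C++ lib.
extern "C" JNIEXPORT jstring JNICALL
Java_org_example_chatbot_LibHelper_generate(JNIEnv *env, jobject /*thiz*/,
                                            jstring jPrompt) {
    const char *prompt = env->GetStringUTFChars(jPrompt, nullptr);
    std::string answer = runModel(prompt);
    env->ReleaseStringUTFChars(jPrompt, prompt);
    return env->NewStringUTF(answer.c_str());  // handed back to Kotlin for display
}
```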
@chenghuaWang @lx200916 Thank you for your efforts. I additionally tried this model on other devices, namely a Galaxy Z Fold 4 and a Google Pixel 9.
On those devices, the responses are generated correctly. The device I used for the earlier example (the first image) is a Galaxy S22 Ultra.
Only on that device (the S22 Ultra) does Qwen1.5 0.5B produce weird responses. Could this be related to the hardware instruction set?
I think this is an interesting finding; it might be constructive to think through the cause together.
Did you manage to get the 0.5b VL version working?
Regarding the VL model, no. I haven't verified my hypothesis, but I suspect some instructions are unsupported, such as SVE. Do you have any ideas about it? (Also, I didn't understand why you asked about the VL model; could you clarify?)
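One quick way to test that hypothesis on the S22 Ultra would be to read the kernel's hardware-capability bits. This is a standalone sketch for aarch64 Android/Linux, not mllm code:

```cpp
#include <sys/auxv.h>   // getauxval
#include <asm/hwcap.h>  // HWCAP_SVE / HWCAP2_SVE2 (aarch64 only)
#include <cstdio>

int main() {
    const unsigned long hwcap  = getauxval(AT_HWCAP);
    const unsigned long hwcap2 = getauxval(AT_HWCAP2);
    std::printf("SVE : %s\n", (hwcap  & HWCAP_SVE)   ? "yes" : "no");
    std::printf("SVE2: %s\n", (hwcap2 & HWCAP2_SVE2) ? "yes" : "no");
    return 0;
}
```

If the library was built with SVE code paths but these bits are absent (or vice versa), that would point to an instruction-set mismatch rather than an app bug.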
@amirvenus
> Did you manage to get the 0.5b VL version working?

The Qwen2VL-2B is supported; see https://github.com/UbiquitousLearning/mllm/blob/main/examples/demo_qwen2_vl.cpp
Thanks!
I think the docs should be updated to reflect and highlight Qwen2.5-VL-0.5B, which is a perfect fit for mobile devices.