Gusha-nye

6 issue results for Gusha-nye

Hi, big guys! I would like to ask whether llama.cpp can convert a multimodal model (e.g. Qwen2.5-VL-3B) to GGUF format and quantize it to Q4_0? And is there a...

stale
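For context, the usual llama.cpp path for this is a two-step pipeline: convert the Hugging Face checkpoint to an F16 GGUF with `convert_hf_to_gguf.py`, then quantize with `llama-quantize`. Below is a minimal sketch that only assembles those command lines; the script and binary names assume a recent llama.cpp checkout, and the separate vision-projector (mmproj) conversion that Qwen2.5-VL needs varies by llama.cpp version and is not shown.

```python
import shlex

def build_gguf_commands(model_dir: str, out_prefix: str, quant_type: str = "Q4_0"):
    """Assemble the two llama.cpp commands: HF checkpoint -> F16 GGUF, then quantize.

    Script/binary names ("convert_hf_to_gguf.py", "llama-quantize") are
    assumptions based on a recent llama.cpp tree; adjust for your build.
    """
    f16_path = f"{out_prefix}-f16.gguf"
    quant_path = f"{out_prefix}-{quant_type.lower()}.gguf"
    convert_cmd = ["python", "convert_hf_to_gguf.py", model_dir,
                   "--outfile", f16_path, "--outtype", "f16"]
    quantize_cmd = ["./llama-quantize", f16_path, quant_path, quant_type]
    return convert_cmd, quantize_cmd

convert_cmd, quantize_cmd = build_gguf_commands("Qwen2.5-VL-3B-Instruct", "qwen2.5-vl-3b")
print(shlex.join(convert_cmd))
print(shlex.join(quantize_cmd))
```

Running the two printed commands in order (inside a built llama.cpp tree, with the model directory path adjusted) produces the Q4_0 GGUF for the text model.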

Hi, big guys! Recently, I wanted to quantize various Hugging Face models via auto-gptq, but I got the following error when installing auto-gptq: ![Image](https://github.com/user-attachments/assets/18db9d47-44fc-4c86-beb3-bafccd2b7467) But I already have...

Hi everyone! I am currently running inference tests on the DeepSeek-7B and Qwen2.5-VL-3B models with the MNN framework on an AMD machine running Windows, and I ran into the following problems:

1. For the Qwen2.5-VL-3B model:
   1. Whether inferring on the CPU or on the GPU via the OpenCL backend, first-token latency grows as the number of dialogue turns increases. Is this because the conversation history is added to the prompt of the next turn?
   2. During GPU inference, loading the model prints the messages shown in the red boxes in the images below (this has not happened on other machines). After a few rounds of Q&A, or when doing VL, it errors out as shown below. How can this be resolved? I have already tried changing `#define USE_INLINE_KEYWORD 1` to `#define USE_INLINE_KEYWORD 0` (when the error below appears, GPU memory usage drops sharply): ![Image](https://github.com/user-attachments/assets/486cf098-ee3d-494a-bfc8-700303134db8) ![Image](https://github.com/user-attachments/assets/a965cadd-4071-48cd-8a6d-bbf6923cd73b)
2. For the DeepSeek-7B model: it can run inference on the CPU, but when I select the OpenCL backend for GPU inference, it still runs on the CPU (GPU memory usage does not grow). What could be the reason?

My machine configuration is as follows (a test machine with very little VRAM and RAM, integrated graphics only):
CPU: AMD Ryzen 7 8845H w/ Radeon 780M Graphics
GPU: AMD Radeon 780M Graphics, 4 GB
Memory: 8 GB...

bug
OpenCL
stale
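On question 1, yes: chat frontends typically concatenate the full history into each new prompt, so prefill work, and with it first-token latency, grows roughly linearly with the turn count. A minimal sketch of that effect, using a made-up chat template and whitespace splitting as a stand-in for the real tokenizer:

```python
def build_prompt(history, user_msg):
    """Concatenate all prior turns plus the new message, as chat UIs
    commonly do. The "User:/Assistant:" template is a hypothetical stand-in."""
    parts = [f"User: {u}\nAssistant: {a}" for u, a in history]
    parts.append(f"User: {user_msg}\nAssistant:")
    return "\n".join(parts)

history = []
for turn in range(1, 4):
    prompt = build_prompt(history, f"question {turn}")
    # Prefill cost scales with prompt length, so first-token latency grows
    # with every turn even though the new question is the same size.
    print(f"turn {turn}: ~{len(prompt.split())} prompt tokens")
    history.append((f"question {turn}", f"answer {turn}"))
```

Trimming or summarizing old turns before each request is the usual mitigation when latency matters more than full context.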

After pruning Qwen2.5-3B with the examples/llm/prune_llm.py script and saving the result, I tried to load the pruned model with AutoModelForCausalLM.from_pretrained(), but it failed with the following error: ![Image](https://github.com/user-attachments/assets/0bba5fcb-1140-4dcc-b36a-0c28dae349e6)...
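One common cause of from_pretrained() failing on a pruned checkpoint is that the saved config.json still describes the original layer widths, so the stored tensor shapes no longer match what the config promises. Whether that matches this error depends on the traceback in the screenshot, but a hypothetical helper to re-sync the config (field names follow the Qwen2 config format) would look like:

```python
import json

def sync_pruned_config(config_path, new_hidden_size=None, new_intermediate_size=None):
    """Hypothetical helper: after structured pruning changes layer widths,
    config.json must match the new tensor shapes, or from_pretrained()
    fails with size-mismatch errors. Field names assume a Qwen2-style config."""
    with open(config_path) as f:
        cfg = json.load(f)
    if new_hidden_size is not None:
        cfg["hidden_size"] = new_hidden_size
    if new_intermediate_size is not None:
        cfg["intermediate_size"] = new_intermediate_size
    with open(config_path, "w") as f:
        json.dump(cfg, f, indent=2)
    return cfg
```

The new sizes must be read off the actual pruned tensors (e.g. from the saved state dict), not guessed.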

Hi guys, recently I tried to run Qwen2.5-VL-3B on an Intel GPU using the IPEX framework for inference, but it failed with the following error: ![Image](https://github.com/user-attachments/assets/844ea2a6-4f50-47b2-8ad1-ba0dbf7266af) Can you tell me how...

Hi, big guys! 1. When running inference with the DeepSeek-7B or Qwen2.5-3B model on the NPU, if I choose the parameter load_in_low_bit = "sym_int4", the model's response keeps repeating (example: question:...
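Repetitive output under aggressive low-bit quantization such as sym_int4 is often tamed on the sampling side, for instance with a repetition penalty (transformers-style generate() accepts a repetition_penalty kwarg for this). A self-contained sketch of the standard CTRL-style penalty applied to raw logits, as an illustration of the mechanism rather than the exact fix for this stack:

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Down-weight tokens that already appeared, CTRL-style:
    positive logits are divided by the penalty, negative ones multiplied,
    so previously generated tokens become less likely either way."""
    out = list(logits)
    for tok in set(generated_ids):
        if out[tok] > 0:
            out[tok] /= penalty
        else:
            out[tok] *= penalty
    return out

logits = [2.0, -1.0, 0.5]
# Tokens 0 and 1 were already generated; token 2 is untouched.
print(apply_repetition_penalty(logits, generated_ids=[0, 1], penalty=2.0))
# -> [1.0, -2.0, 0.5]
```

If the repetition only appears at sym_int4 and not at higher precisions, it is also worth trying a less aggressive low-bit setting to confirm quantization error is the trigger.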