Gusha-nye

6 issue results for Gusha-nye

Hi, big guys! I would like to ask whether llama.cpp can convert a multimodal model (e.g. Qwen2.5-VL-3B) to GGUF format and quantize it to Q4_0? And is there a...

stale
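For context, the usual llama.cpp path for this is a two-step pipeline: convert the Hugging Face checkpoint to an F16 GGUF with `convert_hf_to_gguf.py`, then quantize with `llama-quantize`. Below is a minimal sketch that only assembles those command lines; the script and binary names assume a recent llama.cpp checkout, and the separate vision-projector (mmproj) conversion that Qwen2.5-VL needs varies by llama.cpp version and is not shown.

```python
import shlex

def build_gguf_commands(model_dir: str, out_prefix: str, quant_type: str = "Q4_0"):
    """Assemble the two llama.cpp commands: HF checkpoint -> F16 GGUF, then quantize.

    Script/binary names ("convert_hf_to_gguf.py", "llama-quantize") are
    assumptions based on a recent llama.cpp tree; adjust for your build.
    """
    f16_path = f"{out_prefix}-f16.gguf"
    quant_path = f"{out_prefix}-{quant_type.lower()}.gguf"
    convert_cmd = ["python", "convert_hf_to_gguf.py", model_dir,
                   "--outfile", f16_path, "--outtype", "f16"]
    quantize_cmd = ["./llama-quantize", f16_path, quant_path, quant_type]
    return convert_cmd, quantize_cmd

convert_cmd, quantize_cmd = build_gguf_commands("Qwen2.5-VL-3B-Instruct", "qwen2.5-vl-3b")
print(shlex.join(convert_cmd))
print(shlex.join(quantize_cmd))
```

Running the two printed commands in order (inside a built llama.cpp tree, with the model directory path adjusted) produces the Q4_0 GGUF for the text model.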

Hi, big guys! Recently, I wanted to quantize various Hugging Face models via auto-gptq, but I got the following error when installing auto-gptq: ![Image](https://github.com/user-attachments/assets/18db9d47-44fc-4c86-beb3-bafccd2b7467) But I already have...

Hi everyone! I am currently running inference tests on the DeepSeek-7B and Qwen2.5-VL-3B models with the MNN framework on an AMD machine running Windows, and I ran into the following problems:

1. For the Qwen2.5-VL-3B model:
   1. Whether inferring on the CPU or on the GPU via the OpenCL backend, first-token latency grows as the number of dialogue turns increases. Is this because the conversation history is added to the prompt of the next turn?
   2. During GPU inference, loading the model prints the messages shown in the red boxes in the images below (this has not happened on other machines). After a few rounds of Q&A, or when doing VL, it errors out as shown below. How can this be resolved? I have already tried changing `#define USE_INLINE_KEYWORD 1` to `#define USE_INLINE_KEYWORD 0` (when the error below appears, GPU memory usage drops sharply): ![Image](https://github.com/user-attachments/assets/486cf098-ee3d-494a-bfc8-700303134db8) ![Image](https://github.com/user-attachments/assets/a965cadd-4071-48cd-8a6d-bbf6923cd73b)
2. For the DeepSeek-7B model: it can run inference on the CPU, but when I select the OpenCL backend for GPU inference, it still runs on the CPU (GPU memory usage does not grow). What could be the reason?

My machine configuration is as follows (a test machine with very little VRAM and RAM, integrated graphics only):
CPU: AMD Ryzen 7 8845H w/ Radeon 780M Graphics
GPU: AMD Radeon 780M Graphics, 4 GB
Memory: 8 GB...

bug
OpenCL
stale
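On question 1, yes: chat frontends typically concatenate the full history into each new prompt, so prefill work, and with it first-token latency, grows roughly linearly with the turn count. A minimal sketch of that effect, using a made-up chat template and whitespace splitting as a stand-in for the real tokenizer:

```python
def build_prompt(history, user_msg):
    """Concatenate all prior turns plus the new message, as chat UIs
    commonly do. The "User:/Assistant:" template is a hypothetical stand-in."""
    parts = [f"User: {u}\nAssistant: {a}" for u, a in history]
    parts.append(f"User: {user_msg}\nAssistant:")
    return "\n".join(parts)

history = []
for turn in range(1, 4):
    prompt = build_prompt(history, f"question {turn}")
    # Prefill cost scales with prompt length, so first-token latency grows
    # with every turn even though the new question is the same size.
    print(f"turn {turn}: ~{len(prompt.split())} prompt tokens")
    history.append((f"question {turn}", f"answer {turn}"))
```

Trimming or summarizing old turns before each request is the usual mitigation when latency matters more than full context.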

After pruning Qwen2.5-3B with the examples/llm/prune_llm.py script and saving the result, I tried to load the pruned model with AutoModelForCausalLM.from_pretrained(), but it failed with the following error: ![Image](https://github.com/user-attachments/assets/0bba5fcb-1140-4dcc-b36a-0c28dae349e6)...
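One common cause of from_pretrained() failing on a pruned checkpoint is that the saved config.json still describes the original layer widths, so the stored tensor shapes no longer match what the config promises. Whether that matches this error depends on the traceback in the screenshot, but a hypothetical helper to re-sync the config (field names follow the Qwen2 config format) would look like:

```python
import json

def sync_pruned_config(config_path, new_hidden_size=None, new_intermediate_size=None):
    """Hypothetical helper: after structured pruning changes layer widths,
    config.json must match the new tensor shapes, or from_pretrained()
    fails with size-mismatch errors. Field names assume a Qwen2-style config."""
    with open(config_path) as f:
        cfg = json.load(f)
    if new_hidden_size is not None:
        cfg["hidden_size"] = new_hidden_size
    if new_intermediate_size is not None:
        cfg["intermediate_size"] = new_intermediate_size
    with open(config_path, "w") as f:
        json.dump(cfg, f, indent=2)
    return cfg
```

The new sizes must be read off the actual pruned tensors (e.g. from the saved state dict), not guessed.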

Hi guys, recently I tried to run Qwen2.5-VL-3B on an Intel GPU using the IPEX framework for inference, but it failed with the following error: ![Image](https://github.com/user-attachments/assets/844ea2a6-4f50-47b2-8ad1-ba0dbf7266af) Can you tell me how...

Hi, big guys! 1. When running inference with the DeepSeek-7B or Qwen2.5-3B model on the NPU, if I choose the parameter load_in_low_bit = "sym_int4", the model's response keeps repeating (example: question:...
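Repetitive output under aggressive low-bit quantization such as sym_int4 is often tamed on the sampling side, for instance with a repetition penalty (transformers-style generate() accepts a repetition_penalty kwarg for this). A self-contained sketch of the standard CTRL-style penalty applied to raw logits, as an illustration of the mechanism rather than the exact fix for this stack:

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Down-weight tokens that already appeared, CTRL-style:
    positive logits are divided by the penalty, negative ones multiplied,
    so previously generated tokens become less likely either way."""
    out = list(logits)
    for tok in set(generated_ids):
        if out[tok] > 0:
            out[tok] /= penalty
        else:
            out[tok] *= penalty
    return out

logits = [2.0, -1.0, 0.5]
# Tokens 0 and 1 were already generated; token 2 is untouched.
print(apply_repetition_penalty(logits, generated_ids=[0, 1], penalty=2.0))
# -> [1.0, -2.0, 0.5]
```

If the repetition only appears at sym_int4 and not at higher precisions, it is also worth trying a less aggressive low-bit setting to confirm quantization error is the trigger.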