allerou4

Results: 9 comments by allerou4

Hi, has anybody successfully quantized Qwen3-Omni?

> [@wenhuach21](https://github.com/wenhuach21) [@allerou4](https://github.com/allerou4) [Qwen3-Omni Quantized](https://huggingface.co/cpatonn/Qwen3-Omni-30B-A3B-Instruct-AWQ-4bit) Here is the link for the Qwen3-Omni. But I have found that auto-round pipelines are more stable than other quantization techniques. I...
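
Since the quoted reply points to auto-round as the more stable route, here is a minimal hedged sketch of what such a run can look like. The model id, loading class, hyperparameters, and output path are assumptions for illustration; a multimodal model like Qwen3-Omni may need its own model class and preprocessing rather than `AutoModelForCausalLM`.

```python
# Minimal sketch of a 4-bit auto-round quantization run (intel/auto-round).
# Model id, loading class, and hyperparameters are illustrative assumptions,
# not the exact pipeline used in this thread.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_id = "Qwen/Qwen3-Omni-30B-A3B-Instruct"  # assumed target checkpoint
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Weight-only 4-bit quantization with auto-round's default calibration data.
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, sym=True)
autoround.quantize()

# Export in a GPTQ-compatible layout so vLLM can load the result.
autoround.save_quantized("./Qwen3-Omni-30B-A3B-4bit", format="auto_gptq")
```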

Hi, I encountered the same issue. I have four L20 cards (4×48 GB), and below are the last couple of lines of the log; the distribution of memory usage is...
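
For context, one common way to steer how the checkpoint is sharded over the four cards during calibration is transformers' `device_map`/`max_memory`. This is only a hedged sketch; the per-device limits and model id below are placeholders, not tuned values.

```python
# Sketch of capping per-GPU memory when sharding a large checkpoint across
# four 48 GB cards; the limits and model id are placeholders, not tuned values.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-Omni-30B-A3B-Instruct",  # assumed checkpoint
    torch_dtype="auto",
    device_map="auto",
    max_memory={0: "44GiB", 1: "44GiB", 2: "44GiB", 3: "44GiB", "cpu": "128GiB"},
)
```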

> Hey [@Qubitium](https://github.com/Qubitium) any plans to have a look at this issue? 🙏🏼 Hi, I successfully did it, have a look at my code: `calibration_dataset = calibration_dataset.filter(lambda x: len(x["text"]) ...`
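
The quoted snippet is cut off by the comment preview. A self-contained sketch of that kind of length-based filtering could look like the following; the dataset, the `"text"` column, the sample count, and the 512-character cutoff are all assumptions, not values from the original comment.

```python
# Sketch of filtering a calibration dataset by text length before quantization;
# the wikitext dataset, "text" column, sample count, and 512-character cutoff
# are assumptions for illustration only.
from datasets import load_dataset

calibration_dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")

# Keep short, non-empty samples so calibration stays within GPU memory.
calibration_dataset = calibration_dataset.filter(
    lambda x: 0 < len(x["text"]) <= 512
)

# Use only a small subset of the filtered samples for calibration.
calibration_dataset = calibration_dataset.select(
    range(min(256, len(calibration_dataset)))
)
```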

> Indeed, using far fewer samples and a shorter sequence length is a workaround. However, at the end of the run it wasn't able to save the weights from the offloaded tensors....
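
As a reference point, here is a hedged sketch of the "fewer samples, shorter sequences" workaround expressed with auto-round's calibration knobs. The `nsamples`/`seqlen` parameter names come from that library, but the values, model id, and output path are illustrative assumptions.

```python
# Sketch of the "fewer calibration samples, shorter sequence length" workaround
# with auto-round; values, model id, and output path are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_id = "Qwen/Qwen3-Omni-30B-A3B-Instruct"  # assumed checkpoint
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

autoround = AutoRound(
    model,
    tokenizer,
    bits=4,
    group_size=128,
    nsamples=64,   # fewer calibration samples than the default
    seqlen=512,    # shorter calibration sequences
)
autoround.quantize()
autoround.save_quantized("./qwen3-omni-4bit-lowmem", format="auto_gptq")
```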

Regarding the third one: it was solved by adding this config:

```json
{
  "modules_in_block_to_quantize": [
    "self_attn.q_proj",
    "self_attn.k_proj",
    "self_attn.v_proj",
    "self_attn.o_proj",
    "self_attn.qkv_proj",
    "mlp.gate"
  ],
  "packed_modules_mapping": {
    "self_attn.qkv_proj": [
      "self_attn.q_proj",
      "self_attn.k_proj",
      "self_attn.v_proj"
    ]
  }
}
```

See: https://github.com/vllm-project/vllm/pull/25455#issuecomment-3343836290
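
For completeness, one hedged way to apply the fragment above is to patch it into the exported checkpoint's quantization config. Whether these keys belong under `quantization_config` in `config.json` (as assumed here) or in a separate quantize config file depends on the export format, so treat this purely as a sketch; the checkpoint path is made up.

```python
# Hedged sketch: merge the module lists above into a quantized checkpoint's
# config.json; the checkpoint path and the "quantization_config" placement
# are assumptions, not confirmed by the linked PR comment.
import json
from pathlib import Path

config_path = Path("./Qwen3-Omni-30B-A3B-4bit/config.json")
config = json.loads(config_path.read_text())

quant_cfg = config.setdefault("quantization_config", {})
quant_cfg["modules_in_block_to_quantize"] = [
    "self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj",
    "self_attn.o_proj", "self_attn.qkv_proj", "mlp.gate",
]
quant_cfg["packed_modules_mapping"] = {
    "self_attn.qkv_proj": ["self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj"],
}

config_path.write_text(json.dumps(config, indent=2))
```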

> [@allerou4](https://github.com/allerou4) Can you check if it works without the packed config qkv_proj? vllm should be doing its own qkv fusing and shouldn't need us to declare this. This part...

> Regarding bug 2/3: It has been fixed in the main branch code of vllm. https://github.com/vllm-project/vllm/pull/29896/files#diff-a65936ff683c1b4c8d7f3cdd49c28022f38d5e7cfbee857e7dc8c4f6731af0f9R1141-R1152 Hi, I think this PR only solves bug 2. For bug 3,...