devops724
Great, the only missing part is a 4-bit AWQ or GGUF version of the 72B for 48 GB VRAM devices, and a 30B-40B AWQ version for 24 GB VRAM devices.
Thanks for the response and the code. Is there any guide on how much VRAM and DRAM are required to launch this model?
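While waiting for an official guide, a rough back-of-envelope estimate is possible from parameter count and quantization bit-width alone. The sketch below is an assumption-laden rule of thumb (the 20% overhead figure for KV cache and activations is a guess, not from any documentation), not a definitive answer:

```python
# Rough rule-of-thumb VRAM estimate for loading an LLM's weights.
# Assumptions (hypothetical, not from this thread): quantized weights at
# `bits_per_weight` bits each, plus a flat ~20% overhead for the KV cache
# and activations at modest context lengths.

def estimate_vram_gib(n_params_billions: float,
                      bits_per_weight: float,
                      overhead: float = 0.20) -> float:
    """Approximate GiB needed: weight bytes plus an overhead fraction."""
    weight_bytes = n_params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * (1 + overhead) / 2**30

# 72B at 4-bit lands around 40 GiB (plausible on a 48 GB card);
# 32B at 4-bit lands around 18 GiB (plausible on a 24 GB card).
print(round(estimate_vram_gib(72, 4), 1))
print(round(estimate_vram_gib(32, 4), 1))
```

Actual usage depends heavily on context length, batch size, and the inference engine, so treat this only as a first sanity check.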
> You can try [ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp)

I can't find a way to compile this with CUDA 12.8 on Linux either; the RTX 50 series requires CUDA 12.8 at minimum.
> Could I take this one

Please ensure it works with bitsandbytes too; most low-VRAM developers use 32B models with bitsandbytes.
Aya Vision is in the top 10 trending models on Hugging Face.
Same error here:

Aug 03 15:40:20 7GPU-AI bash[4699]: INFO 08-03 15:40:20 [async_llm.py:269] Added request chatcmpl-3c1d683bab4446159b4da8e73c95ec4d.
Aug 03 15:40:21 7GPU-AI bash[5058]: [rank1]:[E803 15:40:21.173317791 ProcessGroupNCCL.cpp:1899] [PG ID 2 PG GUID 3 Rank...
Thanks, this now works with the AWQ version of the model, but bitsandbytes quantization returns this error:

ERROR 02-24 02:14:33 core.py:291] tracer.run()
ERROR 02-24 02:14:33 core.py:291] File "/home/user/miniconda3/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 983, in...
> The same question, but I am using qwen2.5-vl.

The qwen2.5-vl model is already supported in the latest version.