devops724
Great, the only missing part is a 4-bit AWQ or GGUF version of the 72B for 48 GB VRAM devices, and a 30B-40B AWQ version for 24 GB VRAM devices.
Thanks for the response and the code. Is there any guide on how much VRAM and DRAM are required to launch this model?
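While waiting for an official guide, a rough back-of-envelope estimate is possible from parameter count and quantization bit-width alone. The sketch below is an assumption-laden rule of thumb (the 20% overhead figure for KV cache and activations is a guess, not from any documentation), not a definitive answer:

```python
# Rough rule-of-thumb VRAM estimate for loading an LLM's weights.
# Assumptions (hypothetical, not from this thread): quantized weights at
# `bits_per_weight` bits each, plus a flat ~20% overhead for the KV cache
# and activations at modest context lengths.

def estimate_vram_gib(n_params_billions: float,
                      bits_per_weight: float,
                      overhead: float = 0.20) -> float:
    """Approximate GiB needed: weight bytes plus an overhead fraction."""
    weight_bytes = n_params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * (1 + overhead) / 2**30

# 72B at 4-bit lands around 40 GiB (plausible on a 48 GB card);
# 32B at 4-bit lands around 18 GiB (plausible on a 24 GB card).
print(round(estimate_vram_gib(72, 4), 1))
print(round(estimate_vram_gib(32, 4), 1))
```

Actual usage depends heavily on context length, batch size, and the inference engine, so treat this only as a first sanity check.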
> You can try [ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp)

I can't find a way to compile this with CUDA 12.8 on Linux either; the RTX 50 series requires CUDA 12.8 at minimum.
> Could I take this one

Please ensure it works with bitsandbytes too; most low-VRAM developers use 32B models with bitsandbytes.
Aya Vision is in the top 10 trending models on Hugging Face.
Same error here:

Aug 03 15:40:20 7GPU-AI bash[4699]: INFO 08-03 15:40:20 [async_llm.py:269] Added request chatcmpl-3c1d683bab4446159b4da8e73c95ec4d.
Aug 03 15:40:21 7GPU-AI bash[5058]: [rank1]:[E803 15:40:21.173317791 ProcessGroupNCCL.cpp:1899] [PG ID 2 PG GUID 3 Rank...
Thanks, this now works with the AWQ version of the model, but bitsandbytes quantization returns this error:

ERROR 02-24 02:14:33 core.py:291] tracer.run()
ERROR 02-24 02:14:33 core.py:291] File "/home/user/miniconda3/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 983, in...
> The same question, but I am using qwen2.5-vl.

The qwen2.5-vl model is already supported in the latest version.