airllm
Does AirLLM Support Running Quantized Models (e.g., unsloth/Qwen2-72B-bnb-4bit)?
Does AirLLM currently support running 4-bit quantized models like unsloth/Qwen2-72B-bnb-4bit? I’m trying to load and run this model using AirLLM, but I’m encountering the following error during generation:
RuntimeError: Attempted to call variable.set_data(tensor), but variable and tensor have incompatible tensor type.
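A likely cause (this is my assumption, not confirmed by the AirLLM docs): bitsandbytes 4-bit checkpoints such as unsloth/Qwen2-72B-bnb-4bit store each weight as a packed torch.uint8 buffer plus quantization metadata (absmax, quant_state tensors), not as ordinary fp16/bf16 matrices. When AirLLM streams a shard layer by layer and assigns it into a regular Parameter, that dtype mismatch can surface as exactly this set_data error. A small sketch to inspect a state dict and spot such packed tensors (the tensor names below are hypothetical examples):

```python
import torch

def find_packed_tensors(state_dict):
    """Return names of tensors that look like bitsandbytes-packed weights.

    bnb 4-bit checkpoints keep weights as flat torch.uint8 buffers plus
    *.absmax / *.quant_state metadata, which a loader expecting plain
    fp16/bf16 shards cannot assign into ordinary Parameters.
    """
    suspicious = []
    for name, tensor in state_dict.items():
        if tensor.dtype == torch.uint8 or "quant_state" in name or "absmax" in name:
            suspicious.append(name)
    return suspicious

# Hypothetical demo state dict: one ordinary fp16 weight, two bnb-style entries
demo = {
    "model.layers.0.self_attn.q_proj.weight": torch.zeros(4, 4, dtype=torch.float16),
    "model.layers.0.self_attn.q_proj.weight.absmax": torch.zeros(2, dtype=torch.float32),
    "model.layers.0.mlp.gate_proj.weight": torch.zeros(8, dtype=torch.uint8),
}
print(find_packed_tensors(demo))
# → ['model.layers.0.self_attn.q_proj.weight.absmax', 'model.layers.0.mlp.gate_proj.weight']
```

If a checkpoint shows these packed entries, it was quantized ahead of time with bitsandbytes; an unquantized checkpoint of the same model (letting AirLLM apply its own compression, if enabled) would avoid the dtype clash.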
I also tried a smaller Qwen model, for example Qwen/Qwen2.5-0.5B, but ran into a different error:
AssertionError: model.safetensors.index.json should exist
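This assertion suggests AirLLM expects a sharded checkpoint with a model.safetensors.index.json, while small models like Qwen/Qwen2.5-0.5B ship a single model.safetensors with no index. One possible workaround (a sketch under my own assumptions, and no guarantee AirLLM's loader accepts the result) is to generate the index yourself by reading the safetensors header, which per the safetensors format is an 8-byte little-endian length followed by a JSON blob. The demo writes a tiny stand-in file rather than a real checkpoint:

```python
import json
import struct
from pathlib import Path

def build_index(safetensors_path: str) -> dict:
    """Build a model.safetensors.index.json-style dict for a single-file
    checkpoint by listing the tensor names in its safetensors header."""
    path = Path(safetensors_path)
    with path.open("rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # 8-byte LE header length
        header = json.loads(f.read(header_len))         # JSON header follows
    # Map every tensor to the one and only shard file
    weight_map = {name: path.name for name in header if name != "__metadata__"}
    return {"metadata": {"total_size": path.stat().st_size},
            "weight_map": weight_map}

# Hypothetical demo: write a minimal stand-in safetensors file, then index it
header = {"lm_head.weight": {"dtype": "F16", "shape": [2, 2],
                             "data_offsets": [0, 8]}}
blob = json.dumps(header).encode()
with open("model.safetensors", "wb") as f:
    f.write(struct.pack("<Q", len(blob)) + blob + b"\x00" * 8)

index = build_index("model.safetensors")
Path("model.safetensors.index.json").write_text(json.dumps(index, indent=2))
print(index["weight_map"])
# → {'lm_head.weight': 'model.safetensors'}
```

Even with the index in place, AirLLM may still expect per-layer shards internally, so this only clears the assertion, not necessarily the whole load path.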
I also tried running with Qwen-72B-instruct, and this is the error I got: