AlpinDale

170 comments

This might end up being an issue that I'll need to discuss upstream with the vLLM team. The mixtral_quant modeling code there doesn't use the FusedMoE implementation, and does expert...
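For context, here is a generic PyTorch sketch of the unfused pattern being described, where tokens are routed through a Python-level loop over separate expert modules. This is purely illustrative and not vLLM's actual `mixtral_quant` code; a FusedMoE implementation would instead dispatch all experts through one grouped kernel.

```python
import torch
import torch.nn as nn

class NaiveMoE(nn.Module):
    """Unfused MoE: one nn.Linear per expert, routed with a Python loop."""

    def __init__(self, hidden_size: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Linear(hidden_size, hidden_size, bias=False)
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, hidden_size)
        weights = torch.softmax(self.gate(x), dim=-1)
        topk_w, topk_idx = weights.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        # Per-expert loop: this is exactly what a fused kernel replaces.
        for e, expert in enumerate(self.experts):
            token_ids, slot = (topk_idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            out[token_ids] += topk_w[token_ids, slot].unsqueeze(-1) * expert(x[token_ids])
        return out
```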

Yes, this is definitely doable. I'll work on this as soon as I have the bandwidth.

fpX quants currently enforce eager mode due to a bug in their kernels. This will be addressed soon, but in the meantime we should probably log this behaviour. Thanks for reporting.
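A minimal sketch of what logging that override might look like. The function, the `FPX_METHODS` set, and the method names in it are all assumptions for illustration, not the engine's actual API:

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)

# Assumed fpX method names, purely for illustration.
FPX_METHODS = {"fp4", "fp6", "fp8"}

def maybe_enforce_eager(quantization: Optional[str], enforce_eager: bool) -> bool:
    """Force eager mode for fpX quants and tell the user why, rather than doing it silently."""
    if quantization in FPX_METHODS and not enforce_eager:
        logger.warning(
            "%s quantization currently requires eager mode due to a kernel bug; "
            "enforcing eager mode (CUDA graphs disabled).",
            quantization,
        )
        return True
    return enforce_eager
```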

This issue has been fixed on the latest main. There are no releases for it yet (and it likely won't be on PyPI for a while, due to their wheel size...

> > due to their wheel size restrictions
>
> They made an exception for pytorch [pypa/packaging-problems#96](https://github.com/pypa/packaging-problems/issues/96)

We requested an extension before at pypi/support#4036, and it was approved. But since...

We do support Qwen2.5. It's not listed in the supported models list because it uses the same architecture as Qwen2 (`Qwen2ForCausalLM`); see https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/495f39366efef23836d0cfae4fbe635880d2be31/config.json#L3. We don't support Qwen2.5-Vision yet, because...
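You can confirm which architecture a checkpoint declares without downloading the weights by reading the `architectures` field from its config, e.g. with `transformers`:

```python
from transformers import AutoConfig

# Fetches only config.json from the Hub, not the weights.
config = AutoConfig.from_pretrained("Qwen/Qwen2.5-72B-Instruct")
print(config.architectures)  # ['Qwen2ForCausalLM'] -> served by the existing Qwen2 code path
```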

This seems to be an issue with the quantized model; it looks like one (or all) of the layers doesn't have a config defined for it. Maybe @wejoncy has an idea?
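One way to narrow this down is to scan the checkpoint's safetensors index and see which layers carry quantized tensors. This is a hypothetical diagnostic sketch: the tensor suffixes (`qweight`, `scales`) and the `proj` heuristic are assumptions that vary by quantization format:

```python
import json
from pathlib import Path

# Group tensor names in the checkpoint index by layer prefix.
index = json.loads(Path("model.safetensors.index.json").read_text())
layers: dict[str, set[str]] = {}
for tensor_name in index["weight_map"]:
    prefix, _, suffix = tensor_name.rpartition(".")
    layers.setdefault(prefix, set()).add(suffix)

# Flag layers whose tensors don't match the expected quantized layout.
for prefix, suffixes in sorted(layers.items()):
    if "qweight" in suffixes and "scales" not in suffixes:
        print(f"{prefix}: quantized weight without scales")
    elif "weight" in suffixes and "qweight" not in suffixes and "proj" in prefix:
        print(f"{prefix}: linear layer left unquantized")
```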