ManniX-ITA

Results 87 comments of ManniX-ITA

They will be supported in the future, not sure when. There's not huge interest because i-matrix quants are noticeably slower during inference. And it takes a lot of...

To be honest, anything below Q4 is poor quality; it's better to pick a smaller model. There are other formats better suited to 2/3-bit than GGUF, with 3-bit very...

> * As far as I know, IQ quants are not the same thing as i-matrix quants, which can apply to any of the other quants, like K quants. I...

> I'm done arguing with you, "for obvious reasons." I'm done arguing too; there's really no obvious reason why you should attack me or defend @sammcj... Weird! But thanks for...

Made a PR to support the latest IQ formats: https://github.com/ollama/ollama/pull/3657 **IQ4_NL is now fixed.** They work pretty well for me, but only on the GPU. Definitely not recommended for running on...

The enum order doesn't matter; the type is checked against the tensor's `t.Kind`. And it didn't mess up my massive library, so don't worry :P ```go func (t Tensor)...
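The kind-based dispatch described in that comment can be sketched as follows. This is a minimal illustration, not Ollama's actual code: the constant names, numeric kind values, and the `TypeName` method are hypothetical stand-ins. The point is that resolution compares the tensor's numeric `Kind` value directly, so the declaration order of the constants is irrelevant.

```go
package main

import "fmt"

// Hypothetical tensor-kind IDs in the style of GGML's numeric type
// codes; the exact values here are illustrative only.
const (
	kindF32   uint32 = 0
	kindQ4K   uint32 = 12
	kindIQ4NL uint32 = 20
)

// Tensor is a minimal stand-in for a parsed GGUF tensor entry.
type Tensor struct {
	Name string
	Kind uint32
}

// TypeName resolves the quant type from the tensor's Kind value.
// Because the lookup matches the numeric Kind directly, the order in
// which the constants above are declared does not matter.
func (t Tensor) TypeName() string {
	switch t.Kind {
	case kindF32:
		return "F32"
	case kindQ4K:
		return "Q4_K"
	case kindIQ4NL:
		return "IQ4_NL"
	default:
		return "unknown"
	}
}

func main() {
	t := Tensor{Name: "blk.0.attn_q.weight", Kind: kindIQ4NL}
	fmt.Println(t.TypeName()) // prints "IQ4_NL"
}
```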

> So it's definitely not stored anywhere in Ollama's metadata files (that was my main worry)? Definitely not, the file is parsed every time it's loaded.

> Like I said, I defended _his point_. Thanks for the PR. Are you giving up on IQ4_NL? Should someone else look into it? Let it go, I don't mind...

I have updated the PR to fix IQ4_NL support; I will add the benchmark to the table above.