Awni Hannun
@antirez do you happen to know where to get more details on the meaning of the quantization formats? It's quite difficult to interpret these acronyms in the GGUF docs: ```...
> There you will find very detailed explanations of the exact data layout. This is exactly what I am looking for, thank you 🙏
> Would it make sense to introduce new dtypes for different quantization algorithms? We thought about this when implementing our quantization and decided against it. That level of type proliferation...
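For context, this is why MLX keeps quantized weights as ordinary arrays (a packed integer weight plus per-group scales and biases) rather than introducing a new dtype. A minimal sketch assuming the `mx.quantize` / `mx.dequantize` API, with an illustrative group size and bit width:

```python
import mlx.core as mx

# A regular float16 weight matrix (toy size).
w = mx.random.normal((256, 512)).astype(mx.float16)

# quantize() packs groups of `group_size` values into `bits`-bit integers and
# returns the packed weight plus per-group scales and biases -- all ordinary
# arrays, no new dtype involved.
w_q, scales, biases = mx.quantize(w, group_size=64, bits=4)

# Round-trip back to float to inspect the approximation error.
w_hat = mx.dequantize(w_q, scales, biases, group_size=64, bits=4)
print(mx.abs(w - w_hat).max())
```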
> I'm wondering if we should just disable quantization in hf_llm so that later in lora, we can directly use the model converted by hf_llm without this kind of issue?...
> QLoRA works with the quantized lm_head, but the issue is merging it back into the original model. The model's performance is a lot worse after dequantizing the lm_head...
Ok I think it's clear now, but just to be sure (sketch for step 2 below):
1. Download https://huggingface.co/mlx-community/Mistral-7B-v0.1-hf-4bit-mlx
2. Update the LoRA script to be able to load the quantized lm_head (as [you've done...
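A rough sketch of what step 2 could look like, assuming `nn.QuantizedLinear` (the constructor arguments and the checkpoint's parameter names should be checked against the current API):

```python
import mlx.nn as nn

class Model(nn.Module):
    def __init__(self, dims: int, vocab_size: int):
        super().__init__()
        # ... the other (quantized) transformer layers ...
        # Make lm_head a quantized layer so the pre-quantized weight/scales/biases
        # from the 4-bit checkpoint can be loaded into it directly.
        self.lm_head = nn.QuantizedLinear(
            dims, vocab_size, bias=False, group_size=64, bits=4
        )
```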
I think this is closed as of #250 since the LM layer is now quantized by default. I found in #252 that if you do qlora then fusing and keeping...
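For the fuse step, the pattern would roughly be: dequantize the frozen base weight, add the LoRA update, and re-quantize so the fused model stays in the quantized format. A sketch only; the helper name, argument names, and LoRA shape convention here are hypothetical, not the actual lora.py code:

```python
import mlx.core as mx

def fuse_lora_into_quantized(w_q, scales, biases, lora_a, lora_b,
                             lora_scale, group_size=64, bits=4):
    # Dequantize the frozen base weight back to floats.
    w = mx.dequantize(w_q, scales, biases, group_size=group_size, bits=bits)
    # Add the low-rank update; here lora_b is (out, rank) and lora_a is (rank, in),
    # so the delta matches the (out, in) weight shape.
    w = w + lora_scale * (lora_b @ lora_a)
    # Re-quantizing is the lossy step.
    return mx.quantize(w, group_size=group_size, bits=bits)
```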
@menzHSE did you record the "Previous" results on the same machine? In general there can be considerable variability between different types of hardware.
Well that is very strange... let me time it on my machine...
I am on main right now with a 32GB M1 Max. Clearly that seems to help a lot... I don't have access to a 16GB M1 Pro, but I will...