Salvatore Sanfilippo
I could not test it (I have yet to generate meaningful GGUF files for testing), but here it is: https://github.com/antirez/gguf-tools/commit/eec3dc9f544e818c39ba3f3500c7a7e8eb6e29c9
I tried to change the code so that, in theory, we could later add methods to dequantize directly to formats supported by MLX or other frameworks. There is...
> PS: If I'm reading [this](https://github.com/ggerganov/llama.cpp/blob/468ea24fb4633a0d681f7ac84089566c1c6190cb/ggml.c#L1525-L1565) correctly, Q4_0 and Q4_1 are also compatible (Q4_0 has scales and no biases, Q4_1 has both). Looks like it. I haven't implemented those yet,...
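A minimal sketch of why the two formats line up, assuming the target framework stores one scale and one bias per block and decodes each weight as `q * scale + bias`. The function names are hypothetical and are not part of gguf-tools, llama.cpp, or MLX. The quants are passed already unpacked, so nothing here depends on the nibble layout inside a block.

```python
def dequant_q4_0(scale, quants):
    """Q4_0: each 4-bit value q in 0..15 decodes to (q - 8) * scale."""
    return [(q - 8) * scale for q in quants]


def dequant_q4_1(scale, bias, quants):
    """Q4_1: each 4-bit value q in 0..15 decodes to q * scale + bias."""
    return [q * scale + bias for q in quants]


def q4_0_as_affine(scale):
    """Express a Q4_0 block as a (scale, bias) pair.

    (q - 8) * scale == q * scale + (-8 * scale), so the implied bias
    is -8 * scale. This is the assumption that makes Q4_0 fit a
    scale-plus-bias layout.
    """
    return scale, -8.0 * scale


# Illustrative values only: four unpacked 4-bit quants and one block scale.
quants = [0, 7, 8, 15]
scale = 0.5
affine_scale, affine_bias = q4_0_as_affine(scale)
via_q4_0 = dequant_q4_0(scale, quants)
via_affine = dequant_q4_1(affine_scale, affine_bias, quants)
```

Both paths produce the same values, so under these assumptions a Q4_0 block can be carried over without re-quantizing, only by deriving the bias from the scale.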
> This is great, @antirez! Thank you. With the callback, we can easily cast to bfloat16, which may be a better default for MLX. Sure! I'll try to add...
Hello and thank you for submitting this patch. 10-15% during pipelining is a remarkable result, so the patch will be reviewed for inclusion even at the cost of adding some...
Sorry, not yet, but I will in the next couple of days. Thanks and sorry for the delay!
@xiaorongxie thanks, can you describe the race condition here?
Hello, I'm here just to say that I believe this is a fundamental PR for the future of MicroPython development. In my experience many MicroPython projects are memory constrained and...
@dpgeorge do I understand correctly that we don't need any stability in the filesystem/format used for this PR to work well? Because I guess that there will be a set...
Hi @madolson, this PR is paused since I was trying to understand whether we should go directly to the approach you suggested, that is, not having MIGRATE threaded but the...