Stephan Walter
@anzz1 Thanks for your comments. > However, I am not qualified to comment on the math itself. I can only say that changes like these require extra scrutiny, since having...
I have resolved the conflicts and looked over the changes again. I added a test for SILU, but I have disabled the test module to avoid long CI times and...
#1405 removed the vzips
@ggerganov did you want to look at this again or can we merge it?
This seems to work fine here on Linux, but it's not really needed here as we have `-march=native`. I don't have Windows to test, so someone else should probably have...
Maybe a better solution would be to have a LLAMA_PORTABLE option in cmake. If you set PORTABLE=OFF:
* on gcc it will set `-march=native`
* on MSVC it will run...
As for `ggml_extra.cpp`, is that the same approach as @unbounded tried here: https://github.com/ggerganov/llama.cpp/issues/397#issuecomment-1500718744 ? Anyway I'll look into it... (edit: looks like your RMS errors are higher than those by...
I used this for 2-bit quantization, where it did make a big difference (after all, it lets you use 4 instead of 3 values). For 4-bit the effect is less...
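To illustrate the "4 instead of 3 values" point, here is a minimal sketch in Python rather than the actual ggml C code. The function names `quantize` and `rmse` are hypothetical. The scaling rule (mapping the largest-magnitude value onto the extra negative code, which can make the scale negative) is an assumption about how a full-range scheme could work, not a quote of the implementation.

```python
import math

def quantize(block, bits, full_range):
    """Return (scale, codes) for one block of floats (bits >= 2).

    full_range=False: codes in [-(2^(b-1)-1), +(2^(b-1)-1)], e.g. [-1,+1] for 2 bits.
    full_range=True:  codes in [-2^(b-1),     +(2^(b-1)-1)], e.g. [-2,+1] for 2 bits.
    """
    qmax = (1 << (bits - 1)) - 1
    qmin = -(1 << (bits - 1)) if full_range else -qmax
    vmax = max(block, key=abs)  # signed value with the largest magnitude
    if vmax == 0:
        return 0.0, [0] * len(block)
    # Assumption: in full-range mode vmax is mapped onto the extra negative
    # code, so the scale d is negative whenever vmax is positive.
    d = vmax / qmin if full_range else abs(vmax) / qmax
    # Note: Python's round() rounds exact halves to the nearest even integer.
    codes = [max(qmin, min(qmax, round(x / d))) for x in block]
    return d, codes

def rmse(block, d, codes):
    """Root-mean-square error between the block and its dequantized codes."""
    return math.sqrt(sum((x - q * d) ** 2 for x, q in zip(block, codes)) / len(block))

block = [-1.0, -0.5, 0.0, 0.5]
d_sym, q_sym = quantize(block, bits=2, full_range=False)   # 3 usable codes
d_full, q_full = quantize(block, bits=2, full_range=True)  # 4 usable codes
print(q_sym, rmse(block, d_sym, q_sym))
print(q_full, rmse(block, d_full, q_full))
```

This toy block is chosen so that the four-level grid fits it exactly. Real weight blocks will not be that favorable, so the gain has to be measured on actual tensors.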
Thanks to #728, I was able to test my q2 and q3 implementations, and as expected the changes are bigger with fewer bits: ``` [-7,+7] q2_0 : rmse 0.01329486, maxerr...
> The increase in maximum error is probably due to the case where there are two values of similar magnitude but opposite signs. The value closer to zero gets rounded...
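One possible reading of the quoted explanation, as a hedged Python sketch: with an asymmetric 4-bit range of [-8,+7], the largest-magnitude value takes the code -8, and a value of nearly the same magnitude but opposite sign would need +8, which does not exist. The helper names below are hypothetical, and the scaling rule is an assumption rather than the actual ggml code.

```python
def dequant_full_range_q4(block):
    """Quantize to codes in [-8, +7], then dequantize (assumed scheme)."""
    vmax = max(block, key=abs)       # signed value with the largest magnitude
    d = vmax / -8 if vmax else 0.0   # vmax maps onto code -8
    out = []
    for x in block:
        q = round(x / d) if d else 0
        q = max(-8, min(7, q))       # +8 is not representable, so it becomes +7
        out.append(q * d)
    return out

def dequant_symmetric_q4(block):
    """Quantize to codes in [-7, +7], then dequantize."""
    amax = max(abs(x) for x in block)
    d = amax / 7 if amax else 0.0
    return [max(-7, min(7, round(x / d))) * d if d else 0.0 for x in block]

def max_err(block, recon):
    """Largest absolute reconstruction error in the block."""
    return max(abs(x - y) for x, y in zip(block, recon))

block = [1.0, -0.99]  # similar magnitude, opposite signs
print(max_err(block, dequant_full_range_q4(block)))  # -0.99 is reconstructed as -0.875
print(max_err(block, dequant_symmetric_q4(block)))   # -0.99 is reconstructed as about -1.0
```

In this constructed case the full-range variant has the larger maximum error, even though it can lower the RMS error averaged over many blocks.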