Stephan Walter comments

Results: 99 comments of Stephan Walter

Be more strict about converting float to double

@anzz1 Thanks for your comments.

> However, I am not qualified to comment on the math itself. I can only say that changes like these require extra scrutiny, since having...

Be more strict about converting float to double

I have resolved the conflicts and looked over the changes again. I added a test for SILU, but I have disabled the test module to avoid long CI times and...

make issue on sbc odroid

#1405 removed the vzips

Refactor quantized processing functions

@ggerganov did you want to look at this again or can we merge it?

Add detection code for avx/avx2/etc

This seems to work fine here on Linux, but it's not really needed here as we have `-march=native`. I don't have Windows to test, so someone else should probably have...

Add detection code for avx/avx2/etc

Maybe a better solution would be to have a LLAMA_PORTABLE option in cmake. If you set PORTABLE=OFF:
* on gcc it will set `-march=native`
* on MSVC it will run...

More accurate Q4_0 and Q4_1 quantizations

As for `ggml_extra.cpp`, is that the same approach as @unbounded tried here: https://github.com/ggerganov/llama.cpp/issues/397#issuecomment-1500718744 ? Anyway I'll look into it... (edit: looks like your RMS errors are higher than those by...

Use full range for q4_0 quantization

I used this for 2-bit quantization, where it did make a big difference (after all, it lets you use 4 instead of 3 values). For 4-bit the effect is less...
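The "4 instead of 3 values" remark can be checked with a small sketch. It assumes that the symmetric scheme uses integer levels in [-(2^(b-1) - 1), +(2^(b-1) - 1)] and that the full-range scheme uses [-2^(b-1), 2^(b-1) - 1]. The function names are hypothetical and are not taken from the llama.cpp source.

```python
def symmetric_levels(bits):
    """Integer levels of a symmetric scheme, e.g. [-7, +7] for 4 bits (assumed layout)."""
    m = 2 ** (bits - 1) - 1
    return list(range(-m, m + 1))


def full_range_levels(bits):
    """Integer levels when the whole two's-complement range is used (assumed layout)."""
    half = 2 ** (bits - 1)
    return list(range(-half, half))


for bits in (2, 3, 4):
    sym = len(symmetric_levels(bits))
    full = len(full_range_levels(bits))
    # The relative gain in distinct levels shrinks as the bit width grows.
    print(f"{bits}-bit: symmetric={sym} levels, full range={full} levels, gain={full / sym:.3f}x")
```

Under these assumptions, going from 3 to 4 levels at 2 bits is a much larger relative gain than going from 15 to 16 levels at 4 bits, which is consistent with the smaller effect reported for 4-bit.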

Use full range for q4_0 quantization

Thanks to #728, I was able to test my q2 and q3 implementations, and as expected the changes are bigger with fewer bits: ``` [-7,+7] q2_0 : rmse 0.01329486, maxerr...

Use full range for q4_0 quantization

> The increase in maximum error is probably due to the case where there are two values of similar magnitude but opposite signs. The value closer to zero gets rounded...
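A toy example may help to see how the opposite-sign case can raise the maximum error. This is only a sketch under stated assumptions: the full-range variant is assumed to pick the scale so that the largest-magnitude value maps to -8 and to clamp everything else to [-8, +7], while the symmetric variant maps the largest magnitude to ±7. The function names and the two-element block are made up for illustration.

```python
def quantize_symmetric(block, bits=4):
    """Round-trip through a symmetric grid, e.g. [-7, +7] for 4 bits (assumed scheme)."""
    qmax = 2 ** (bits - 1) - 1
    amax = max(abs(x) for x in block)
    d = amax / qmax if amax else 1.0
    return [max(-qmax, min(qmax, round(x / d))) * d for x in block]


def quantize_full_range(block, bits=4):
    """Round-trip through a full-range grid, e.g. [-8, +7] for 4 bits (assumed scheme)."""
    half = 2 ** (bits - 1)
    peak = max(block, key=abs)          # signed value with the largest magnitude
    d = peak / -half if peak else 1.0   # the peak maps exactly to -half
    return [max(-half, min(half - 1, round(x / d))) * d for x in block]


def max_error(block, restored):
    """Largest absolute difference between original and dequantized values."""
    return max(abs(a - b) for a, b in zip(block, restored))


# Two values of similar magnitude but opposite signs (made-up data).
block = [-1.0, 0.99]
err_sym = max_error(block, quantize_symmetric(block))
err_full = max_error(block, quantize_full_range(block))
print(f"symmetric maxerr={err_sym:.4f}, full range maxerr={err_full:.4f}")
```

In this sketch the value -1.0 takes the extra level at -8, so 0.99 would need level +8 and is clamped to +7. That clamp is where the larger maximum error comes from in this toy case.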