Ivan Komarov
Today, an "official" microbenchmark for `ggml_vec_dot_q4_0()` was introduced in 95ea26f6e92d620a5437f576b80868aee7f808d6. It seems to confirm the speed-up (measured on the same Tiger Lake laptop). `make benchmark` output for 0e07e6a8399fd993739a3ba3c6f95f92bfab6f58 (the "old"...
FWIW, I didn't intend to close the PR. I merged the latest upstream commits to run the microbenchmark and accidentally force-pushed 0e07e6a8399fd993739a3ba3c6f95f92bfab6f58 (the latest upstream master I used for microbenchmarking)...
@KASR Regarding `AVX512-VBMI`: I don't think you're doing anything wrong or stupid, it's probably just that your CPU is Cascade Lake, which doesn't support VBMI (according to the "Microarchitecture support"...
> It does make me think we are spending a lot of extra effort shuffling values around because of the memory layout. How fast could this be if we only...
Here are the benchmark results so far, summarized (the value is the average of `FLOPS_per_u_Second` from 10 iterations of test 2 in `benchmark-q4_0-matmult`): |Who | CPU | CPU Family |...
Fixed a trivial merge conflict after 0ad964631f9b3970f1936008fcfb1eadef59c7ed. Otherwise, nothing has changed. For posterity's sake: one thing really bothering me in this PR is that I can't use the [_mm*_sign_ps() trick](https://github.com/ggerganov/llama.cpp/blob/master/ggml.c#L2260)...
@ultoris > Test on Raptor Lake fails as the Intel has dropped support for AVX512 in latest generations of consumer CPUs. It supports avx_vnni though (256bit instructions instead of 512bit...
> "make benchmark" on dfyz:master not working, its no make target Huh, this is strange. The head of `dfyz:master` was 4f46a1342ae124a4a756e98f3447c5cfdd52a2cb when I was writing that comment, and it [did...
@dniku Oh my god, this proved to be a *deep* rabbit hole. I was able to reproduce this with a `Galaxy Z Flip4` (which uses `Snapdragon 8+ Gen 1`) and...
@aicoat Did you manage to resolve your problem? A52 seems to use an older Kryo core based on Cortex-A76, so the problem you were seeing might have been unrelated to...