Robert Muir comments

Results: 269 comments of Robert Muir

New JMH benchmark method - vdot8s that implement int8 dotProduct in C…

Do we even need to use intrinsics? The function is so simple that the compiler seems to do the right thing, e.g. use the `SDOT` dot product instruction, given the correct flags:...
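For illustration only, here is a minimal sketch of the kind of plain C loop the comment appears to describe. The function name `dot8s_scalar` is hypothetical and is not taken from the PR. The flags named in the code comment are an assumption as well, since the flags in the original comment are cut off. Whether a given compiler version actually emits `SDOT` for this loop should be checked in the generated assembly.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical example, not the code from the PR.
 * A plain int8 dot product with a 32-bit accumulator and no intrinsics.
 * Assumption: built with something like
 *   gcc -O3 -march=armv8.2-a+dotprod
 * the compiler may auto-vectorize this loop; inspect the assembly to confirm.
 */
int32_t dot8s_scalar(const int8_t *a, const int8_t *b, size_t n) {
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* Widen each operand before multiplying so the product cannot overflow int8. */
        sum += (int32_t) a[i] * (int32_t) b[i];
    }
    return sum;
}
```

Keeping the loop this simple is the point of the sketch: there is one reduction variable and no branches, which is the code shape auto-vectorizers tend to recognize.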

New JMH benchmark method - vdot8s that implement int8 dotProduct in C…

I haven't benchmarked, it just seems `SDOT` is the one to optimize for, and GCC can both recognize the code shape and autovectorize to it without hassle. My cheap 2021 phone...

New JMH benchmark method - vdot8s that implement int8 dotProduct in C…

> With the updated compile flags, the performance of auto-vectorized code is slightly better than explicitly vectorized code (see results). Interesting thing to note is that both C-based implementations have...

New JMH benchmark method - vdot8s that implement int8 dotProduct in C…

> I avoided it at the time given the toolchain that we were using, but it's a good option which I'll reevaluate. It should work well with any modern gcc...

New JMH benchmark method - vdot8s that implement int8 dotProduct in C…

Here is my proposal visually: https://godbolt.org/z/6fcjPWojf As you can see, by passing `-march=cascadelake` it generates VNNI instructions. IMO, no need for any intrinsics anywhere, for x86 or ARM. Just a...

New JMH benchmark method - vdot8s that implement int8 dotProduct in C…

And I see, from playing around with compiler versions, the advantage of the intrinsics approach, although I worry how many variants we'd maintain. It would give stability across releasing Lucene without...

New JMH benchmark method - vdot8s that implement int8 dotProduct in C…

I definitely want to play around more with @goankur's PR here and see what performance looks like across machines, but will be out of town for a bit. There...

New JMH benchmark method - vdot8s that implement int8 dotProduct in C…

go @goankur, awesome progress here. It is clear we gotta do something :) I left comments just to try to help. Do you mind avoiding rebase for updates? I am...

New JMH benchmark method - vdot8s that implement int8 dotProduct in C…

Attached is a patch to get x86 support working. It makes some changes to the build: specifically the Java code statically picks the best MethodHandle (SVE, Neon, Generic), and its...

New JMH benchmark method - vdot8s that implement int8 dotProduct in C…

TODO: need to examine the avx256 difference between auto-vectorized C and the Java Vector API for the integers here. This isn't nearly as bad as the ARM case (where we understand...