Mikhail Ablakatov comments

Results: 16 comments of Mikhail Ablakatov

JIT ARM64-SVE: Add Sve.CreateFalseMask*()

@kunalspathak @dotnet/arm64-contrib @a74nh

8322770: Implement C2 VectorizedHashCode on AArch64

/covered

8322770: Implement C2 VectorizedHashCode on AArch64

> Why are you adding across lanes every time around the loop? You could maintain all of the lanes and then merge the lanes in the tail.
@theRealAph, thank...

8322770: Implement C2 VectorizedHashCode on AArch64

>> I can re-check and post the performance numbers here per a request.
> Please do. Please also post the code.
@theRealAph, you may find the performance numbers and...

8322770: Implement C2 VectorizedHashCode on AArch64

> You only need one load, add, and multiply per iteration.
> You don't need to add across columns until the end.
>
> This is an example of how...
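The quoted suggestion can be illustrated with a scalar Java sketch. This is not the code from the patch and not actual vector code: the class name `LaneHashSketch`, the method `laneHash`, and the choice of four lanes are assumptions made for illustration. Each "lane" keeps its own accumulator, the loop body does one load, one multiply, and one add per lane, and the lanes are merged only once after the loop. Seeding the last lane with 1 reproduces the initial value used by `Arrays.hashCode(int[])` for non-null arrays.

```java
import java.util.Arrays;

// Hypothetical illustration only; names and lane count are assumptions.
class LaneHashSketch {
    static int laneHash(int[] a) {
        final int LANES = 4;
        final int POW4 = 31 * 31 * 31 * 31; // 31^LANES, the per-iteration multiplier
        // Lane 3 starts at 1 so that the merged result carries the
        // initial value 1 scaled by 31^(number of elements processed).
        int[] acc = {0, 0, 0, 1};
        int i = 0;
        // Main loop: per lane, one load, one multiply, one add.
        // No cross-lane addition happens here.
        for (; i + LANES <= a.length; i += LANES) {
            for (int j = 0; j < LANES; j++) {
                acc[j] = acc[j] * POW4 + a[i + j];
            }
        }
        // Merge the lanes once, weighting lane j by 31^(LANES - 1 - j).
        int h = acc[0] * (31 * 31 * 31) + acc[1] * (31 * 31) + acc[2] * 31 + acc[3];
        // Scalar tail for the remaining (length % LANES) elements.
        for (; i < a.length; i++) {
            h = 31 * h + a[i];
        }
        return h;
    }

    public static void main(String[] args) {
        int[] data = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3};
        // Compare against the reference scalar implementation.
        System.out.println(laneHash(data) == Arrays.hashCode(data));
    }
}
```

Because Java `int` arithmetic wraps modulo 2^32, regrouping the polynomial this way does not change the result. A real vectorized version would hold the accumulators in a vector register rather than an array.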

8322770: Implement C2 VectorizedHashCode on AArch64

> A high-performance AArch64 implementation can issue four multiply-accumulate vector instructions per cycle, with a 3-clock latency.
@theRealAph, hmph, could you elaborate on what spec you refer to here?

8322770: Implement C2 VectorizedHashCode on AArch64

> You only need one load, add, and multiply per iteration.
> You don't need to add across columns until the end.
@theRealAph, I've tried to follow the suggested...

8322770: Implement C2 VectorizedHashCode on AArch64

Hi @theRealAph, following your suggestions I've got this working for ints and can confirm that it improves the performance. I don't have enough time at the moment to finish...

8322770: Implement C2 VectorizedHashCode on AArch64

A note so this isn't missed later: the implementation might be affected by https://bugs.openjdk.org/browse/JDK-8139457

8322770: Implement C2 VectorizedHashCode on AArch64

I'm finishing up a patch, hopefully I'll push it later today.