Andy Polyakov
> Intel AVX2 processors customarily have 3 full-width 256-bit units, 50% more compute power in comparison, so 0.6 vs. 0.4 is totally expected. Though Arrow Lake has 4 256-bit units,...
> > AVX512 [implementation] on the other hand utilizes a larger radix, which means that it takes fewer operations, so it's apples vs. oranges. > > Do you think...
Sigh... I mean it's so many variables... But briefly. On ARM you have to consider that there are in-order execution cores. ARM processors also tend to have higher vector instruction...
> you start from IN01_2, not IN01_0

The reason is not IN01, but H. The goal was to consume Hn-s in the order they are calculated, with the intention to provide...
> Neon->SVE2 porting of the 2-way (i.e. 128-bit) vectorisation

Does it mean that the implementation won't automatically adjust to wider registers? If so, then what's the point? It's not like...
> Would it be OK if I recommend you as a reviewer when I PR to OpenSSL btw?

I can't prevent anybody from tagging me, but I won't make any...
In https://github.com/dot-asm/cryptogams/issues/17#issuecomment-3053181340 I've suggested reading the manual, but you chose to google things :-) You can't google things that were never done! SVE (yes, pre-SVE2) does have multiplication instructions that...
> "blueprint" for SVE/2 implementation that could be extended to say 256-bit vector registers,

No, the suggestion is to make it actually width-agnostic. Well, it's unrealistic to make it literally...
> I am sorry,

And where is the smiley? I mean we're just kidding around with regard to reading the manual, right?

> I still need to wrap my head around...
> > mul and umulh are specified with .d qualifier, so you have 64-bit widening multiplication. One can build either base 2^64 or base 2^44. Again, not saying that it's...
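
To make the widening-multiplication remark concrete, here is a minimal scalar sketch in C of what a single 64-bit lane would compute. It is not SVE code and not anything from the repository: the helper names `mul_lo64`, `mul_hi64` and `mul_base44` are hypothetical, and `unsigned __int128` is a GCC/Clang extension assumed to be available. The first two helpers model the low and high halves of a 64x64-bit product, the way `mul` and `umulh` with the `.d` qualifier split it. The third shows how a product of two base 2^44 limbs (88 bits at most) could be re-split at the 44-bit boundary.

```c
#include <assert.h>
#include <stdint.h>

/* Low 64 bits of a 64x64-bit product: per-lane model of `mul` with .d. */
static inline uint64_t mul_lo64(uint64_t a, uint64_t b)
{
    return a * b;   /* unsigned arithmetic wraps modulo 2^64 */
}

/* High 64 bits of the same product: per-lane model of `umulh` with .d.
 * Assumption: the compiler supports unsigned __int128 (GCC/Clang). */
static inline uint64_t mul_hi64(uint64_t a, uint64_t b)
{
    return (uint64_t)(((unsigned __int128)a * b) >> 64);
}

/* Hypothetical container for a base 2^44 limb product. */
typedef struct {
    uint64_t lo44;   /* bits 0..43 of the product  */
    uint64_t carry;  /* bits 44..87 of the product */
} limb_product;

/* Multiply two limbs that are each below 2^44. The product fits in
 * 88 bits, so the high half contributes at most 24 bits. */
static inline limb_product mul_base44(uint64_t a, uint64_t b)
{
    const uint64_t mask44 = ((uint64_t)1 << 44) - 1;
    limb_product r;
    uint64_t lo, hi;

    assert(a <= mask44 && b <= mask44);

    lo = mul_lo64(a, b);
    hi = mul_hi64(a, b);

    r.lo44  = lo & mask44;
    r.carry = (lo >> 44) | (hi << 20);  /* 20 bits from lo, 24 from hi */
    return r;
}
```

With base 2^64 both halves are used as they are. With base 2^44 the spare bits in each limb leave headroom, which is what the re-split in `mul_base44` illustrates. Which of the two suits a given core better is left open here, as it is in the comment above.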