easyaspi314 comments

Results: 132 comments of easyaspi314

Refactor to improve code size, fix GCC 12

> is it possible to break this PR into smaller stages?

Sure, I can do that. Sorry I adhd'd again 😵‍💫 I think that the isolated changes that are...

Refactor to improve code size, fix GCC 12

The first thing I'll do is find the minimum amount of `#pragma GCC optimize("-O3")` or `__attribute__((optimize("-O3")))` to shut up GCC
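
For reference, the two forms mentioned here look roughly like this. This is only a minimal sketch, not the actual change in the PR: `FORCE_O3`, `sum_attr` and `sum_pragma` are hypothetical names made up for illustration, and the guard assumes that only real GCC (not clang) should see these directives.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: only genuine GCC gets the override; other compilers
 * compile the same functions at whatever level the build uses. */
#if defined(__GNUC__) && !defined(__clang__)
#  define FORCE_O3 __attribute__((optimize("-O3"))) /* hypothetical macro */
#else
#  define FORCE_O3
#endif

/* Form 1: the attribute affects this one function only. */
FORCE_O3 static uint64_t sum_attr(const uint32_t *v, size_t n)
{
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += v[i];
    }
    return acc;
}

/* Form 2: the pragma affects every function defined after it,
 * so push/pop is used to keep the override scoped. */
#if defined(__GNUC__) && !defined(__clang__)
#  pragma GCC push_options
#  pragma GCC optimize("-O3")
#endif

static uint64_t sum_pragma(const uint32_t *v, size_t n)
{
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += v[i];
    }
    return acc;
}

#if defined(__GNUC__) && !defined(__clang__)
#  pragma GCC pop_options
#endif
```

The attribute form is the narrower of the two, which is why it is a natural candidate when the goal is the minimum amount of overriding; the pragma form is easier to apply to a whole group of functions at once.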

Refactor to improve code size, fix GCC 12

I have a rough draft of the hashLong refactor that is designed to be dispatched, but `update()` performance is pretty bad due to the function call latency (since consumeStripes() would...

Refactor to improve code size, fix GCC 12

> As for code size, c6dc92f also reduces it and may implement more robust dispatch functionality. > I should go back and finish that. I kinda got distracted by another...

Refactor to improve code size, fix GCC 12

> > I should go back and finish that. I kinda got distracted by another project 😅 > > Ideally, I would like to publish release `v0.8.2` in the very...

Support SVE with assembly implementation

From a small amount of digging, c7g (AWS Graviton 3) seems to be based on the Neoverse V1, which is SVE-256. Looking at the [optimization guide](https://developer.arm.com/documentation/pjdoc466751330-9685/latest/): - Interleaving the two...

Support SVE with assembly implementation

> So your guess is that `SVE` performance on Graviton3 could have been better, but it's a matter of correctly optimizing for this architecture (manually or via compiler).

Yes....

Support SVE with assembly implementation

As for the interleaved SVE, try replacing the sve256 loop block (L294 to L306) with this. Disclaimer, I haven't tested this and my ordering might not be ideal. ```asm 10:...

Support SVE with assembly implementation

As a side note, I wonder whether it would be beneficial to just use NEON on SVE-128.

Support SVE with assembly implementation

> > As for the interleaved SVE, try replacing the sve256 loop block (L294 to L306) with this. Disclaimer, I haven't tested this. > > Unfortunately, my access to this...