Samuel Neves
This appears to have been fixed by 5e6def5b8162943e54f097ce29b188e64803a9ee
There was [this draft](https://datatracker.ietf.org/doc/draft-wconner-blake2sigs/), though I'm not sure if it helps.
Not sure what you mean. EdDSA (with Ed448) with BLAKE2b-512 is defined there as `1.3.6.1.4.1.1722.12.5.5`, and ECDSA with BLAKE2b-512 as `1.3.6.1.4.1.1722.12.5.4`. There doesn't seem to be an OID for Ed25519-BLAKE2b,...
This is entirely dependent on architecture. I imagine you measured this on Skylake or Skylake-X. Also, cycle counts are more useful than percentages for understanding the difference. `[v]ps{r,l}l{d, q}` used...
Those are all questions without definitive answers. If you don't mind the maintenance, having a version for each major microarchitecture would be the best solution. But generally the solution that...
So this performance optimization looks more like an LLVM "bug" than an actual optimization. In fact, I believe I had already seen this behavior before somewhere, and then forgot...
The `_mm256_shuffle_bytes` intrinsic appears to be decomposed into a [general vector shuffle](https://github.com/llvm/llvm-project/blob/9cdcd81d3f2e9c1c9ae1e054e24668d46bc08bfb/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp#L921-L965), which then gets pattern matched as a 16-lane 16-bit element shuffle, and [the general-purpose method is matched before...
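The comment above is cut off, so the exact shuffle mask under discussion is not shown. As a rough illustration only, assume the byte shuffle implements a rotation of each 32-bit lane by a multiple of 8 bits, as BLAKE2-style code commonly does. The scalar Python model below (all function names are hypothetical, and it models the shuffle with flat byte indices rather than real per-128-bit-lane semantics) suggests why a rotate-by-16 mask could be recognized as a 16-bit element shuffle while a rotate-by-8 mask could not: the former only ever moves aligned byte pairs.

```python
import struct

def rotr32(x, r):
    """Rotate a 32-bit word right by r bits using two shifts and an OR."""
    return ((x >> r) | (x << (32 - r))) & 0xFFFFFFFF

def rotr_mask(r, lanes=8):
    """Byte indices that rotate each little-endian 32-bit lane right by r bits.

    Assumption: r is a multiple of 8, so the rotation is a pure byte permutation.
    """
    assert r % 8 == 0
    k = r // 8
    return [4 * lane + (i + k) % 4 for lane in range(lanes) for i in range(4)]

def shuffle_bytes(data, mask):
    """Scalar model of a byte shuffle: output byte i is input byte mask[i]."""
    return bytes(data[j] for j in mask)

def is_word_shuffle(mask):
    """True if the mask only moves aligned 2-byte pairs, i.e. it can be
    expressed as a shuffle of 16-bit elements."""
    return all(
        mask[i] % 2 == 0 and mask[i + 1] == mask[i] + 1
        for i in range(0, len(mask), 2)
    )

if __name__ == "__main__":
    words = [0x01234567, 0x89ABCDEF, 0xDEADBEEF, 0x0BADF00D, 1, 2, 3, 4]
    data = struct.pack("<8I", *words)
    for r in (8, 16, 24):
        out = struct.unpack("<8I", shuffle_bytes(data, rotr_mask(r)))
        assert list(out) == [rotr32(w, r) for w in words]
        print(r, is_word_shuffle(rotr_mask(r)))
```

In this model only the 16-bit rotation qualifies as a word shuffle, which is consistent with a compiler lowering it differently from the 8-bit and 24-bit cases; whether that matches the actual code path in the linked LLVM source is not established here.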
I agree that this should happen at some point; I've just been occupied with other things. Ideally getting all these fast implementations into OpenSSL would be the way to go,...
Variables in registers makes sense, yeah. Unclear to me what volatile access to those would mean. It's odd, though, that the effect does not happen with `x[3]`, which is also...