Andrew Moon
The state shouldn't have an alignment requirement, so this is an oversight in the avx2 (and xop) implementation. The test state should be intentionally misaligned to catch this. The j array...
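A minimal sketch of the kind of misaligned-state test described above. `toy_update`, `STATE_BYTES` and `check_misaligned_state` are hypothetical stand-ins rather than code from the actual implementation. The idea is to run the same update with the state placed at every offset 0..31 inside a larger buffer and to compare each result against a 32-byte-aligned reference run. A version that used aligned loads such as `_mm256_load_si256` on the state would crash at the misaligned offsets instead of passing silently.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STATE_BYTES 32 /* hypothetical state size: eight 32-bit words */

/* Alignment-safe access: memcpy lets the compiler emit unaligned loads/stores. */
static uint32_t load32(const unsigned char *p) { uint32_t v; memcpy(&v, p, 4); return v; }
static void store32(unsigned char *p, uint32_t v) { memcpy(p, &v, 4); }

/* Hypothetical stand-in for the real update function. It only touches the
   state through alignment-safe accesses, so any address must work. */
static void toy_update(unsigned char *state, const unsigned char *in, size_t inlen) {
    for (size_t i = 0; i + 4 <= inlen; i += 4) {
        unsigned char *w = state + 4 * ((i / 4) % 8);
        store32(w, (load32(w) ^ load32(in + i)) * 0x9e3779b1u);
    }
}

/* Returns 1 if the update gives identical results at every state offset. */
static int check_misaligned_state(void) {
    unsigned char in[64];
    for (size_t i = 0; i < sizeof in; i++) in[i] = (unsigned char)i;

    /* Reference run with a 32-byte-aligned state. */
    _Alignas(32) unsigned char ref[STATE_BYTES] = {0};
    toy_update(ref, in, sizeof in);

    _Alignas(32) unsigned char buf[STATE_BYTES + 32];
    for (size_t off = 0; off < 32; off++) {
        memset(buf, 0, sizeof buf);
        unsigned char *st = buf + off; /* misaligned whenever off != 0 */
        toy_update(st, in, sizeof in);
        if (memcmp(st, ref, STATE_BYTES) != 0) return 0;
    }
    return 1;
}
```

Sweeping all 32 offsets, rather than only one odd offset, also catches code that silently assumes 16-byte alignment.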
RE: array loads, I realized you could use a fully packed table for the scalarmult_base lookup (the 24k table used by amd64-64-24k), and expand ysubx, xaddy, and t2d at the...
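One way to picture the packed-table idea is a constant-time scan over compact entries, followed by expanding only the selected entry. This sketch assumes a hypothetical layout: 8 entries per window, each entry storing ysubx, xaddy and t2d as three packed 256-bit values (4 little-endian 64-bit words each), expanded afterwards into radix 2^51 limbs. The names `ct_select_packed`, `expand51` and `lookup_and_expand` are illustrative, and reading the table as native `uint64_t` words assumes a little-endian host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ENTRIES 8      /* hypothetical: 8 precomputed points per window */
#define ENTRY_WORDS 12 /* ysubx, xaddy, t2d: 3 x 32 bytes = 12 uint64 words */

typedef uint64_t fe51[5]; /* field element in radix 2^51 */

/* Constant-time select: read every packed entry, keep only entry idx. */
static void ct_select_packed(uint64_t out[ENTRY_WORDS], const uint64_t *table, uint32_t idx) {
    memset(out, 0, ENTRY_WORDS * sizeof(uint64_t));
    for (uint32_t i = 0; i < ENTRIES; i++) {
        uint64_t diff = (uint64_t)(i ^ idx);
        uint64_t mask = 0 - ((diff - 1) >> 63); /* all ones iff i == idx */
        for (int w = 0; w < ENTRY_WORDS; w++)
            out[w] |= table[(size_t)i * ENTRY_WORDS + w] & mask;
    }
}

/* Expand 4 packed 64-bit words into five 51-bit limbs (top bit dropped). */
static void expand51(fe51 out, const uint64_t x[4]) {
    const uint64_t m = (1ULL << 51) - 1;
    out[0] = x[0] & m;
    out[1] = ((x[0] >> 51) | (x[1] << 13)) & m;
    out[2] = ((x[1] >> 38) | (x[2] << 26)) & m;
    out[3] = ((x[2] >> 25) | (x[3] << 39)) & m;
    out[4] = (x[3] >> 12) & m;
}

/* The scan touches only 96 bytes per entry; expansion happens once, at the end. */
static void lookup_and_expand(fe51 ysubx, fe51 xaddy, fe51 t2d,
                              const uint64_t *table, uint32_t idx) {
    uint64_t e[ENTRY_WORDS];
    ct_select_packed(e, table, idx);
    expand51(ysubx, e + 0);
    expand51(xaddy, e + 4);
    expand51(t2d, e + 8);
}
```

With this arrangement, the per-lookup memory traffic is governed by the packed size rather than the expanded limb representation.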
cmov in one pass appears to be a bit slower than sse2 registers in one pass
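For comparison, this is roughly what selecting an entry "in sse2 registers in one pass" can look like. It is a sketch that assumes a hypothetical table of 8 entries of 96 bytes each. Every entry is loaded and masked with an all-ones or all-zero vector built by `_mm_cmpeq_epi32`, so the memory access pattern does not depend on the secret index. `sse2_select` and the sizes are illustrative assumptions, and whether this beats a cmov-based scan will depend on the CPU.

```c
#include <assert.h>
#include <emmintrin.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRIES 8
#define ENTRY_BYTES 96 /* hypothetical: three 32-byte field elements */
#define ENTRY_VECS (ENTRY_BYTES / 16)

/* Branch-free select of entry idx, accumulating in SSE2 registers. */
static void sse2_select(unsigned char out[ENTRY_BYTES],
                        const unsigned char *table, uint32_t idx) {
    __m128i acc[ENTRY_VECS];
    const __m128i want = _mm_set1_epi32((int)idx);
    for (int v = 0; v < ENTRY_VECS; v++) acc[v] = _mm_setzero_si128();
    for (uint32_t i = 0; i < ENTRIES; i++) {
        /* all ones in every lane iff i == idx, otherwise all zeros */
        __m128i mask = _mm_cmpeq_epi32(_mm_set1_epi32((int)i), want);
        const unsigned char *entry = table + (size_t)i * ENTRY_BYTES;
        for (int v = 0; v < ENTRY_VECS; v++) {
            __m128i x = _mm_loadu_si128((const __m128i *)(entry + 16 * v));
            acc[v] = _mm_or_si128(acc[v], _mm_and_si128(x, mask));
        }
    }
    for (int v = 0; v < ENTRY_VECS; v++)
        _mm_storeu_si128((__m128i *)(out + 16 * v), acc[v]);
}
```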
Pushed all my latest stuff! I also renewed (whoops), but now the webserver is broken. I should probably check on it a little more often and do something with it.
Can you add the data for the 3 signatures so I can take a look at what's happening? There is no limit on the size of the batch. Internally the...
was actually working on that! I wanted to do the sse2 versions at the least, but wasn't sure about the usefulness of the base stuff, partly due to not having...
I've been investigating what is faster about the amd64-51/64 implementations and what I could incorporate in, and may have gotten speed-competitive-or-better with them on amd64 (mostly due to cutting more...
New code is up; I need to get the SSE2 stuff updated and then fixed up for SUPERCOP.
Included amd64-64-24k and amd64-51-32k, and benches on a Sandy Bridge CPU. Need to work on updating the SSE2 code now
It is kind of a lazy preference to not serialize with cpuid. If I'm calling it extremely frequently, I'm willing to live with some jitter instead of taking the overhead...
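The trade-off above, sketched for GCC/Clang on x86-64. `cycles_fast` reads the TSC with `__rdtsc()` alone, so out-of-order execution can let the read drift slightly relative to the surrounding code. `cycles_serialized` issues `cpuid` first, which is a serializing instruction, at the cost of `cpuid`'s own large and variable overhead. The helper names and the `min_overhead` estimate are illustrative, not taken from any particular benchmark harness.

```c
#include <assert.h>
#include <cpuid.h>
#include <stdint.h>
#include <x86intrin.h>

/* Unserialized read: cheap, but tolerates some jitter. */
static uint64_t cycles_fast(void) {
    return __rdtsc();
}

/* Serialized read: cpuid forces earlier instructions to complete first. */
static uint64_t cycles_serialized(void) {
    unsigned int a, b, c, d;
    __cpuid(0, a, b, c, d);
    (void)a; (void)b; (void)c; (void)d;
    return __rdtsc();
}

/* Smallest back-to-back delta over n trials: a rough estimate of how much
   each reading method costs by itself. */
static uint64_t min_overhead(uint64_t (*read)(void), int n) {
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < n; i++) {
        uint64_t t0 = read();
        uint64_t t1 = read();
        if (t1 - t0 < best) best = t1 - t0;
    }
    return best;
}
```

If the measured function is called extremely often, subtracting `min_overhead(cycles_serialized, ...)` from every sample can cost more accuracy than the jitter it removes, which is the reasoning behind skipping serialization.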