secp256k1
secp256k1 copied to clipboard
Missing early clobber
In scalar_4x64_impl.h, function secp256k1_scalar_reduce_512. The first asm block is missing early clobber for m0~m2.
The input %rsi is used after writing to m0, m1 and m2. Any of those write can thus re-use %rsi (which will most likely result in segfault).
Also, field_5x52_asm_impl.h is missing similar early clobbers for tmp1~tmp3
My understanding of the GCC ASM extensions is very limited but I think you're right. Would you be willing to open a PR?
Actually, I was not using the code as is, so I don't have the setup to test the issue in the current code itself. I actually found this bug because I was re-using code from the library in a small home project, where I have field elements not using nails (packed in 32 bytes, like the scalars). I do this in my project because I mainly want to use the GPU to do some computations and it help using packed binary representation (I use secp256k1 code mainly as a 'helper', the bulk of the computation will be GPU).
So, I had to modify the field code to look like the scalar code, but using modulo P instead of modulo N. Most code mainly just involved using P instead of N as a constant, but I had to re-write the asm for the 512-bit reduction (much simpler as you only have one 64-bit single-precision multiplication). However, at some point, I hit the issue with the clobber list as it was defined in scalar code (segfault). Looking at asm output, I saw that the compiler did put the 'extract to m2' to %rsi, and since %rsi was used as indexed memory source for the next instruction, this caused segfault. Just defining m0~m2 output list as early-clobber (since the %rsi input is not 'consumed' before those outputs are set) fixed the issue.
Actually, I was not using the code as is, so I don't have the setup to test the issue in the current code itself.
Well, neither do I (or anyone else here). We haven't seen bugs due to this but I guess then we were just lucky that compilers got it right arbitrarily.
I do this in my project because I mainly want to use the GPU to do some computations and it help using packed binary representation (I use secp256k1 code mainly as a 'helper', the bulk of the computation will be GPU).
Can I ask what your project is about? I'm just curious, it's not related to this issue.