secp256k1 icon indicating copy to clipboard operation
secp256k1 copied to clipboard

Missing early clobber

Open ehoffman2 opened this issue 3 years ago • 3 comments

In scalar_4x64_impl.h, function secp256k1_scalar_reduce_512. The first asm block is missing early clobber for m0~m2.

The input %rsi is used after writing to m0, m1 and m2. Any of those write can thus re-use %rsi (which will most likely result in segfault).

Also, field_5x52_asm_impl.h is missing similar early clobbers for tmp1~tmp3

ehoffman2 avatar Jul 14 '20 17:07 ehoffman2

My understanding of the GCC ASM extensions is very limited but I think you're right. Would you be willing to open a PR?

real-or-random avatar Jul 14 '20 22:07 real-or-random

Actually, I was not using the code as is, so I don't have the setup to test the issue in the current code itself. I actually found this bug because I was re-using code from the library in a small home project, where I have field elements not using nails (packed in 32 bytes, like the scalars). I do this in my project because I mainly want to use the GPU to do some computations and it help using packed binary representation (I use secp256k1 code mainly as a 'helper', the bulk of the computation will be GPU).

So, I had to modify the field code to look like the scalar code, but using modulo P instead of modulo N. Most code mainly just involved using P instead of N as a constant, but I had to re-write the asm for the 512-bit reduction (much simpler as you only have one 64-bit single-precision multiplication). However, at some point, I hit the issue with the clobber list as it was defined in scalar code (segfault). Looking at asm output, I saw that the compiler did put the 'extract to m2' to %rsi, and since %rsi was used as indexed memory source for the next instruction, this caused segfault. Just defining m0~m2 output list as early-clobber (since the %rsi input is not 'consumed' before those outputs are set) fixed the issue.

ehoffman2 avatar Jul 15 '20 02:07 ehoffman2

Actually, I was not using the code as is, so I don't have the setup to test the issue in the current code itself.

Well, neither do I (or anyone else here). We haven't seen bugs due to this but I guess then we were just lucky that compilers got it right arbitrarily.

I do this in my project because I mainly want to use the GPU to do some computations and it help using packed binary representation (I use secp256k1 code mainly as a 'helper', the bulk of the computation will be GPU).

Can I ask what your project is about? I'm just curious, it's not related to this issue.

real-or-random avatar Jul 15 '20 14:07 real-or-random