CGBN
CGBN copied to clipboard
`cgbn_mont_sqr(r, a, modulus)` returns result > modulus
I have found a case with N=971 bits where cgbn_mont_sqr(r, a, n)
returns (a^2 % n) + n
I made a minimal repro branch here
Would you be willing to assist in debugging? Any ideas what might be wrong?
I increased the reproducibility by setting
a = n - bn2mont(1)
With BITS=1024 TPI={8,16,32}
, this fails for all 2^n-1, n >= 683
With BITS=512 TPI={8,16}
, this fails for all 2^n-1, n>=342
With BITS=256 TPI=8
, this fails for all 2^n-1 >= 171
This is probably n = 2/3 BITS
With BITS=512, TPI=32
, seems to work on the tested inputs
With BITS=256, TPI={16,32}
, seems to work on the tested inputs
With BITS=128, TPI={8,16,32}
, seems to work on the tested inputs
This error is present if I use core_mul_xmad.cu
or core_mul_imad.cu
I looked at testing this with unit_test but I found that experience hard. Is it expected that make takes ~90 seconds for tester? Is there a faster way to write a single unit test that tests cpu(mpz) vs gpu(CGBN)?
Hi Seth,
Thanks for reporting. You have found a problem here, but it's more of a documentation issue than a coding issue. The mont_mul and mont_sqr routines use a redundant representation, and only subtract N if there was a carry out -- this is faster because it saves a comparison on each mont_sqr and mont_mul. The mont2bn routine is the final step and ensures that the result is normalized to N. I think the implementation is right, but it should be documented better about what's going on under the covers.
I'll leave this issue open until I update the docs.
Thank you, Niall