wolfssl icon indicating copy to clipboard operation
wolfssl copied to clipboard

SP_MATH for Secp256k1

Open coolhacker1337 opened this issue 2 years ago • 6 comments

Version

master

Description

Hi there! While profiling wolfcrypt library I observed that setting WOLFSSL_HAVE_SP_ECC makes function ecc_make_pub_ex use sp_ecc_mulmod_base_256 when curve is ECC_SECP256R1.

I am using x86_64 implementation of sp.h with assembly turned on.

Wondering if the same logic could be applied to ECC_SECP256K1 curve. I replaced ecc_sets[key->idx].id == ECC_SECP256R1 with (ecc_sets[key->idx].id == ECC_SECP256R1 || ecc_sets[key->idx].id == ECC_SECP256K1) manually everywhere, and generated private/public keypairs seem to be OK (signature and verification worked correctly). But still not sure why the logic of current master only handles ECC_SECP256R1 so far.

The reason behind this question is performance. For my input data switching to sp_ecc_mulmod_base_256 yields significant time performance boost compared to default Montgomery multiplication.

Thanks!

coolhacker1337 avatar Jul 13 '22 15:07 coolhacker1337

Hi @coolhacker1337

The optimised code for SECP256R1 will not work with SECP256K1. This is because the code assumes information about the SECP256R1 prime which is a different value to the SECP256K1 prime. It would require some work to include an implementation for SECP256K1 as well. If you would like me to put you in contact with a sales person, let me know.

Sean

SparkiDev avatar Jul 13 '22 22:07 SparkiDev

Hi Sean! Thanks for your work and response. Right, I have checked the codebase deeper, especially this file: https://raw.githubusercontent.com/wolfSSL/wolfssl/master/wolfcrypt/src/sp_x86_64.c

As far as I could get these are curve-specific constants: (Only viewable as raw, so can't use links here)

/* The modulus (prime) of the curve P256. */
static const sp_digit p256_mod[4] = {
    0xffffffffffffffffL,0x00000000ffffffffL,0x0000000000000000L,
    0xffffffff00000001L
};
/* The Montgomery normalizer for modulus of the curve P256. */
static const sp_digit p256_norm_mod[4] = {
    0x0000000000000001L,0xffffffff00000000L,0xffffffffffffffffL,
    0x00000000fffffffeL
};
/* The Montgomery multiplier for modulus of the curve P256. */
static const sp_digit p256_mp_mod = 0x0000000000000001;
#if defined(WOLFSSL_VALIDATE_ECC_KEYGEN) || defined(HAVE_ECC_SIGN) || \
                                            defined(HAVE_ECC_VERIFY)
/* The order of the curve P256. */
static const sp_digit p256_order[4] = {
    0xf3b9cac2fc632551L,0xbce6faada7179e84L,0xffffffffffffffffL,
    0xffffffff00000000L
};
#endif
/* The order of the curve P256 minus 2. */
static const sp_digit p256_order2[4] = {
    0xf3b9cac2fc63254fL,0xbce6faada7179e84L,0xffffffffffffffffL,
    0xffffffff00000000L
};
#if defined(HAVE_ECC_SIGN) || defined(HAVE_ECC_VERIFY)
/* The Montgomery normalizer for order of the curve P256. */
static const sp_digit p256_norm_order[4] = {
    0x0c46353d039cdaafL,0x4319055258e8617bL,0x0000000000000000L,
    0x00000000ffffffffL
};

L_sp256_mod_inv_avx2_4_order:
.long	0x632551,0x1e84f3b,0x3bce6fa,0x3ffffff
.long	0x3ff0000,0x0,0x0,0x0
.long	0x272b0bf,0x2b69c5e,0x3ffffff,0x3ff
.long	0x3fffff,0x0,0x0,0x0

Functions such as sp_256_mod_mul_norm_4 get the values above passed as arguments, in this case m: static int sp_256_mod_mul_norm_4(sp_digit* r, const sp_digit* a, const sp_digit* m)

I wonder if it is possible to define the same set of parameters for secp256k1 curve and generalise the code to support it? For instance, add curve_id argument here and elsewhere required, or write duplicate of sp_256_ecc_mulmod_base_4 for 256k1 curve:

static int sp_256_ecc_mulmod_base_4(sp_point_256* r, const sp_digit* k,
        int map, int ct, void* heap)
{
    return sp_256_ecc_mulmod_stripe_4(r, &p256_base, p256_table,
                                      k, map, ct, heap);
}

I am not proficient in cryptography, but it looks like the core algrebraic algorithms should support curves different from p256/384/1024. What do you think, would it be good idea to contribute to the library by introducing SECP256K1 sp_x86_64 support? I'd be happy to do that.

Thanks, Nikolay

coolhacker1337 avatar Jul 14 '22 19:07 coolhacker1337

Hi @coolhacker1337

There are some functions that take the prime as a parameter and some implementations simply ignore it and assume the value is for P256.

If you look at sp_x86_64_asm.S, you will see functions like sp_256_mont_[mul|sqr|add|dbl|sub|tpl|reduce]_4() that are implemented assuming P256.

Also, in sp_x86_64.c, the function sp_256_mont_inv_4 assumed the prime is that for P256 and the function sp_256_mont_inv_order_4 assumes the order for P256.

There are a number of places that need to be rewritten.

If we have customers that need this then we will consider investing in this area.

Sean

SparkiDev avatar Jul 14 '22 22:07 SparkiDev

Hi Sean, thanks for your response! I have started working on secp256k1 curve. So far I have understood that Montgomery reduction is written for p256 curve, and the most of the job is to rewrite it for secp256k1 (both AXI/non-AXI versions). I found references and literature to work on this productively. The only place in the codebase which I am still struggling to figure out is this piece of code in sp_256_mod_mul_norm_4 of sp_x86_64.c.

    /*  1  1  0 -1 -1 -1 -1  0 */
    t[0] = 0 + a32[0] + a32[1] - a32[3] - a32[4] - a32[5] - a32[6];
    /*  0  1  1  0 -1 -1 -1 -1 */
    t[1] = 0 + a32[1] + a32[2] - a32[4] - a32[5] - a32[6] - a32[7];
    /*  0  0  1  1  0 -1 -1 -1 */
    t[2] = 0 + a32[2] + a32[3] - a32[5] - a32[6] - a32[7];
    /* -1 -1  0  2  2  1  0 -1 */
    t[3] = 0 - a32[0] - a32[1] + 2 * a32[3] + 2 * a32[4] + a32[5] - a32[7];
    /*  0 -1 -1  0  2  2  1  0 */
    t[4] = 0 - a32[1] - a32[2] + 2 * a32[4] + 2 * a32[5] + a32[6];
    /*  0  0 -1 -1  0  2  2  1 */
    t[5] = 0 - a32[2] - a32[3] + 2 * a32[5] + 2 * a32[6] + a32[7];
    /* -1 -1  0  0  0  1  3  2 */
    t[6] = 0 - a32[0] - a32[1] + a32[5] + 3 * a32[6] + 2 * a32[7];
    /*  1  0 -1 -1 -1 -1  0  3 */
    t[7] = 0 + a32[0] - a32[2] - a32[3] - a32[4] - a32[5] + 3 * a32[7];

I guess the coefficients in the comments are somehow related to prime inverse, and the operation is optimised inner multiplication, but not sure. There are no such examples anywhere else, so I'd be really grateful if you could explain where they come from.

Thanks!

coolhacker1337 avatar Jul 19 '22 15:07 coolhacker1337

Hi @coolhacker1337

In its simplest form, the mod_mul_norm operation is: (a * norm_p256k1) mod prime_p256k1 norm = (1 << 256) - prime_p256k1 = 0x3d1

So a fast implementation would: mul_d(t, a, p256k1_norm); add_d(r, t, p256k1_norm * t[4]); // mod prime_p256k1, t[4] is overflow from mul_d (where d means digit, add_d does not exist in code base right now)

Sean

SparkiDev avatar Jul 19 '22 22:07 SparkiDev

Hi @coolhacker1337,

It has been sometime since your last message. Do you have any more questions around this?

Thanks, Sean

SparkiDev avatar Sep 19 '22 23:09 SparkiDev

Hi @coolhacker1337,

It has been sometime since your last message. Do you have any more questions around this or will I close the ticket?

Thanks, Sean

SparkiDev avatar Nov 08 '22 00:11 SparkiDev

Hi @coolhacker1337

I'm closing the ticket due to inactivity.

Sean

SparkiDev avatar Dec 12 '22 22:12 SparkiDev