secp256k1 icon indicating copy to clipboard operation
secp256k1 copied to clipboard

General batch verification API context

Open siv2r opened this issue 2 years ago • 6 comments

context: https://github.com/bitcoin-core/secp256k1/pull/760#issuecomment-809242311

I am trying to implement a PoC for the API proposed above. I have the following batch_verify object in mind.

typedef struct {
    unsigned char chacha_seed[32];  /* for generating common randomizers (1, a2, a3 ... au) */
    secp256k1_scalar randomizer_cache[2];
    schnorrsig_batch_verify sigs_data;
    tweaked_key_batch_verify tweaked_keys_data;
} batch_verify_struct;

typdef struct {
    secp256k1_scratch *sigs_data; /* (sig, msg, pk) */
    size_t len; 
    size_t capacity; /* equals (sigs_data->max_size)/(64 + 32 + sizeof(secp256k1_xonly)) */
    int result;
} schnorrsig_batch_verify;

typdef struct {
    secp256k1_scratch *tweaks_data; /* (pairity, tweaked_key, tweak32, pubkey_key) */
    size_t len;
    size_t capacity;
    int result;
} tweaked_key_batch_verify;

I plan to use a scratch object to store the data (schnorrsig or tweaks) since it will allow us to keep on adding new data (using batch_add_sig and batch_add_xpubkey_tweak) and increase the batch object's size accordingly. This batch object doesn't seem compatible with ecmult_pippenger_batch or ecmult_strauss_batch function call.

Since both Pippenger and Strauss takes the arguments:

  • void *cbdata --> contains the required data
  • secp256k1_scratch *scratch --> newly allocated scratch space where scalars and points are loaded for multi multiplication

But this batch object already has the required data in a scratch space. Maybe use another scratch space for loading scalars and points? Won't this increase memory usage? Also, does this API require a new module? Or including these in the schnorrsig module suffice?

siv2r avatar Mar 13 '22 03:03 siv2r

This is a start. Ideally, the batch object does not hold signatures, messages and the likes. Instead, only scalars and points are stored on the batch object's scratch space. In order to avoid allocating space again for scalars and points in ecmult_strauss_batch and ecmult_pippenger_batch, we need to refactor these functions (and others) to be able to tell them that scalars and points already exist on the scratch space. For this idea (and your idea btw) to work, the scratch space provided to the batch object is exclusively owned by the batch object and must not be touched by the user anymore until batch verification is over. Another drawback is that we must represent points as secp256k1_gej because that's required by Strauss whereas secp256k1_ge would be sufficient for Pippenger.

We also need to keep in mind that we can not compute the Schnorr batch verification randomizer by hashing all signatures, public keys and messages as before. We just don't know all of them yet when secp256k1_batch_add_sig is called to add a single (randomized) scalar to the batch object's scratch space. A very simple approach (but not close to being optimally efficient) is to have a (tagged) secp256k1_sha object in the batch object that hashes everything that was seen so far. Everytime we need a fresh randomizer, a copy of the sha object is made and finalized. This approach of computing the randomizers only from input available so far would be covered by the proof here.

jonasnick avatar Mar 13 '22 22:03 jonasnick

@real-or-random pointed out to me that there is a simpler solution at the cost of requiring more memory. What I had assumed above is that we compute the randomizer immediately to allow storing only the sum of scalars that are multiplied with G (c.f. (s_1 + a_2⋅s_2 + ... + a_u⋅s_u)⋅G in BIP-340). If we instead store each individual such scalar, we can delay computing the randomizer to right before batch verifying.

We can do this by having the batch object store a secp256k1_sha object. Every time something is added to the batch, we write the input data (e.g. message, public key, signature per BIP-340 batch verification recommendation) into the sha object, but do not randomize the scalars yet. Only right before batch verifying, we finalize the sha object to obtain a hash for seeding the CSPRNG. Each scalar is then multiplied with a randomizer (the right randomizer to be precise)

jonasnick avatar Mar 14 '22 20:03 jonasnick

Hm yeah, right but then we'll need to store the scalars as you point out, and I'm not sure that's worth the hassle.

So we'll anyway need to keep some O(u) things:

  • The pubkeys P_i
  • The nonces r_i

Adding s_i will make be an increase of 50% and could mean that we'll support smaller batch sizes for a given memory.

On the other hand, if you have enough memory, this argument won't apply. Moreover, the caller may keep the s_i (and also the other inputs around anyway). With an API that would require the caller not to touch these until the computation has been finalized, we could save a lot of memory (and copying). But that API will be harder to use correctly. Now that I say this, maybe a "streaming" API is the actually wrong approach and it should be just a single call as in the BIP. That's simple and stateless.

real-or-random avatar Mar 17 '22 14:03 real-or-random

Sorry for the delay.

I took a look at scratch_alloc done by pippenger_batch and strauss_batch. A shared format for the scratch space (allocated with scalars and points), which the ecmult_mulit_var can pass to either pippenger_batch or strauss_batch (for multi multiplication), seems infeasible. I can think of two options now:

  1. refactor the scratch allocations done in pippenger_batch and strauss_batch to support a shared format.
  2. avoid a shared format for scratch space and use any one of pippenger_batch or strauss_batch for multi multiplication

Is option1 the right approach?


Batch object's Scratch Space Initialization: user calls void batch_verify_init(secp256k1_context ctx, size_t n_terms)

batch* batch_verify_init(size_t n_terms) {
     batch ret;
     scratch_size = strauss_scratch_size(n_terms) + STRAUSS_SCRATCH_OBJECT*16;
     ret.scratch = scratch_create(&ctx->error_callback, scratch_size);
     /* allocate space for n_terms (scalar, points) on scratch space*/ --> implementation info below
     /* other necessary batch obj allocations */
     return &ret;
}

Here, we create scratch memory required for n_terms Strauss points since it is always greater than the scratch memory required for n_terms Pippenger points. The ecmult benchmark used a similar approach (see here).

Allocating scratch memory for n_terms (scalar, points):

  • Format1: for supporting strauss_batch we need to do: (see here)
     /* both of these are impl  using scratch_alloc() */
    ret.scratch->points = scratch_alloc(n_terms * sizeof(secp256k1_gej));
    ret.scratch->scalars = scratch_alloc(n_terms * sizeof(secp256k1_scalar));
  • Format2: for supporting pippenger_batch we need to do: (see here)
    ret.scratch->points = scratch_alloc((2*n_terms + 2) * sizeof(secp256k1_ge));
    ret.scratch->scalars = scratch_alloc((2*n_terms + 2) * sizeof(secp256k1_scalar));

If we use format1, we can't call pippenger_batch; if we use format2, we can't call strauss_batch. This is the format issue that I was talking about earlier.

siv2r avatar Mar 26 '22 21:03 siv2r

If we use format1, we can't call pippenger_batch;

That's not true if the pippenger algorithm is refactored appropriately. The algorithm would know that n_terms * sizeof(secp256k1_gej) and n_terms * sizeof(secp256k1_scalar) are already on the scratch space. Therefore, it would only allocate (n_terms + 2) * sizeof(secp256k1_ge) and (n_terms + 2) * sizeof(secp256k1_scalar).

Besides, I like the idea that the batch object creates its own scratch space.

jonasnick avatar Mar 29 '22 16:03 jonasnick

@real-or-random

maybe a "streaming" API is the actually wrong approach

There's another advantage to having a single call instead of a streaming API. In general, developers want to know approximately how long a particular function call takes. With the "streaming" API one can not predict when a call to batch_add_* will be fast and when it will take much much longer.

and it should be just a single call as in the BIP

If there was a way to do this that allows multiple objects to be batch verified and is extensible this would be worth exploring. I just don't see how.

jonasnick avatar Apr 26 '22 20:04 jonasnick