secp256k1
General batch verification API
context: https://github.com/bitcoin-core/secp256k1/pull/760#issuecomment-809242311
I am trying to implement a PoC for the API proposed above. I have the following `batch_verify` object in mind:
```c
typedef struct {
    unsigned char chacha_seed[32]; /* for generating common randomizers (1, a2, a3, ..., au) */
    secp256k1_scalar randomizer_cache[2];
    schnorrsig_batch_verify sigs_data;
    tweaked_key_batch_verify tweaked_keys_data;
} batch_verify_struct;
```
```c
typedef struct {
    secp256k1_scratch *sigs_data; /* (sig, msg, pk) */
    size_t len;
    size_t capacity; /* equals (sigs_data->max_size)/(64 + 32 + sizeof(secp256k1_xonly_pubkey)) */
    int result;
} schnorrsig_batch_verify;
```
```c
typedef struct {
    secp256k1_scratch *tweaks_data; /* (parity, tweaked_key, tweak32, pubkey) */
    size_t len;
    size_t capacity;
    int result;
} tweaked_key_batch_verify;
```
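As a sanity check on the `capacity` field above, a hypothetical helper might look like the following; the 64/32/64-byte sizes are assumptions for the signature, the message and `sizeof(secp256k1_xonly_pubkey)`, purely for illustration:

```c
#include <stddef.h>

/* Hypothetical helper illustrating the capacity comment in the struct
   above: one batch entry is assumed to hold a 64-byte signature, a
   32-byte message and a 64-byte x-only public key. */
static size_t sig_capacity(size_t scratch_max_size) {
    return scratch_max_size / (64 + 32 + 64);
}
```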
I plan to use a scratch object to store the data (schnorrsigs or tweaks), since it allows us to keep adding new data (using `batch_add_sig` and `batch_add_xpubkey_tweak`) and to grow the batch object's size accordingly. However, this batch object doesn't seem compatible with the `ecmult_pippenger_batch` or `ecmult_strauss_batch` function calls.
Both Pippenger and Strauss take the arguments:
- `void *cbdata` --> contains the required data
- `secp256k1_scratch *scratch` --> newly allocated scratch space into which the scalars and points are loaded for the multi-multiplication

But this batch object already holds the required data in a scratch space. Should we use another scratch space for loading scalars and points? Won't this increase memory usage?
Also, does this API require a new module? Or would including these functions in the `schnorrsig` module suffice?
This is a start. Ideally, the batch object does not hold signatures, messages and the like. Instead, only scalars and points are stored on the batch object's scratch space. In order to avoid allocating space again for scalars and points in `ecmult_strauss_batch` and `ecmult_pippenger_batch`, we need to refactor these functions (and others) so that we can tell them that the scalars and points already exist on the scratch space. For this idea (and your idea, btw) to work, the scratch space provided to the batch object must be exclusively owned by the batch object and must not be touched by the user until batch verification is over. Another drawback is that we must represent points as `secp256k1_gej` because that's what Strauss requires, whereas `secp256k1_ge` would be sufficient for Pippenger.
We also need to keep in mind that we cannot compute the Schnorr batch verification randomizers by hashing all signatures, public keys and messages as before: we simply don't know all of them yet when `secp256k1_batch_add_sig` is called to add a single (randomized) scalar to the batch object's scratch space. A very simple approach (though far from optimally efficient) is to keep a (tagged) `secp256k1_sha256` object in the batch object that hashes everything seen so far. Every time we need a fresh randomizer, a copy of the sha256 object is made and finalized. This approach of computing the randomizers only from the input available so far would be covered by the proof here.
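The copy-and-finalize trick described above can be sketched as follows. `toy_hash` is a deliberately tiny FNV-1a stand-in for the real tagged `secp256k1_sha256` midstate; only the pattern (keep a running hash, copy it, finalize the copy) is the point here:

```c
#include <stdint.h>
#include <stddef.h>

/* Toy stand-in for a tagged secp256k1_sha256 midstate. FNV-1a is used
   purely to illustrate the copy-and-finalize pattern; the real
   implementation would use SHA-256. */
typedef struct { uint64_t state; } toy_hash;

static void toy_hash_init(toy_hash *h) { h->state = 1469598103934665603ULL; }

static void toy_hash_write(toy_hash *h, const unsigned char *data, size_t len) {
    size_t i;
    for (i = 0; i < len; i++) {
        h->state ^= data[i];
        h->state *= 1099511628211ULL; /* FNV prime */
    }
}

/* Derive a fresh randomizer from everything written so far: copy the
   midstate, finalize the copy, and leave the original untouched so
   later inputs can still be absorbed. */
static uint64_t toy_randomizer(const toy_hash *h) {
    toy_hash copy = *h;       /* cheap midstate copy */
    unsigned char fin = 0xff; /* finalization marker */
    toy_hash_write(&copy, &fin, 1);
    return copy.state;
}
```

Each call to `toy_randomizer` depends only on the inputs added so far, matching the "randomizers from input available so far" requirement of the proof mentioned above.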
@real-or-random pointed out to me that there is a simpler solution, at the cost of requiring more memory. What I had assumed above is that we compute the randomizer immediately, which allows storing only the sum of the scalars that are multiplied with G (cf. `(s_1 + a_2⋅s_2 + ... + a_u⋅s_u)⋅G` in BIP-340). If we instead store each individual scalar, we can delay computing the randomizers until right before batch verification.
We can do this by having the batch object store a `secp256k1_sha256` object. Every time something is added to the batch, we write the input data (e.g. message, public key and signature, per the BIP-340 batch verification recommendation) into the sha256 object, but do not randomize the scalars yet. Only right before batch verifying do we finalize the sha256 object to obtain a hash for seeding the CSPRNG. Each scalar is then multiplied with a randomizer (the right randomizer, to be precise).
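A sketch of this delayed-randomization idea, with toy 64-bit modular arithmetic standing in for scalar operations and an xorshift PRNG standing in for the ChaCha20-based CSPRNG (all names, sizes and constants here are illustrative assumptions, not library code):

```c
#include <stdint.h>
#include <stddef.h>

#define TOY_ORDER 2147483647ULL /* toy prime "group order", stands in for the secp256k1 order */
#define MAX_TERMS 64

/* Delayed randomization: scalars are kept unrandomized in the batch;
   only at verification time is each s_i multiplied by its randomizer
   a_i (with a_1 = 1, as in BIP-340). */
typedef struct {
    uint64_t scalars[MAX_TERMS];
    size_t len;
} toy_batch;

static void toy_batch_add(toy_batch *b, uint64_t s) {
    b->scalars[b->len++] = s % TOY_ORDER;
}

/* xorshift PRNG seeded from the (hypothetical) finalized hash; stands
   in for the ChaCha20 CSPRNG. Returns a value in [1, TOY_ORDER - 1]. */
static uint64_t toy_next_randomizer(uint64_t *seed) {
    *seed ^= *seed << 13; *seed ^= *seed >> 7; *seed ^= *seed << 17;
    return (*seed % (TOY_ORDER - 1)) + 1;
}

/* Compute s_1 + a_2*s_2 + ... + a_u*s_u (mod order) only now, once the
   seed derived from all inputs is known. */
static uint64_t toy_batch_finalize(const toy_batch *b, uint64_t seed) {
    uint64_t acc = 0;
    size_t i;
    for (i = 0; i < b->len; i++) {
        uint64_t a = (i == 0) ? 1 : toy_next_randomizer(&seed);
        acc = (acc + (a * b->scalars[i]) % TOY_ORDER) % TOY_ORDER;
    }
    return acc;
}
```

The memory cost is visible in the struct: one stored scalar per added term, instead of a single running sum.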
Hm yeah, right, but then we'll need to store the scalars, as you point out, and I'm not sure that's worth the hassle.
We'll need to keep some O(u) things anyway:
- the pubkeys `P_i`
- the nonces `r_i`

Adding the `s_i` would be a 50% increase and could mean that we support smaller batch sizes for a given amount of memory.
On the other hand, if you have enough memory, this argument doesn't apply. Moreover, the caller may keep the `s_i` (and the other inputs) around anyway. With an API that requires the caller not to touch these until the computation has been finalized, we could save a lot of memory (and copying). But that API would be harder to use correctly. Now that I say this, maybe a "streaming" API is actually the wrong approach and it should be just a single call as in the BIP. That's simple and stateless.
Sorry for the delay.
I took a look at the `scratch_alloc` calls done by `pippenger_batch` and `strauss_batch`. A shared format for the scratch space (allocated with scalars and points), which `ecmult_multi_var` could pass to either `pippenger_batch` or `strauss_batch` (for the multi-multiplication), seems infeasible. I can think of two options now:
- refactor the scratch allocations done in `pippenger_batch` and `strauss_batch` to support a shared format
- avoid a shared format for the scratch space and use only one of `pippenger_batch` or `strauss_batch` for the multi-multiplication

Is option 1 the right approach?
Batch object's scratch space initialization: the user calls `batch_verify_init(ctx, n_terms)`:

```c
batch* batch_verify_init(const secp256k1_context *ctx, size_t n_terms) {
    /* must be heap-allocated: returning the address of a local would be a dangling pointer */
    batch *ret = malloc(sizeof(*ret));
    size_t scratch_size = strauss_scratch_size(n_terms) + STRAUSS_SCRATCH_OBJECTS*16;
    ret->scratch = scratch_create(&ctx->error_callback, scratch_size);
    /* allocate space for n_terms (scalar, point) pairs on the scratch space --> implementation info below */
    /* other necessary batch obj allocations */
    return ret;
}
```
Here, we create the scratch memory required for `n_terms` Strauss points, since it is always greater than the scratch memory required for `n_terms` Pippenger points. The ecmult benchmark uses a similar approach (see here).
Allocating scratch memory for `n_terms` (scalar, point) pairs:

- Format 1: to support `strauss_batch` we need to do the following (see here):

  ```c
  /* both of these are implemented using scratch_alloc() */
  ret.scratch->points = scratch_alloc(n_terms * sizeof(secp256k1_gej));
  ret.scratch->scalars = scratch_alloc(n_terms * sizeof(secp256k1_scalar));
  ```

- Format 2: to support `pippenger_batch` we need to do the following (see here):

  ```c
  ret.scratch->points = scratch_alloc((2*n_terms + 2) * sizeof(secp256k1_ge));
  ret.scratch->scalars = scratch_alloc((2*n_terms + 2) * sizeof(secp256k1_scalar));
  ```

If we use format 1, we can't call `pippenger_batch`; if we use format 2, we can't call `strauss_batch`. This is the format issue that I was talking about earlier.
> If we use format 1, we can't call `pippenger_batch`;

That's not true if the Pippenger algorithm is refactored appropriately. The algorithm would know that the `n_terms * sizeof(secp256k1_gej)` points and `n_terms * sizeof(secp256k1_scalar)` scalars are already on the scratch space. Therefore, it would only allocate `(n_terms + 2) * sizeof(secp256k1_ge)` and `(n_terms + 2) * sizeof(secp256k1_scalar)`.
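To make this concrete, here is a toy allocation-size comparison; the byte sizes are placeholders (not the real `sizeof` values), since only the term counts matter for the argument:

```c
#include <stddef.h>

/* Placeholder sizes standing in for sizeof(secp256k1_ge) and
   sizeof(secp256k1_scalar); the argument only depends on the counts. */
#define GE_SIZE     64
#define SCALAR_SIZE 32

/* Unrefactored pippenger: allocates (2n+2) points and (2n+2) scalars. */
static size_t pippenger_alloc_naive(size_t n) {
    return (2*n + 2) * GE_SIZE + (2*n + 2) * SCALAR_SIZE;
}

/* Refactored pippenger on a format-1 scratch: n points and n scalars
   are already present, so only (n+2) of each still need allocating. */
static size_t pippenger_alloc_refactored(size_t n) {
    return (n + 2) * GE_SIZE + (n + 2) * SCALAR_SIZE;
}
```

For large `n`, the refactored variant allocates roughly half as much additional scratch memory.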
Besides, I like the idea that the batch object creates its own scratch space.
@real-or-random

> maybe a "streaming" API is actually the wrong approach

There's another advantage to having a single call instead of a streaming API. In general, developers want to know approximately how long a particular function call takes. With the "streaming" API one cannot predict when a call to `batch_add_*` will be fast and when it will take much longer.

> and it should be just a single call as in the BIP

If there was a way to do this that allows multiple kinds of objects to be batch verified and is extensible, this would be worth exploring. I just don't see how.