
Recommendations for Auditable Electronic Codex32 Implementations

Open BenWestgate opened this issue 1 year ago • 107 comments

While intended for paper computers, some benefits of codex32 can be maximized through electronic implementations. To enhance this aspect, I propose recommendations for creating auditable codex32 share sets when using electronics.

Inspired by the concept of auditable bitcoin wallets, I have developed an implementation which, for a fresh master seed, ensures full determinism in generating the codex32 share set.

I believe that deterministically choosing every aspect of the codex32 share set enhances protection against potential malicious tampering of shares by detecting any deviation from the determinism.

In issue #54, we discussed that the hash160(public_masterkey) from descriptors serves as useful deterministic data for the default identifier. However, we need to address another aspect that could potentially leak information: the share indices and payloads of the k - 1 independent shares.

Currently, the BIP recommends selecting the next available letter from the bech32 alphabet, in alphabetical order, as share indices:

Take the next available letter from the bech32 alphabet, in alphabetical order, as a, c, d, ..., to be the share index

This approach may inadvertently reduce privacy, as finding share f, g, or h would reveal a backup of at least 5, 6, or 7 shares, respectively. To improve this, I propose an update to the BIP where, during hand creation, dice are rolled to select each new share index. This method is akin to how indexes are created in software like libgfshare-bin.

For electronic implementations, share indices should be determined by the codex32 secret alone. Below is an example of how I chose the share indices in Python:

import random
import hashlib
import hmac
# indices_free: bech32 characters not yet used as share indices (excludes "s")
salt = (threshold + identifier + str(len(master_seed))).encode()
random.seed(a=hmac.digest(master_seed, salt, 'sha512'))
share_index = random.sample(indices_free, n)

The master_seed, threshold, identifier, and seed_length are the only variables in a codex32 secret, so I HMAC these together to seed the selection of share indices, although a hash of the codex32 secret string would also work.

I think this is the right track but I'm looking to standardize recommendations for electronic implementations. If there's a better way please say so.

Below is an example of how I generated the share payloads in Python:

share_entropy = hashlib.scrypt(password=master_seed, salt=salt, n=2**20, r=8, p=1, maxmem=1025**3, dklen=(int(k)-1)*(seed_length+1))

While this is a start, it may not be suitable as a standard recommendation because these scrypt parameters won't run on hardware wallets. To address this, I am open to exploring alternative variable-length secure hash functions. Your input on this matter would be appreciated.

Next, this stretched entropy is divided evenly into k - 1 share payloads. Additionally, an extra byte is allocated for filling the padding bits of the 26th or 52nd character.

Subsequently, the remaining n + 1 - k shares are derived according to the BIP:

    for x in range(int(k) - 1):
        data = list(share_entropy[x * (seed_length + 1):(x + 1) * (seed_length + 1)])
        share_list += [encode("ms", seed_length, k, identifier, share_index[x], data)]
    for x in range(int(k) - 1, n):
        derived_share_list += [derive_new_share(share_list, share_index[x])]

Lastly, if a new set of shares is created for an existing master_seed with the same threshold, I propose incrementing the last character of the identifier by 1, which will cause a completely new set of indices and share payloads to be chosen.
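A minimal sketch of that increment, assuming wraparound within the standard 32-character bech32 alphabet (the function name is illustrative):

```python
# Standard bech32 character set, in index order.
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

def increment_identifier(identifier: str) -> str:
    """Increment the last bech32 character of the identifier, wrapping mod 32."""
    i = CHARSET.index(identifier[-1])
    return identifier[:-1] + CHARSET[(i + 1) % 32]
```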

I welcome any feedback or suggestions for improving the approach. Let's establish standardized and auditable recommendations for electronic codex32 implementations. Thank you for your collaboration.

BenWestgate avatar Jul 28 '23 07:07 BenWestgate

To improve this, I propose an update to the BIP where, during hand creation, dice are rolled to select each new share index.

This would actually complicate things a fair bit -- generating shares in order allows us to have a fixed set of tables that users can use to derive additional shares. If you were generating shares in random order you would need to do something akin to the recovery process -- and for each share index we'd need a "recovery wheel" for that index.

I wouldn't mind releasing a set of "recovery wheels" and suggesting people do this as an additional privacy measure (we've wanted to do this anyway as a "recover the remaining shares from an incomplete set" process), but I think it's too onerous to suggest doing as the default. Especially as the k/n parameters are not something that we think of as secret data. k is even encoded in the share header.

But I agree that this generation mode should be the default for electronic implementations.

Below is an example of how I generated the share payloads in Python:

Because you're starting with the master seed, I don't think there's any value in hardening this. If the seed itself is of low-quality randomness then the user is screwed anyway. So I would actually suggest going in the other direction, toward chacha20 rather than scrypt, in the hope of coming up with something computationally simple enough that it could be (theoretically) audited by hand.

Similarly, with the share indices I don't even think we need cryptographic randomness -- the goal is simply to break the implication "share G implies that shares B,C,D,E,F exist". Though of course, if you are using the master seed as entropy, you want to run that through a cryptographic RNG to make sure that no information leaks into the indices themselves. I also don't think the share indices need to be auditable, though there's no reason for them not to be so we might as well.

BTW, I've been holding off on PR'ing to the BIP repo because it seems like you (and us) are still iterating on this, but I definitely do intend to incorporate your suggestions once we've nailed everything down. Thanks a ton for digging into this!

apoelstra avatar Jul 28 '23 14:07 apoelstra

@BenWestgate also, if you are on IRC, #volvelle-wizards is our discussion channel for this stuff.

apoelstra avatar Jul 28 '23 15:07 apoelstra

Thank you for your response. I will make changes to my implementation to bring it closer to a standard recommendation.

I will get rid of the KDF and use ChaCha20 for filling the secret payload padding, the share payloads and shuffling the indices, so that auditing electronic codex32 share sets by hand is possible.

Share indices MUST be derived from cryptographically secure randomness if not fixed, and we agree they shouldn't be fixed by default in electronic implementations, for the small "n" privacy gain.

By my math if the user made 31 shares and numbered them 1 to 31 it can leak a surprising amount of data: 87 bits. So even with encrypted payloads, the seed would be effectively leaked by a malicious non-deterministic implementation with enough indices known. Although maybe 2nd level SSS encryption should encrypt the plaintext share's threshold, identifier, and indices for reasons like this. #16

Without encryption the max threshold is 9, so the maximum data leak we're concerned with is from 8 indices: log2(31*30*29*28*27*26*25*24) ≈ 39 bits leaked.

Leaving only 89 bits of security for the master_seed, which is probably guessable by TLAs within our lifetime, if not already.

Regarding the algorithm for the share index selection, using python random's random.seed() and then random.sample() is probably not hand auditable due to the Mersenne Twister.

Instead I could take at least 87 bits from ChaCha20, convert to bech32, and take the first unique character as the next share index.

But how to know how much data to produce from the hash so I don't run out? Is there a better way to deterministically "shuffle" a list by hand computations?

  1. Use ChaCha20 with a secret key (master_seed) to generate a keystream.
  2. Take the first byte of the keystream (or any desired number of bits, depending on the number of elements in the list) and use it as an index to select the first symbol of the shuffled list, unless the value is "s" or a previously selected character.
  3. Use the next byte of the keystream (or any desired number of bits) as an index to select the next symbol of the shuffled list.
  4. Increment the counter for ChaCha20 by 1 if out of bytes.
  5. Repeat steps 2 to 4 until all symbols are selected, creating the shuffled list.

Perhaps the ChaCha20 hash mod 31, mod 30, ... etc. What is the right balance between the length of secure hash output needed and the math operations to derive each index that's most efficient for hand auditing and also doesn't introduce modulo bias?

The share payloads are much easier in this regard as they allow repeating characters and use the full 5 bits per character so the chacha20 output stream can be directly used.

I joined the IRC channel as benwestgate.

BenWestgate avatar Jul 28 '23 21:07 BenWestgate

Because you're starting with the master seed, I don't think there's any value in hardening this. If the seed itself is of low-quality randomness then the user is screwed anyway. So I would actually suggest going in the other direction, toward chacha20 rather than scrypt, in the hope of coming up with something computationally simple enough that it could be (theoretically) audited by hand.

from Cryptodome.Cipher import ChaCha20

padding_length = 32 - len(master_seed)

key = master_seed + bytes(padding_length)

nonce = bytes(hrp+k+identifier+"s", 'utf-8')

cipher = ChaCha20.new(key=key, nonce=nonce)

cipher_stream = cipher.encrypt(bytes(8))

Does this look good as the RNG stream for setting the padding bits in the last symbol of the codex32 secret payload, the share indices and the k-1 share payloads?

The nonce can be incremented by 1 whenever a 512-bit block is exhausted.

Do you prefer zero-padding the decoded master_seed bytes as shown above, or directly UTF-8 encoding the 32 bech32 characters after "s", i.e. key = bytes(codex32_secret[9:41], 'utf-8')?

Unsure which is easier by hand.

Sad to see double_sha256 go as the padding, since 256-bit master_seeds were a direct truncation of the WIF serialization. But that novelty is a liability for auditing by hand.

Likewise, it feels wrong to set the secret payload padding bits with a RNG stream that depends on the header, causing the payload for identical master_seed's to vary if the header changes.

Yet it seems smart to conserve labor for hand auditors by avoiding computing a fixed-nonce ChaCha20 block only to set 2-4 padding bits and then discard it. These bits should be the first data extracted from the stream, as a fast first check for malicious tampering. The next should be the share payloads, as these are also recoverable, error-detecting and auditable data; the last should be the indices, as the order the software provided them in is meant to be lost during recovery, so they cannot detect tampering.
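A sketch of that extraction order, slicing a single keystream into three regions (layout and names are illustrative, not a spec):

```python
# Illustrative split of one keystream into the three consumers, in the order
# argued above: padding bits first, then the k-1 share payloads, then the
# entropy for share indices. seed_length and k are as used in the thread.
def split_stream(stream: bytes, seed_length: int, k: int):
    pad_bits = stream[0]                      # first byte: source of padding bits
    payload_len = (k - 1) * (seed_length + 1)
    payloads = stream[1:1 + payload_len]      # next: independent share payloads
    index_entropy = stream[1 + payload_len:]  # last: entropy for share indices
    return pad_bits, payloads, index_entropy
```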

BenWestgate avatar Jul 30 '23 10:07 BenWestgate

Share indices MUST be derived from cryptographically secure randomness if not fixed, and we agree they shouldn't be fixed by default in electronic implementations, for the small "n" privacy gain.

By my math if the user made 31 shares and numbered them 1 to 31 it can leak a surprising amount of data: 87 bits.

Heh, yeah, this seems reasonable. It seems pretty implausible that the user would number their shares 1 to 31 like that but why have a leak when we don't need to. If users want to do this by hand they can use dice rolls, which are cryptographically secure.

Regarding the algorithm for the share index selection, using python random's random.seed() and then random.sample() is probably not hand auditable due to the Mersenne Twister.

Hmm, true. And I see now the value of auditability, since in theory each share index could be leaking 5 bits (to an attacker who knows the order the shares were generated in, anyway).

But how to know how much data to produce from the hash so I don't run out? Is there a better way to deterministically "shuffle" a list by hand computations?

Knuth's "Algorithm P" can do this. (The question is about finding a uniform subset, but conveniently this algorithm finds a subset in a uniform order. If you want a sorted order, which is just as good from a security POV, you can use "Algorithm S" or just sort manually.) However, Algorithm P assumes you can uniformly sample from the range [i, 31] for each i, which is hard to do without a theoretically indefinite stream of randomness. Though if you use rejection sampling, the probabilities are at least much easier to compute.
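As a sketch, Algorithm P (the Fisher-Yates shuffle) driven by a uniform byte stream with rejection sampling might look like this (`stream` is assumed to be an iterator of uniformly random bytes, e.g. a ChaCha20 keystream; names are illustrative):

```python
def sample_uniform(stream, n):
    """Return a uniform integer in [0, n) by rejection-sampling one byte at a time."""
    limit = 256 - (256 % n)  # largest multiple of n that fits in one byte
    while True:
        b = next(stream)
        if b < limit:        # reject bytes above the limit to avoid modulo bias
            return b % n

def shuffle(items, stream):
    """Knuth's Algorithm P, consuming randomness from `stream`."""
    items = list(items)
    for i in range(len(items) - 1, 0, -1):
        j = sample_uniform(stream, i + 1)  # uniform in [0, i]
        items[i], items[j] = items[j], items[i]
    return items
```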

Let me think about this.

Does this look good as the RNG stream for setting the padding bits in the last symbol of the codex32 secret payload, the share indicies and k-1 share payloads?

The RNG initialization looks good to me, although

  • I guess identifier comes out of a different RNG?
  • I also don't know what encrypt() does; I assume it xors the input with the random stream.
  • It's a cool idea to set the nonce uniquely per seed; I'd have probably just left it at all-zeroes...see below comments.

Also, I wouldn't bother setting the padding bits randomly. You can just leave them as 0. They aren't used and do not feed into BIP32 derivations.

The nonce can be incremented by 1 whenever a 512-bit block is exhausted.

Yep, though this is a tiny bit subtle. If you were to produce 256 blocks then you'd be incrementing the identifier part of the nonce, and would produce the same output as you would've if you'd simply started with that identifier.

But since we're only producing 31 shares, each of which only takes half a block, this is fine.

Do you prefer zero padding the decoded master_seed bytes as shown above or directly utf-8 encoding the 32 bech32 characters after "s", ie key = bytes(codex32_secret[9:41], 'utf-8')?

Unsure which is easier by hand.

Directly using the bytes is definitely easier, and I think is also more elegant.

Sad to see double_sha256 go as the padding as 256-bit master_seed's were direct truncation of WIF serialization. But that novelty is a liability for auditing by hand.

Yeah, that is a cool property. Though TBH I don't think users frequently see WIF serialization of xprivs so it's something of a curiosity.

Likewise, it feels wrong to set the secret payload padding bits with a RNG stream that depends on the header, causing the payload for identical master_seed's to vary if the header changes.

Yeah, hmm. There's two ways you could think of this:

  • Changing the threshold or identifier is a "reset the world" change and we should put this stuff into the RNG. This prevents users from re-sharing the same secret in different contexts while producing the same shares, which is good because the different share sets would interact algebraically in possibly surprising ways.
  • Changing the threshold or identifier is a "relabelling" which we ought to support. I actually considered producing a worksheet for this because in the early days I found myself relabelling stuff a lot (to get encrypted shares with distinct but human-readable headers).

The threshold value should definitely go into the RNG. Under no circumstances should you "relabel" the threshold, unless you are doing something dangerous and way off-script, in which case the spec doesn't matter anyway. After thinking about it for a bit, I also think we should do this with the identifier. It creates a nice workflow where "if you want to re-split the secret, you have to change the identifier" and conversely "if the identifiers are the same (and they come from the same seed), they're compatible shares".

Re-sharing secrets is a safe thing to do as long as the initial shares are independently random, and there are lots of reasons that users might want to do it.

apoelstra avatar Jul 31 '23 15:07 apoelstra

I guess identifier comes out of a different RNG?

https://github.com/BlockstreamResearch/codex32/issues/54#issuecomment-1650573709

identifier is either selected by the user or defaults to the hash160(master_pubkey), making it possible to tell which share set recovers spending for a watch-only wallet, as it is prepended to child xpubs in descriptors. Of the various options, this one seems the most useful for these 4 characters.

I also don't know what encrypt() does; I assume it xors the input with the random stream.

That's exactly what it does; the stream is XOR'd with zeros in order to print the cipher stream directly.

It's a cool idea to set the nonce uniquely per seed; I'd have probably just left it at all-zeroes...see below comments.

It is an essential feature.

Having the nonce be a function of the header allows creating new backups for the same master_seed that don't reuse the original randomness. For example, if a share or two were compromised, the old set could be destroyed and replaced with a share set using a new header (new randomness) to restore security without sweeping funds. (Whether it's advisable to "rotate" shares, given an attacker may reach a threshold before you've completely destroyed the partially-compromised share set, is another question.)

If the nonce were fixed to bytes(8), you'd get the same independent shares for the same master_seed every time, even if you changed the threshold or identifier. This even helps with access control: if you want one group to be able to recover at threshold 3 and another group to recover at threshold 2, make two share sets, one with each threshold, and one set will NOT leak payload data of the other set.

BenWestgate avatar Jul 31 '23 16:07 BenWestgate

It is an essential feature.

This is a great argument. Agreed on all counts.

apoelstra avatar Jul 31 '23 16:07 apoelstra

Also, I wouldn't bother setting the padding bits randomly. You can just leave them as 0. They aren't used and do not feed into BIP32 derivations.

I dislike wasted space, especially in handwriting. But I agree padding should not be set randomly. Let's see how to make those bits do some work:

  1. These bits are as secret as the secret; it's a waste to set them with a non-linear secure hash function like double_sha256(seed), which has no efficient erasure-correction ability.
  2. These bits are as secret as the secret; it's a waste to set them to some public data like hash160(master_pubkey), as you can't check them until you can already compute the pubkey. This is just another form of 1 with extra wasted work.
  3. But they could be parity bits or a linear binary error-correcting code. These algorithms are pretty fast to do on the master_seed bytes by hand.

Do you have any suggestions?

A naive idea is upper-half and lower-half parity, which has the nice property of keeping the same 64-bit chunk length for 128/256-bit seeds. But it falls apart at 512-bit seeds, which have 3 bits of padding (170-bit chunks?). This is not the most efficient error coding. Proper ECC should correct contiguous bit erasures anywhere in the master_seed up to the length of the checksum, and is what should be used. This offers a little extra help in the absolute worst-case scenario, where you have threshold shares but some are invalid beyond correction by the codex32 BCH code yet are less than a symbol away from the right correction, potentially making brute-forcing your valid codex32 secret 4 to 16 times cheaper.
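Purely as an illustration of the naive half-parity idea (not a proposal), the two padding bits of a 128-bit seed could be computed like this:

```python
def half_parities(master_seed: bytes):
    """Return (upper-half parity bit, lower-half parity bit) of the seed bytes."""
    half = len(master_seed) // 2

    def parity(chunk: bytes) -> int:
        # XOR all bits of the chunk down to a single parity bit
        p = 0
        for byte in chunk:
            p ^= bin(byte).count("1") & 1
        return p

    return parity(master_seed[:half]), parity(master_seed[half:])
```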

What is the best ECC to put here? Will it wastefully repeat the same info as the codex32 secret checksum or can it supplement it?

Agreed, regarding re-sharing the same secret:

The BIP already says only shares with a matching identifier can be combined. Allowing "relabeling" is not very useful because it changes the checksum, even if share payloads do not change. And then you'd create a situation that the BIP says is not supported: two shares of different identifiers can be combined, but only if you ignore their checksum and identifier and just combine payloads. It's neat to just increment the LSB of the identifier to re-share an existing master_seed, then it still retains 3 symbols of the master_pubkey fingerprint, while allowing 32 re-shares. And setting a custom identifier to reshare should also be supported as long as it's unique from all previously used identifiers at that threshold.

On this topic of identifier uniqueness for re-sharing, it's not clear a user would know all previously used identifiers besides the one they have just combined with to do a reshare. Especially if they are not destroying old sets, they may combine with an older share set, and if we naively increment that, we will re-use the same nonce! So I think the default for "re-shares" needs to generate some independent randomness when users don't know all outstanding identifiers for this seed and threshold. Then we have a birthday problem and just pray for no collisions... The alternative is to not support useful cases:

Can you think of any reason to have 2 outstanding share sets of the same threshold but different identifiers? I did: prevent one set from combining with another set to defend against custodian collusion.

I might make a 2-of-3 with my dad and executor and a 2-of-3 with my mom and executor, each with unique identifier of same master_seed, and now recovering the master_seed requires me or my executor and 1 parent, but both my parents cannot combine shares to recover without me or my executor. Making them less attractive targets for a physical attack than if this was a 2-of-4 between my parents, myself and my executor. Useful.

How much randomness do we have to inject into the identifier if we don't know all the old identifiers to be pretty safe we don't reuse one? Can we still keep some bech32 encoding of the public key fingerprint, to identify the underlying wallet master_seed and connection between unique share sets of it or must inject all the entropy we can to be safe?

BenWestgate avatar Jul 31 '23 18:07 BenWestgate

I dislike wasted space, especially in handwriting. But I agree padding should not be set randomly. Let's see how to make those bits do some work:

This is an interesting direction. We can definitely stick a checksum in here and we can definitely make it "independent" of the existing checksum in the sense that it will meaningfully add error correction. But it will be a bit-level error correcting code which has some unintuitive properties when applied to data that's normally encoded as 5-bit characters.

In particular I think a 2-bit code will only be able to detect one error (I guess, two, if they're consecutive), so it would mean we could detect certain character errors but not others. This is not zero value but it's fairly limited. I think we should keep thinking whether there's a better use of 2 bits before committing to this.

Allowing "relabeling" is not very useful because it changes the checksum, even if share payloads do not change.

It does, but you can compute what the checksum will change to, far more quickly than recomputing the whole thing. You can even use a computer to do most of the work without exposing any secret data to the computer.

On this topic of identifier uniqueness for re-sharing, it's not clear a user would know all previously used identifiers besides the one they have just combined with to do a reshare. Especially if they are not destroying old sets, they may combine with an older share set, and if we naively increment that, we will re-use the same nonce! So I think the default for "re-shares" needs to generate some independent randomness when users don't know all outstanding identifiers for this seed and threshold. Then we have a birthday problem and just pray for no collisions... The alternative is to not support useful cases:

Agreed. Unfortunately I think we just need to live with the birthday problem. In practice I think it'd be okay, both because the chance of collisions is fairly low (you need close to 1024 re-shares before the probabilities get high, and like, what are you doing with your secrets then..) and because the failure mode is that you'll generate exactly the same share set as before. Which is not much worse than having the original set still out there.

Can you think of any reason to have 2 outstanding share sets of the same threshold but different identifiers? I did: prevent one set from combining with another set to defend against custodian collusion.

Yeah, this is sorta what I'm imagining. That, or having "backup shares" that might have the same/overlapping custodian sets, but nonetheless shouldn't be mixable. Unfortunately these are the exact cases where collisions would undermine your security model.

How much randomness do we have to inject into the identifier if we don't know all the old identifiers to be pretty safe we don't reuse one? Can we still keep some bech32 encoding of the public key fingerprint, to identify the underlying wallet master_seed and connection between unique share sets of it or must inject all the entropy we can to be safe?

The birthday paradox shows up at around sqrt(N). So if you used the entire 20-bit identifier you'd expect collisions around the time you had 2^10 (1024) outstanding identifiers. The curve is pretty steep so you have very low probabilities for quite a while, so it's fine to use 1000 as a rough estimate. If you used only half the identifier, 10 bits, you'd expect collisions after around 30 ... and have nontrivial probability of collisions after only a few.
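These estimates can be checked with the usual birthday approximation (a sketch; n = 2^20 for the full identifier):

```python
import math

def collision_probability(k: int, n: int = 2**20) -> float:
    """Approximate P(at least one collision) among k uniform random identifiers,
    using the standard 1 - e^(-k(k-1)/2n) birthday approximation."""
    return 1 - math.exp(-k * (k - 1) / (2 * n))
```

For example, at 1024 outstanding identifiers the collision probability is already around 40%, while at 10 it is negligible.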

My feeling is that you'd need to use the entire identifier for this.

apoelstra avatar Jul 31 '23 23:07 apoelstra

I think using the entire identifier is fine. This is a sub-case of the support custom identifier case, where we MUST randomize it when the user and software can't recall all previously generated share set identifiers for this master_seed.

Otherwise implementations can default to suggesting incrementing it, preserving most of the hash160(master_pubkey) fingerprint, or as always the user can pick anything unique: "faml" "moma" "dada" "lawr" "acct" "aunt" etc. Wallets should store the identifiers they have created to prevent the user from repetition, or at least warn in the event it is deliberate in order to repair or extend an existing share set.

Since this "I forgot" case requires new randomness that cannot be derived from master_seed the following procedure from auditable bitcoin wallets is needed so that the new random identifier can't leak seed data or deliberately repeat an identifier to break security:

identifier = convertbits(hashlib.scrypt(password=user_data, salt=app_entropy, n=2**20, r=8, p=1, maxmem=1025**3, dklen=3), 8, 5)[:4]
  • These parameters will need to be reduced to support HWWs and hand auditing (see below)

Where the user provides randomness and the machine generates 20-bits and unconditionally displays it and they are mixed with a one-way compression function.

Process of creating the new identifier should be functionally equivalent to the following steps:

  1. Wallet generates a pseudo-random number (20 bits or more) that we call app entropy.
  2. Wallet unconditionally displays app entropy to the user (in bech32, hex etc).
  3. User is then asked for some data to be entered that allows to improve security in case app entropy generator has a backdoor (see Dual EC DRBG). This could be a one-time text entered by the user only for this purpose, or a passphrase containing enough entropy that is used later for other purposes.
  4. Wallet uses computationally expensive KDF to deterministically mix app entropy and user-entered text to produce the final identifier.

Important considerations:

  • App entropy must be displayed unconditionally so that wallet cannot know if user is going to audit it or not.
  • User-entered data should be visible and reproducible by the user to be able to arrive to the same data in an auditing software. Good: text or pattern of button presses. Bad: accelerometer data, mouse movements (hard to record and reproduce).
  • User-entered data should contain enough entropy and should be unique. If the wallet has a backdoored RNG which does not generate unique numbers, we're relying on user-entered data to avoid nonce reuse and if the wallet is trying to leak the master_seed we're relying on user entropy for secure key mixing.
  • Unless necessary, wallet should not rely solely on user input to generate identifier because they may inadvertently give a non-unique input. In such case, honest wallet with correct RNG implementation would with strong probability generate a unique nonce.
  • Process of mixing app entropy and user input must use computationally expensive KDF to compensate for non-uniqueness OR low-entropy in user input.

On second thought, it may not need to be an expensive KDF, but it does need to be a one-way compression function: HMAC-SHA256 or our ChaCha20 hash, with the user input as key and app_entropy as nonce. I believe simple XOR won't provide protection from master_seed leakage when users enter less than 20 bits of entropy, which is likely when limited to 4 bech32 characters.

The worst case is that the attacker predicts the user's input: then they can recover 20 bits of leaked master_seed. My KDF above costs $0.0002 to check all possible 20 bits with known user data. So there's no way to secure this other than letting the user type more than 4 characters, to maximize their chances of a unique string with 20 bits of entropy.
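With the standard library, the HMAC-SHA256 variant mentioned above is nearly a one-liner (a sketch; the function name and truncation are illustrative):

```python
import hashlib
import hmac

def mix_identifier_entropy(user_data: bytes, app_entropy: bytes) -> bytes:
    """One-way mix of user input (as key) and app entropy (as message),
    truncated to 3 bytes; the first 20 bits feed the 4 bech32 characters."""
    return hmac.digest(user_data, app_entropy, hashlib.sha256)[:3]
```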

from Cryptodome.Cipher import ChaCha20
from Cryptodome.Random import get_random_bytes
from segwit_addr import convertbits

# user data is limited to 32 bytes
padding_length = 32 - len(user_data)
key = user_data + bytes(padding_length)

nonce = get_random_bytes(8)
cipher = ChaCha20.new(key=key, nonce=nonce)

identifier = convertbits(cipher.encrypt(bytes(3)),8,5)[:4]

For auditing by hand, providing 8 de-biased dice rolls to select the custom identifier avoids doing a ChaCha20 hash just for this.
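One way to read "8 de-biased dice rolls" (my interpretation, not a spec) is two d6 rolls per bech32 character with rejection: a pair of rolls encodes a value in [0, 36), and values 32-35 are rejected and re-rolled to avoid bias.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

def dice_pair_to_char(a: int, b: int) -> str:
    """Map two d6 rolls (1-6) to one bech32 character; raise to signal a re-roll."""
    v = (a - 1) * 6 + (b - 1)          # value in [0, 36)
    if v >= 32:
        raise ValueError("re-roll: value out of range")  # de-biasing rejection
    return CHARSET[v]

def dice_to_identifier(rolls):
    """Consume rolls two at a time to build the 4-character identifier."""
    it = iter(rolls)
    return "".join(dice_pair_to_char(a, b) for a, b in zip(it, it))
```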

BenWestgate avatar Aug 01 '23 08:08 BenWestgate

Heh, your comment inspired me to actually read Dual-EC-DRBG. It's quite elegant, aside from the obvious backdoor, and also obscenely slow. My read of it is that the backdoor allows an attacker who has seen 30 (aligned) bytes of output to learn the internal state of the RNG, with a little bit (16 bits) of grinding. Once they know the internal state they can predict every future random number. However, they cannot learn previous states of the RNG, and in particular there is no way for them to learn the seed, assuming the discrete log problem is hard.

So for this particular usecase, dual-ec-drbg would actually be perfectly fine :P.

But regarding scrypt parameters, let's back up a bit. I spoke on the phone with @roconnor-blockstream and we briefly touched on this. One observation he made is that any considerations applied to the identifier also apply to the shares (we should assume our attacker has at least one share; arguably we should assume they have k-1 shares). So I think we should do the same thing in both cases, and in this case the 20-bit ID is the least of our problems next to the key-length-many-bits shares.

Our attack scenario is: an attacker has access to the 20-bit ID, a share, multiple shares, whatever. And they are trying to access the actual master secret. The scenarios are:

  • If the data was generated maliciously then the seed could just be leaked outright.
  • If the data was generated correctly but with a fast RNG, then an attacker can grind out the secret far faster (by a large constant factor) by re-generating the data than by deriving addresses.
  • If the data is generated by a slow RNG/KDF, then an attacker with the data has no advantage over one without.

Having auditability forces us into the middle choice. If we don't worry about auditability then we'll be in either the "all good, no attacker advantage to having share data" case or the "all bad, the seed was totally leaked" case. And we can bias toward "all good" with the usual techniques. Open source software, independent audits, etc.

Given this, my feeling is that if somebody cares enough to audit their random data by hand, they care enough to just generate all their data by hand (which would actually be much faster than auditing, no matter what RNG we chose). And if they don't care enough, they'd be better off if we used scrypt everywhere rather than chacha everywhere.

apoelstra avatar Aug 01 '23 18:08 apoelstra

Auditing a wallet always requires a computer to verify the public keys, so hand-auditable share sets are not very important; as you say, the paranoid could generate the share set by hand faster than auditing it.

That said, it makes sense for a given codex32 secret to always produce the same shares and share index order on electronics, but it is fine to require electronics to compute the shares with an expensive KDF.

I initially used scrypt because it is built into hashlib, but the current state of the art for memory-hard KDFs is Argon2id. Its main advantage is being able to set the time cost and memory cost independently, whereas with scrypt the time cost depends on the memory cost.

BenWestgate avatar Aug 02 '23 13:08 BenWestgate

Agreed. I don't have much of an opinion between scrypt and argon2id. I'd be tempted to use scrypt because it's more widely supported and has been tested pretty heavily by the various cryptocurrencies using it as a PoW. I'm also not sure there's a ton of value, for us, in being able to set time and memory separately. (And we could emulate that somewhat by just iterating scrypt.)

apoelstra avatar Aug 02 '23 13:08 apoelstra

  • If the data was generated correctly but with a fast RNG, then an attacker can grind out the secret far faster (by a large constant factor) by re-generating the [share] data than by deriving addresses.

  • If the data is generated by a slow RNG/KDF, then an attacker with the data has no advantage over one without.

This means the KDF hardness parameters only need to cost more than deriving an address? The cheapest "address" is the master pubkey hash we suggest using as the default identifier.

Cost to beat is: HMAC-SHA512, produce a pubkey, then ripemd160(sha256(master_pubkey))

That has a 1 in 2^20 chance of a false positive (or the identifier could be faked). In that case, a known address must be derived.

Setting time cost and memory cost an order of magnitude (or more) higher than an address derivation should be safe and able to run on HWWs

BenWestgate avatar Aug 02 '23 20:08 BenWestgate

This means the KDF hardness parameters only need to cost more than deriving an address? The cheapest "address" is the master pubkey hash we suggest to use as default identifier.

Yeah, that's my thinking.

Setting time cost and memory cost an order of magnitude (or more) higher than an address derivation should be safe and able to run on HWWs

+1

apoelstra avatar Aug 02 '23 21:08 apoelstra

I did some research with the chat robot and it claims HWWs use "a few kB" of memory for address derivation from a BIP32 seed, and that the most popular HWWs could spare as much as 64-512kB for KDF memory cost parameters.

This makes memory hard algorithms on HWWs useless against GPUs: 8GB / 0.5MB > # GPU cores, but may still help somewhat against ASICs.

PBKDF2 in BIP39 is especially weak against GPUs (needs 8+ random word passphrases to be secure).

Given this, creating a secure auditable master_seed may call for different parameters on mobile and desktop wallets (1+ GB) than on hardware wallets (512kB). Here's an example from my implementation that costs 1GB and a few seconds:


def fresh_master_seed(seed_length, user_entropy, bitcoin_core_entropy):
    """Derive a fresh master seed of seed_length bytes from user entropy and Bitcoin Core's entropy, aka app entropy."""
    if seed_length < 16 or seed_length > 64:
        return None
    master_seed = hashlib.scrypt(password=bytes(user_entropy + str(seed_length), "utf-8"), salt=bytes(bitcoin_core_entropy, "utf-8"), n=2**20, r=8, p=1, maxmem=1025**3, dklen=seed_length)
    return master_seed

A time_cost of 5-10 seconds may be recommended to reduce the disadvantage of weak hardware. Unlike authentication, where the user must wait time_cost on every login (so it is limited to 1-2 seconds or less), this is a one-time wait at master_seed generation.

As long as the parameters were something unconditionally displayed like App Entropy anyone can verify they get the same codex32 secret by computing the KDF on a trusted device.
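To make that audit concrete, here is a hedged sketch of recomputing the KDF on a trusted device and comparing; `audit_master_seed` is a hypothetical helper, and the `n`/`maxmem` defaults are scaled down from the 1GB example above purely so it runs quickly:

```python
import hashlib

def audit_master_seed(user_entropy, app_entropy, seed_length, expected_seed,
                      n=2**14, maxmem=32 * 1024**2):
    # Recompute the seed from the unconditionally displayed inputs and compare.
    # A real audit would use the exact scrypt parameters the generating
    # wallet displayed, not these scaled-down illustrative values.
    recomputed = hashlib.scrypt(password=bytes(user_entropy + str(seed_length), "utf-8"),
                                salt=bytes(app_entropy, "utf-8"),
                                n=n, r=8, p=1, maxmem=maxmem, dklen=seed_length)
    return recomputed == expected_seed
```

Any mismatch means the generating device deviated from the claimed determinism.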

For share set generation, which only needs to cost more than an address derivation (which I'm told takes "a few seconds or less" on a HWW), targeting something that takes 10 seconds on the worst hardware we want to support and uses 64kB or more should be 2-10x as time costly and 10-100x as memory costly as address derivation.

Here all wallets should use the same parameters so codex32 secrets always derive identical codex32 share sets, much like how 2048 rounds is part of BIP39.
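For reference, BIP39's fixed KDF looks like this in Python; this is the standard PBKDF2 construction from the BIP, shown only as an analogy for baking fixed parameters into a spec:

```python
import hashlib

def bip39_seed(mnemonic: str, passphrase: str = "") -> bytes:
    # BIP39 fixes 2048 rounds of PBKDF2-HMAC-SHA512 for every wallet,
    # so the same mnemonic always derives the same 64-byte seed.
    return hashlib.pbkdf2_hmac('sha512',
                               mnemonic.encode('utf-8'),
                               b'mnemonic' + passphrase.encode('utf-8'),
                               2048, dklen=64)
```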

BenWestgate avatar Aug 02 '23 22:08 BenWestgate

Yeah, "a few KB" for address generation sounds about right. Memory-wise it's extremely cheap and therefore extremely parallelizable. There's also a time-memory tradeoff. I think I could probably do it with <100 bytes of RAM but it wouldn't be very fast.

Given this, creating a secure auditable master_seed may call for different parameters on mobile and desktop wallets (1+ GB) than on hardware wallets (512kB). Here's an example from my implementation that costs 1GB and a few seconds:

Is the assumption here that the user entropy or Core entropy might be weak? I think if together they exceed 128 bits of entropy then there isn't any need for hardening at all. Or am I confused?

For share set generation, which only needs to cost more than an address derivation (which I'm told takes "a few seconds or less" on a HWW), targeting something that takes 10 seconds on the worst hardware we want to support and uses 64kB or more should be 2-10x as time costly and 10-100x as memory costly as address derivation.

Yep, sounds good to me.

apoelstra avatar Aug 02 '23 23:08 apoelstra

Is the assumption here that the user entropy or Core entropy might be weak? I think if together they exceed 128 bits of entropy then there isn't any need for hardening at all. Or am I confused?

It's to protect against RNG backdoors in "secure element" HWWs.

Neither the device nor the user should be trusted, so the hardening makes the most of whatever entropy the user is willing to provide, increasing the cost tremendously (versus plain SHA2) for a backdoor author trying to steal the seed.

This is the "lite"-effort version of the dice debiasing worksheet. The KDF offers real protection even without achieving 128 bits of user entropy; even half that makes the attack act-of-Congress expensive:

Attack cost estimates of memory hard KDFs by entropy

65 bits would be on the order of 10 billion USD with 1GB and 1 second argon2id parameters.

50 bits would be around $1 million. It's less profitable to backdoor the RNG if the user can mash their keyboard for a minute or toss a die 10 times and verifiably make this attack cost more than they'll ever store.
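The scaling behind those figures is just exponential in entropy bits. A toy sketch, where the default per-guess cost is an assumed round number for a 1GB/1s memory-hard KDF, not a benchmark:

```python
def attack_cost_usd(entropy_bits, cost_per_guess_usd=1e-9):
    # Each additional bit of user entropy doubles the expected attack cost;
    # cost_per_guess_usd is a hypothetical figure, not a measured one.
    return 2**entropy_bits * cost_per_guess_usd
```

At that assumed per-guess cost, 50 bits lands near $1 million and 65 bits in the tens of billions, matching the orders of magnitude above.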

BenWestgate avatar Aug 03 '23 00:08 BenWestgate

In particular I think a 2-bit code will only be able to detect one error (I guess, two, if they're consecutive), so it would mean we could detect certain character errors but not others. This is not zero value but it's fairly limited. I think we should keep thinking whether there's a better use of 2 bits before committing to this.

From BIP-173:

...ordering is chosen to minimize the number of pairs of similar characters that differ in more than 1 bit.

The most commonly substituted characters are all 1-bit errors.

For 256-bit seeds with 4-bit ECC as padding, the total error correction becomes 4 substitutions, plus 1 more substitution if it is a 1-bit error. For 128-bit seeds with 2-bit ECC as padding, there are no extra corrections, but it can narrow a 1-bit substitution error down to the upper or lower half of the code, maybe better. 512-bit seeds also gain a 1-bit substitution error correction boost.

If the same 2 or 4 bit code is added to each independent share then do all the derived shares get valid ecc padding as well?

BenWestgate avatar Aug 03 '23 04:08 BenWestgate

It's to protect against RNG backdoors in "secure element" HWWs.

OK, yeah, this sounds good to me. Though of course, verifying this is done correctly will require redundant hardware. But better to have the ability than to not.

If the same 2 or 4 bit code is added to each independent share then do all the derived shares get valid ecc padding as well?

I am 90% sure, yes

For 256-bit seeds w/ 4-bit ECC as padding, the total error correction becomes 4 substitutions plus 1 substitution if it is a 1-bit error.

Similarly, I am 90% sure this is right ... the two codes don't directly combine so that you can run a single combined error correction algorithm (as far as I'm aware). But I do think you can do a form of soft error correction where you find all the possible strings that are 5 errors away from the target string. There will be several, but I think only one, with pretty good probability, will have the extra parity bits set correctly.

And even if there's more than one, it's easy to manually check them all. In fact, in general you should be able to do this -- "correct" more than four errors by using error correction to reduce the search space by a factor 2^20. Then having a couple extra bits will give you a couple extra bits' worth of reduction.

apoelstra avatar Aug 03 '23 15:08 apoelstra

That was originally why I wanted a default identifier derived from master_seed: the 2^20 search reduction. But now that it's hash160(master_pubkey), it's only a few HMAC-SHA512s faster than using a known wpkh address to eliminate wrong corrections.

I'll test how many candidate master_seed → hash160(master_pubkey) checks can be run per second and update this post.

It's pretty slow...

BenWestgate avatar Aug 03 '23 19:08 BenWestgate

from electrum import bip32
for candidate_seed in range(2**20):
    bip32.BIP32Node.from_rootseed(candidate_seed.to_bytes(16, 'big'),xtype="standard").calc_fingerprint_of_this_node()

This takes 100 seconds on i5-1135G7 laptop and calculates fingerprints for 2^20 candidate seeds.

For comparison, checking the first wpkh() address of 2**16 candidate seeds:

from electrum import bip32
from electrum import ecc
from electrum.crypto import hash_160

for candidate_seed in range(2**16):
    priv_masterkey = bip32.BIP32Node.from_rootseed(candidate_seed.to_bytes(16,'big'),xtype="standard")
    hash_160(priv_masterkey.subkey_at_private_derivation([2147483732, 2147483648, 2147483648]).subkey_at_public_derivation([0,0])[1].get_public_key_bytes(compressed=True))

Took 46 seconds, so 736 seconds to check 2**20 seeds. The master_pubkey fingerprint identifier is a useful 7.4x speed up vs known wpkh addresses.

BenWestgate avatar Aug 03 '23 20:08 BenWestgate

Back to the desired hardening of share payload derivation:

import hashlib
for candidate_seed in range(2**16):
    hashlib.scrypt(password=candidate_seed.to_bytes(16,'big'), salt=bytes('2test','utf'), n=2**6, r=8, p=1, maxmem=68*1024, dklen=16)

This generates the 1st share's payload for 2^16 candidate master_seed's with codex32 secret threshold = '2' and identifier = 'test' in 7 seconds.

This is 2.2x slower than checking the fingerprint (3.1 seconds) but still 7x faster than checking the first wpkh() address.

  • If the data is generated by a slow RNG/KDF, then an attacker with the [share] data has no advantage over one without.

The maxmem cannot be raised any higher, as most HWWs have 128kiB of RAM and there is no reason not to support them. So, following your advice, let's slow it down by doing 16 rounds. If you believe share payloads must be slower to derive than an address, then we can use your idea:

(And we could emulate that somewhat by just iterating scrypt.)

import hashlib

for candidate_seed in range(2**14):
    intermediate_key = hashlib.scrypt(password=candidate_seed.to_bytes(16, 'big'), salt=bytes('2test', 'utf-8'), n=2**6, r=8, p=1, maxmem=68*1024, dklen=16)
    for _ in range(15):
        intermediate_key = hashlib.scrypt(password=candidate_seed.to_bytes(16, 'big'), salt=intermediate_key, n=2**6, r=8, p=1, maxmem=68*1024, dklen=16)

This generates the 1st share's payload for 2^14 candidate master_seed's with codex32 secret threshold = '2' and identifier = 'test' in 43 seconds, or ~172 seconds for 2^16 which is much slower than 46 seconds to derive 2^16 wpkh() address payloads.

I have no idea why iterating it 16 times caused a 25x+ slowdown, computers are magical.

Any problems here?

I'm still wondering why the shares need to be hardened, since the search space is the 128-bit master_seed. Are we concerned that it may some day not be long enough to be secure against brute force without the slowdown of address derivation?

Our attack scenario is: an attacker has access to the 20-bit ID, a share, multiple shares, whatever. And they are trying to access the actual master secret. The scenarios are:

If the data was generated correctly but with a fast RNG, then an attacker can grind out the secret far faster (by a large constant factor) by re-generating the data than by deriving addresses.

I thought it was not even possible to increment a 128-bit counter through all values without 10 nuclear plants running at full capacity for a year (or something like that).

So let's confirm with a 4th person that the threat model is valid before we roast these poor Trezors. If the share payloads were fast to derive with ChaCha20, they'd become an almost unlimited source of error detection, with each independent share acting as a cryptographic hash checksum the length of the secret. Is that more useful than making it faster for guessers to check a number that cannot be counted to?

These scrypt parameters with more iterations will work for hardening the master_seed against RNG backdoors with user-supplied entropy on 128kiB-RAM HWWs, as well as for securely deriving a unique identifier when re-sharing a master_seed and not all previous identifiers are known.

BenWestgate avatar Aug 03 '23 23:08 BenWestgate

I'm still wondering why the shares need to be hardened, since the search space is the 128-bit master_seed. Are we legitimately concerned that it may some day not be long enough to be secure against brute force without the slowdown of address derivation?

If we really have 128 bits of good entropy then I agree that hardening has no value. But maybe the seed isn't really that strong, or some of it is leaked, or whatever ... then my thinking is that we ought to try to "not make the situation worse" than bare BIP32, and we can do that by hardening all our randomness roughly as much as an EC mult.

Now, I guess you're suggesting that we make the 128 bits itself hard, even against weak entropy, by doing a serious amount of hardening on the user's input. This will protect us from bad input but not from key leakage.

So maybe the "should we bother hardening the shares/IDs" question comes down to "should we attempt to ameliorate the damage caused by leaked seed data"? My feeling is that the hardening doesn't add much cost so we might as well.

This is contradicting this:

Yeah, sorry, my thinking has changed a bit over the course of this thread.

apoelstra avatar Aug 03 '23 23:08 apoelstra

So maybe the "should we bother hardening the shares/IDs" question comes down to "should we attempt to ameliorate the damage caused by leaked seed data"? My feeling is that the hardening doesn't add much cost so we might as well.

Makes sense. This is like medicine "First, do no harm."

So then do we need to harden the master_pubkey fingerprint default identifier? Otherwise it's a 7.4x speed up with a 1 in a million chance of being a false positive when grinding out a partially leaked master_seed?

Or is that not important because the attacker could get the fingerprint from the watch-only wallet?

BenWestgate avatar Aug 03 '23 23:08 BenWestgate

If we are actually using the BIP32 fingerprint then we already have enough hardening, since the bip32 fingerprint requires an EC mult to calculate it from the seed :).

If we are using some alternate notion of "fingerprint" then yes, we should make sure that it's at least as hard as an ecmult.

apoelstra avatar Aug 04 '23 00:08 apoelstra

If we are actually using the BIP32 fingerprint then we already have enough hardening, since the bip32 fingerprint requires an EC mult to calculate it from the seed :).

Yes, that is the fingerprint we have in mind.

True about the EC mult, but the fingerprint is hash_160(master_pubkey) while a wpkh() address is something like hash_160 of [BIP32_fingerprint/84'/0'/0']xpub.../0/0, so there are 5 additional HMAC-SHA512 child derivations to slow it down, assuming the BIP84 standard derivation path is used. That slowdown is 7-fold, as I calculated earlier, unless the slowdown from HMAC-SHA512 can be ignored due to ASICs or something. We also don't require implementations to use a particular derivation path; it may only be 1 HMAC-SHA512 if they use m/*.

A cheeky fix would be HMAC-SHA512-ing the master_pubkey fingerprint 5 times to set the identifier, making it identical work to checking a known address. But this unfortunately breaks the ability to check that the fingerprint matches the share identifier by converting from hex to bech32 in your head.

My feeling is that since the descriptors spec considered it safe to include 4 bytes of hash_160(master_pubkey) even in encrypted wallets, it should be safe to include 20 bits in our shares, which are much less likely to be stolen than an online descriptor. An attacker also has to consider that the identifier may just be random noise, since it's only a default.

BenWestgate avatar Aug 04 '23 00:08 BenWestgate

My feeling is that since the descriptors spec considered it safe to include 4 bytes of hash_160(master_pubkey) even in encrypted wallets, it should be safe to include 20 bits in our shares, which are much less likely to be stolen than an online descriptor.

Agreed on all counts.

I would further say that, while BIP49-style keys (which I guess is what 99.9% of all keys are..) do 7 derivations, in principle, people could be using their master keys to generate addresses. Or m/* or something.

A single constant-time EC mult is something like 100x the cost of an HMAC (500ns vs 50us). I think we should try to match the 100x factor, but not bother with the extra 7x factor since we're getting into increasingly tenuous justifications for it..

apoelstra avatar Aug 04 '23 02:08 apoelstra

The electrum bip32 library I used must be unrepresentatively slow at the HMACs; it should have been a few percent slower, not 7x. The chat robot agrees that 80%+ of the time is spent going from private to public.

One scrypt round with n=2**6, r=8, p=1 would be twice as slow on my hardware, but I don't trust the numbers, in part because this library gave bad data.

But in general, there's no particular number of scrypt iterations we can say for sure will be more expensive than the EC mult on all hardware and software that may ever be available to attackers, without grievous overestimation. But what if we did the exact same work, plus some?

This might sound dumb, but why not use public key hash for the share payloads?

Consider this is the time to beat,

import hashlib
import hmac

from ecdsa import SECP256k1, SigningKey  # any secp256k1 library works; ecdsa is illustrative

def compute_bip32_fingerprint(master_seed):
    # Step 1: HMAC-SHA512 of the master seed, keyed with b"Bitcoin seed" per
    # BIP32; the left 32 bytes are the master private key (m).
    m = hmac.digest(b"Bitcoin seed", master_seed, 'sha512')[:32]

    # Step 2: Compute the compressed public masterkey (M) via an EC mult.
    M = SigningKey.from_string(m, curve=SECP256k1).get_verifying_key().to_string('compressed')

    # Step 3: Take the first 4 bytes of hash160 of the serialized public masterkey.
    fingerprint = hashlib.new('ripemd160', hashlib.sha256(M).digest()).digest()[:4]

    return fingerprint

then,

import hashlib
import hmac

from ecdsa import SECP256k1, SigningKey  # any secp256k1 library works; ecdsa is illustrative

def compute_share_payload(master_seed, threshold, identifier, share_index):
    # Step 1: Derive a share-specific private key from the master seed using
    # HMAC-SHA512 keyed with the share header; take the left 32 bytes.
    m = hmac.digest(bytes(threshold + identifier + share_index, 'utf-8'), master_seed, 'sha512')[:32]

    # Step 2: Compute the corresponding compressed public key (M) via an EC mult.
    M = SigningKey.from_string(m, curve=SECP256k1).get_verifying_key().to_string('compressed')

    # Step 3: Take the first 16 bytes of hash160 of the serialized public key.
    share_payload = hashlib.new('ripemd160', hashlib.sha256(M).digest()).digest()[:16]

    return share_payload

This is guaranteed not to be cheaper, and it's also not any slower than it needs to be. It's also very easy for wallet developers to implement; they nearly already have (perhaps too close, and they'll forget to remove b"Bitcoin seed"...). But since it's a 128-bit match rather than 20 bits or 4 bytes, it seems there should be one more step inserted somewhere to slow it down. Open to suggestions, if you agree.

It also needs to support 16-64 byte outputs, so something like HMAC-SHA512, scrypt, or PBKDF2 could work to stretch and slow it. The master_seed should be mixed back in to preserve the original entropy. The target cost to exceed is an address with 5 extra derivations.
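As one hedged possibility for that stretching step (the function name and layout here are hypothetical, not a settled design):

```python
import hmac

def stretch_share_payload(master_seed, hash160_payload, dklen=16):
    # Mix master_seed back in (as the HMAC key) so the original entropy is
    # preserved, and support the 16-64 byte output range discussed above.
    if not 16 <= dklen <= 64:
        raise ValueError("dklen must be between 16 and 64 bytes")
    return hmac.digest(master_seed, hash160_payload, 'sha512')[:dklen]
```

A slower KDF such as scrypt could be substituted for the HMAC if more hardening is wanted.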

BenWestgate avatar Aug 04 '23 06:08 BenWestgate

This might sound dumb, but why not use public key hash for the share payloads?

I think this is too much of a burden on implementors. Though I guess if they support BIP32 they need to support this anyway. But my feeling is, demanding scrypt is fine because it's a standard function in most any general-purpose crypto library. But support for constant-time multiplication on a specific elliptic curve is a bit more of a demand, especially since you do want it to be efficient.

EC mult is also way easier for implementors to screw up and introduce sidechannels (though again, the same risk exists for BIP32 derivations, so maybe this isn't an additional problem).

But in general, there's no particular scrypt iterations we can say for sure will be more expensive than the EC mult on all hardware and software that may ever exist to attackers.

~Sure, but I don't think we really need to. Let's just assume a 100x differential.~

I apologize, I think I've been barking up the wrong tree here. On my system an HMAC takes 1.35us and a single ecmult gen takes 25.3us, so there's a ~20x differential, which is already less than I expected. But this is not the right comparison. If an attacker is grinding through different EC mults then their marginal operation will be an EC add (0.35us) or double (0.15us), both of which are an order of magnitude faster than an HMAC. (You might think that, since the attacker needs to hash-then-ec-mult, all the ecmults will be with respect to different random values, so he has to do separate ecmults for each. Not so. Algorithms like Pippenger's will let him reuse computations across all of them. Or maybe you assume the attacker has to do 7 ecmults in series; same story, but with a 7x slowdown.) So against a naive software-only attacker, just using a SHA256-HMAC already provides more protection than the EC mult.

Now, you might assume that the attacker has a sha2 ASIC (or an scrypt ASIC, or whatever) and can do these HMACs a million times faster, but does not have an EC ASIC. Or you might assume the opposite. And I agree that "just do both operations", which is what "copy the BIP32 derivation logic" gets us, will protect us no matter what.

But I think we're way overthinking this. My new vote is that we should just use SHA256-HMAC for these derivations, on the basis that it's easy to implement, guaranteed to be available in any crypto library (even one with no awareness of Bitcoin), and will be constant-time even if the implementor wasn't thinking about this. We could use SHA512-HMAC if you want to imitate BIP32, but I don't think it matters. You could iterate it 100x if you really want to cover the case where public addresses only occur after several derivation steps, but I don't think it matters.
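A minimal sketch of that proposal; the header layout threshold+identifier+share_index is assumed from earlier in this thread, not specified anywhere:

```python
import hmac

def derive_share_payload(master_seed, threshold, identifier, share_index, dklen=16):
    # HMAC-SHA256 keyed by the share header: available in any crypto
    # library and constant-time by construction. For seeds longer than
    # 32 bytes, the SHA512 variant would be needed.
    header = (threshold + identifier + share_index).encode('utf-8')
    return hmac.digest(header, master_seed, 'sha256')[:dklen]
```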

apoelstra avatar Aug 04 '23 13:08 apoelstra