
BIPXXX: Taproot Annex Format

Open ariard opened this issue 2 years ago • 9 comments

This is WIP, see ML post for context.

ariard avatar Oct 10 '22 05:10 ariard

I think https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2022-October/020991.html is the mailing list post in question.

ajtowns avatar Jan 04 '23 05:01 ajtowns

Yes, the current state of the TLV format discussion is here: https://github.com/ariard/bips/pull/1 and the implementation is here: https://github.com/bitcoin-inquisition/bitcoin/pull/9

ariard avatar Jan 05 '23 02:01 ariard

Updated at 6f3dcc2 with the suggestions from https://github.com/ariard/bips/pull/1.

I think I still have two issues with the current approach:

  • in case of script path spend, we might have a redeem path with <pubkey_alice> OP_CHECKSIG <pubkey_bob> OP_CHECKSIG, where Alice is committing to annex X and Bob is committing to annex Y as spending policies. The current approach of {type,length} delta encoding might prevent combining them. I don't know if it's a use case we care about, and whether we should have some clawback mechanism to aggregate annexes from signers sharing the same tapscript.
  • re-using the delta for both type and length might be impractical, as accumulating the delta for the length might have no relation at all to the size of the data item.

ariard avatar Jan 17 '23 00:01 ariard

* in case of script path spend, we might have a redeem path with `<pubkey_alice> OP_CHECKSIG <pubkey_bob> OP_CHECKSIG`, where Alice is committing to annex X and Bob is committing to annex Y as spending policies. The current approach of {type,length} delta encoding might prevent combining them. I don't know if it's a use case we care about, and whether we should have some clawback mechanism to aggregate annexes from signers sharing the same tapscript.

CHECKSIG can only commit to an input's annex as a whole; so in the case above either X=Y=the entire annex, or one or both of the signatures are invalid/incompatible. You'd need something like:

ANNEXLENGTH 3 EQUALVERIFY 0 PUSHANNEX 2 EQUALVERIFY TOALT PUSHANNEX 1 EQUALVERIFY EXTRASIGDATA CHECKSIGVERIFY RESETSIGDATA FROMALT PUSHANNEX 1 EQUALVERIFY EXTRASIGDATA CHECKSIG

spent by having the following annex (eg): {0: [15, 1], 1: [alicepolicy], 15: [bobpolicy]}, where the novel opcodes behave as:

  • annexlength tells you the number of distinct tags in the annex
  • pushannex pops n off the stack, looks up annex entry n, and pushes each value from the annex with tag n onto the stack, followed by the count (possibly 0)
  • extrasigdata pops an element off the stack, hashes it, and will commit to the (cumulative) hash in subsequent checksig operations
  • resetsigdata resets that cumulative hash

So the "annexlength" check is used to prevent malleability, then the first "0 pushannex" will put [1 15 2] on the stack (2 at the top); the second pushannex will update the stack to [alicepolicy 1], extrasigdata will ensure alice's signature commits to "alicepolicy", the final pushannex will update the stack to [bobpolicy 1], etc. You'd also need a SIGHASH_NOANNEX or similar, of course.

Alice and Bob would still need to agree on the script that defines which subset of the annex they'll each commit to; currently that obviously has to be at the time they define their shared pubkey, but even with OP_EVAL or graftroot, while they could delay that agreement, they'd still need to do it sometime. You'd need a much more generic language to allow them to each choose which parts of the annex to sign at signing time.

* re-using the delta for both type and length might be impractical, as accumulating the delta for the length might have no relation at all to the size of the data item.

They don't need to have any relation? If the previous element had type X, size N1, and this element has type X+K and size N2, you just encode (K, N2) (see the sketch after this list), as:

  • if N2 < 127: K*128 + N2
  • if N2 >= 127: (K*128 + 127), (N2-127)
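
For concreteness, here is a minimal sketch of that delta scheme in Python. The function names are mine, and how the resulting integers get serialized onto the wire (single bytes vs. varints) is deliberately left out, since that belongs to the surrounding format:

```python
def encode_delta(prev_type, cur_type, length):
    # Sketch of the {type,length} delta encoding described above;
    # assumes cur_type >= prev_type.
    k = cur_type - prev_type
    if length < 127:
        return [k * 128 + length]
    return [k * 128 + 127, length - 127]

def decode_delta(values, prev_type):
    # Returns (type, length, number of integers consumed).
    k, low = divmod(values[0], 128)
    if low < 127:
        return prev_type + k, low, 1
    return prev_type + k, 127 + values[1], 2
```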

ajtowns avatar Jan 17 '23 08:01 ajtowns

I thought the taproot upgrade introduced the annex field to allow for future protocol expansions without requiring further soft forks?

There is an interesting open design question: could we have restrained soft-fork semantics, introduced by an economic or proof-of-work mechanism, or with expiring enforcement? Such an idea was presented on the mailing list a while back: "Automatically reverting ("transitory") soft forks".

Can we design the taproot annex as a safe sandbox under some consensus boundaries?

ariard avatar Jun 01 '23 00:06 ariard

We could consider using a prefix varint, where the number of leading 1s in the initial and subsequent bytes, until a 0 is reached, determines how many additional bytes follow. The only advantage is performance, since you don't have a potential branch on every byte, and you can load data bytes directly. I don't know if that's enough of an advantage to use a less-common varint encoding, but it's worth considering. Here's a good Hacker News post about the encoding.
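
To make the idea concrete, here is a rough sketch (my own simplification, not a proposed encoding): the count of leading 1 bits in the header byte tells the decoder how many payload bytes follow, so the length is known after reading one byte. A real prefix varint would also pack value bits into the header's spare bits rather than wasting them:

```python
def encode_prefix_varint(n: int) -> bytes:
    # Simplified sketch: the header byte carries only the length (as a run
    # of leading 1 bits) and the value follows little-endian.
    payload_len = max(1, (n.bit_length() + 7) // 8)
    if payload_len > 8:
        raise ValueError("value too large for this sketch")
    header = (0xFF << (8 - payload_len)) & 0xFF  # payload_len leading ones
    return bytes([header]) + n.to_bytes(payload_len, "little")

def decode_prefix_varint(buf: bytes) -> tuple[int, int]:
    # Count leading ones in the header, then load the payload directly;
    # no per-byte branching as with LEB128-style varints.
    ones = 8 - (buf[0] ^ 0xFF).bit_length()
    return int.from_bytes(buf[1:1 + ones], "little"), 1 + ones
```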

casey avatar Sep 25 '23 19:09 casey

Yes, I browsed the Hacker News post, where a 128-bit prefix varint is argued to be dramatically faster to encode and decode. It's unclear to me whether performance-over-space or space-over-performance should be favored (a classic time-space trade-off), and what average annex payload can be expected. Maybe the performance gain is so cheap that it doesn't matter to optimize to protect full-node CPU cycles, and we should instead favor a cheap witness cost for annex users.

Note also the annex policy-only discussion, where non-interactive annex composition among a set of multi-party users is being weighed.

ariard avatar Sep 30 '23 01:09 ariard

I notice that there's no maximum varint size mentioned. Would it be a good idea to restrict varints to being no greater than one of {u32, u64, u128}::MAX (which one depending on how large varints are expected to be)? This would simplify code that has to pass around varints, since it could use a fixed-size value instead of having to use big ints.

casey avatar Sep 30 '23 02:09 casey

I notice that there's no maximum varint size mentioned. Would it be a good idea to restrict varints to being no greater than one of {u32, u64, u128}::MAX (which one depending on how large varints are expected to be)? This would simplify code that has to pass around varints, since it could use a fixed-size value instead of having to use big ints.

Particularly when combined with (a) a "PUSHANNEX" opcode that grabs an entry from the annex and puts it on the stack, or (b) signing or otherwise accessing annex data from other inputs (see inq#19), it might make sense to make the annex much more limited. For example, only allowing individual data items to be 0-127 bytes, and limiting tags to be integers between, perhaps, 0 and 2**24-1.

In that case, rather than putting 100kB of data in the annex, you'd put the 100kB on the stack and use a script like "HASH160 0 PUSHANNEX EQUALVERIFY CHECKSIG" to achieve the same result; the benefit being that other inputs that also sign your annex are only signing an extra 20 byte hash160 hash, not the 100kB of data.

Doing things with those limits would let you encode annex entries as:

  • 1 byte - bump_len
  • 3 byte optional - tag_bump (present iff (bump_len & 0x80) != 0)
  • (bump_len & 0x7F) bytes - data

So if you wanted to encode {0: [<1234>], 1: [800000, <54a0>]} you'd do it as 50 02 1234 83 010000 00350c 02 54a0, which gets decoded as: 50 -- annex prefix; 02 -- no tag_bump, 2 bytes of data, data is the hex string 1234; 83 010000 00350c -- bump the tag by 0x000001 (little endian), 3 bytes of data, data is 0x0c3500 or 800,000 (little-endian); 02 54a0 -- don't bump the tag, 2 bytes of data, data is the hex string 54a0.

(My thinking is that this way you can define tag value 1 as a per-input locktime which accepts one or two data items: if there's one data item, it just requires that that block height has been reached; if there are two data items, it requires that the block height has been reached and that that block's hash matches the second data item.)

If you make a bad encoding, either by not having enough data or bumping the tag to 2**24 or higher, that's "invalid", either causing the tx to be invalid, or causing the data to be inaccessible via PUSHANNEX. 2**24 is 16M different tags, which is surely more than enough; but perhaps 2**16 or even 2**8 would be fine?
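
As an illustration of how small such a parser would be, here is a sketch of a decoder for that limited format. The names and the error handling are my own; it assumes the leading 0x50 annex prefix byte has already been stripped, and treats a truncated entry or an out-of-range tag as invalid:

```python
def decode_annex_entries(annex: bytes) -> dict[int, list[bytes]]:
    # Sketch decoder for the {bump_len, optional tag_bump, data} layout
    # described above; assumes the 0x50 annex prefix is already stripped.
    entries: dict[int, list[bytes]] = {}
    tag, i = 0, 0
    while i < len(annex):
        bump_len = annex[i]
        i += 1
        if bump_len & 0x80:  # tag_bump present
            if i + 3 > len(annex):
                raise ValueError("truncated tag_bump")
            tag += int.from_bytes(annex[i:i + 3], "little")
            i += 3
        if tag >= 1 << 24:
            raise ValueError("tag out of range")  # treat as invalid
        data_len = bump_len & 0x7F
        if i + data_len > len(annex):
            raise ValueError("truncated data")
        entries.setdefault(tag, []).append(annex[i:i + data_len])
        i += data_len
    return entries

# The worked example above round-trips to {0: [<1234>], 1: [<00350c>, <54a0>]}:
# decode_annex_entries(bytes.fromhex("0212348301000000350c0254a0"))
```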

ajtowns avatar Nov 17 '23 05:11 ajtowns

I’m more likely to work on the great consensus cleanup for the foreseeable future. If someone wishes to shepherd the taproot annex forward from here, feel free to ask AJ and/or me for input, as this current draft gathers the common ideas.

ariard avatar Apr 25 '24 07:04 ariard

@ariard would you mind closing this pull, if you don't currently plan to work on it?

jonatack avatar Apr 25 '24 14:04 jonatack

@jonatack somehow the annex is one of the best-known ways to fix technical debt in L2s: https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2023-December/022198.html

When I say I don’t currently plan to work on it, I mean over the coming 2-3 years.

In the meantime, I think it can be good to keep collecting feedback, or for folks to discuss the implementation approach.

If you wish to say more on how we should deal with BIPs related to consensus changes, good.

In reality they’re shepherded by a plurality of authors over a very long span of time.

ariard avatar Apr 26 '24 23:04 ariard

Closing it; I made a backup of the comments for my own archive. If someone wants to pick it up, feel free to do so.

ariard avatar Apr 28 '24 21:04 ariard