
BIP for reenabling OP_CAT

Open EthanHeilman opened this issue 8 months ago • 12 comments

This BIP defines OP_CAT, a new tapscript opcode that allows the concatenation of two values on the stack. This opcode would be activated via a soft fork by redefining the opcode OP_SUCCESS126.

See our implementation PR in bitcoin-inquisition: https://github.com/bitcoin-inquisition/bitcoin/pull/39

EthanHeilman avatar Dec 11 '23 23:12 EthanHeilman

Definitely looking forward to test-driving this BIP.

Bloc6 avatar Dec 12 '23 23:12 Bloc6

Can we get a BIP number assigned? Any blockers to doing this?

EthanHeilman avatar Dec 14 '23 18:12 EthanHeilman

@kallewoof The BIP file is still named bip-???-cat.mediawiki. How do we go about numbering this BIP?

0xBEEFCAF3 avatar Dec 18 '23 20:12 0xBEEFCAF3

@kallewoof The BIP file is still named bip-???-cat.mediawiki. How do we go about numbering this BIP?

Once a BIP number has been assigned, the authors are expected to fill it in in the right places.

kallewoof avatar Dec 19 '23 01:12 kallewoof

@kallewoof The BIP file is still named bip-???-cat.mediawiki. How do we go about numbering this BIP?

Once a BIP number has been assigned, the authors are expected to fill it in in the right places.

Thanks for the reply. Of course, we'll fill in the BIP number wherever it's needed. Has this BIP already been assigned a number? I might have missed that comment.

0xBEEFCAF3 avatar Dec 19 '23 19:12 0xBEEFCAF3

Missing a section for backward compatibility

@luke-jr Added!

0xBEEFCAF3 avatar Dec 29 '23 18:12 0xBEEFCAF3

@luke-jr We are requesting a BIP number for this PR

EthanHeilman avatar Jan 19 '24 03:01 EthanHeilman

Any updates on this? I would love to see this BIP merged to make verification of FRI-based zero-knowledge proofs ("pay to ZKP") more efficient.

cf avatar Mar 02 '24 02:03 cf

I would like to propose replacing the per-item size limit with a total stack size limit that is equivalent and can likewise be introduced as a soft fork. Since we currently allow 1,000 items of 520 bytes each, I would propose a 520 kB (520,000-byte) size limit for the entire stack + altstack.

This can be implemented as follows:

diff --git a/src/script/interpreter.cpp b/src/script/interpreter.cpp
index 1583b5fadd3..cbe1695bab2 100644
--- a/src/script/interpreter.cpp
+++ b/src/script/interpreter.cpp
@@ -545,8 +545,6 @@ bool EvalScript(std::vector<std::vector<unsigned char> >& stack, const CScript&
                         return set_error(serror, SCRIPT_ERR_INVALID_STACK_OPERATION);
                     valtype& vch1 = stacktop(-2);
                     valtype& vch2 = stacktop(-1);
-                    if (vch1.size() + vch2.size() > MAX_SCRIPT_ELEMENT_SIZE)
-                        return set_error(serror, SCRIPT_ERR_INVALID_STACK_OPERATION);
                     vch1.insert(vch1.end(), vch2.begin(), vch2.end());
                     stack.pop_back();
                 }
@@ -1287,8 +1285,17 @@ bool EvalScript(std::vector<std::vector<unsigned char> >& stack, const CScript&
             }

             // Size limits
+            // limit number of elements in stacks
             if (stack.size() + altstack.size() > MAX_STACK_SIZE)
                 return set_error(serror, SCRIPT_ERR_STACK_SIZE);
+            // limit total number of bytes in stacks
+            unsigned int total_bytes = 0;
+            for (size_t i = 0; i < stack.size() ; i++)
+                total_bytes += stack[i].size();
+            for (size_t i = 0; i < altstack.size() ; i++)
+                total_bytes += altstack[i].size();
+            if (total_bytes > MAX_STACK_BYTES)
+                return set_error(serror, SCRIPT_ERR_STACK_SIZE);
         }
     }
     catch (...)
diff --git a/src/script/script.h b/src/script/script.h
index 305919f5ba4..ef40c3672ee 100644
--- a/src/script/script.h
+++ b/src/script/script.h
@@ -38,6 +38,9 @@ static const int MAX_SCRIPT_SIZE = 10000;
 // Maximum number of values on script interpreter stack
 static const int MAX_STACK_SIZE = 1000;

+// Maximum number of total bytes on script interpreter stack
+static const int MAX_STACK_BYTES = 520000; // 520 * 1000
+
 // Threshold for nLockTime: below this value it is interpreted as block number,
 // otherwise as UNIX timestamp.
 static const unsigned int LOCKTIME_THRESHOLD = 500000000; // Tue Nov  5 00:53:20 1985 UTC

stevenroose avatar Mar 05 '24 17:03 stevenroose

NACK

OP_CAT enables recursion, which is computationally equivalent to iteration, thus making Script Turing complete.

ChrisMartl avatar Mar 17 '24 15:03 ChrisMartl

@ChrisMartl How does OP_CAT add recursion? What's your source for this claim?

EthanHeilman avatar Mar 17 '24 22:03 EthanHeilman

In case this claim originates with something I've said publicly: I am confident that CAT does not introduce recursion to Script and I have never (intentionally) said that it did.

CAT does introduce a form of recursive covenants. But the form of "recursion" here is one that involves a fresh transaction for every iteration, and does not change the computational model at all.

apoelstra avatar Mar 17 '24 22:03 apoelstra

Re: stevenroose's proposal, https://github.com/bitcoin/bips/pull/1525#issuecomment-1979297869

I would like to propose to replace the per-item size limit with a total stack size limit that is equivalent and can be introduced as a softfork as well. Since we currently allow 1000 items of 520 bytes, I would propose a 520kB (520,000 bytes) size limit for the entire stack + altstack.

This seems to require checking the size limit after every opcode is executed, and it might not be scalable. I think this may need to be left to a different PR. Calculating the delta and only updating a running total is possible, but it would change the implementation a lot.

weikengchen avatar Apr 03 '24 19:04 weikengchen

I would like to propose to replace the per-item size limit with a total stack size limit that is equivalent and can be introduced as a softfork as well. Since we currently allow 1000 items of 520 bytes, I would propose a 520kB (520,000 bytes) size limit for the entire stack + altstack.

Consider the following script:

"1234567" DUP CAT 8 CAT
DUP CAT DUP CAT DUP CAT DUP CAT DUP CAT DUP CAT DUP CAT
DUP CAT DUP CAT DUP CAT DUP CAT DUP CAT DUP CAT DUP CAT

1 SHA256 OVER CAT SHA256
[ OVER CAT SHA256 ] * 130000
CAT SHA256

It constructs a ~246 kB string on the stack consisting of many repeats of "123456712345678", then repeatedly concatenates that string to a sha256 hash, using the hash of the result as the new sha256 hash. Doing this 130k times means you're hashing about 32 GB of data despite never having more than about 492 kB on the stack in total, while still fitting in a standard tx of less than 400k weight units. For comparison, constraining each stack element to 520 bytes limits the total amount hashed to under 208 MB for any standard transaction.

ajtowns avatar Apr 04 '24 04:04 ajtowns

Assigned BIP 347.

Roasbeef avatar Apr 24 '24 17:04 Roasbeef

Does this mean we can also formally propose OP_{LEFT,RIGHT,SUBSTR} now too? :heart_eyes:

delbonis avatar Apr 24 '24 18:04 delbonis

@delbonis anybody can propose anything formally, by running through the process of creating a proposal, soliciting input on the mailing list, etc.

But the three opcodes you mention can be fairly cheaply emulated using CAT, and have fewer direct use cases, so it may not be a productive direction to pursue.

apoelstra avatar Apr 24 '24 18:04 apoelstra

LEFT/RIGHT/SUBSTR/SPLIT etc. are actually a bit nicer in some ways than CAT, because they only create same-size or smaller elements on the stack, avoiding the concerns around the maximum element size and working seamlessly if the element size should ever be changed to allow larger values.

JeremyRubin avatar Apr 24 '24 21:04 JeremyRubin

Please keep comments focused on technical review of the proposal -- thanks.

jonatack avatar Apr 24 '24 22:04 jonatack

@delbonis @JeremyRubin Re: OP_LEFT/RIGHT/SUBSTR

There is still value in OP_LEFT/RIGHT/SUBSTR. To emulate them with OP_CAT, one needs to provide a hint, and this takes space in the witness stack / input stack. This is inconvenient because the witness stack can have at most 1,000 elements.

In our practice with Merkle trees for a STARK proof verifier that uses OP_CAT, we need to provide a sibling element for each layer. A standard transaction effectively only permits a binary Merkle tree, and therefore there are log_{2} n elements that we need to push onto the witness stack. (A nonstandard transaction is an easier situation, which allows log_{16} n.) OP_LEFT/RIGHT/SUBSTR, together with a nonstandard transaction, could allow one to pack elements compactly and save stack space.

Here is the related codebase: https://github.com/Bitcoin-Wildlife-Sanctuary/bitcoin-circle-stark?tab=readme-ov-file#performance

But I think people can agree that bringing back OP_SUBSTR alone would be enough. This would allow us to save the other OP_SUCCESS opcodes for other purposes.

That said, this is not related to the OP_CAT BIP discussion, and may be better discussed in other venues.

weikengchen avatar Apr 25 '24 01:04 weikengchen

Could you point me toward where I can learn more about the stack size limit referred to? I'm not technically savvy enough to see how the 520-byte stack element limit is imposed on the OP_CAT operation when there is no mention of the rationale in the commit. Is it enforced by tapscript? @EthanHeilman Thank you.

OminousPeace avatar Apr 26 '24 12:04 OminousPeace

Thanks for your question!

Could you point me toward where I can learn more about the stack size limit referred to? @OminousPeace

In the OP_CAT reference implementation in BIP-347:

case OP_CAT:
{
    if (stack.size() < 2)
        return set_error(serror, SCRIPT_ERR_INVALID_STACK_OPERATION);
    valtype& vch1 = stacktop(-2);
    valtype& vch2 = stacktop(-1);
    if (vch1.size() + vch2.size() > MAX_SCRIPT_ELEMENT_SIZE)
        return set_error(serror, SCRIPT_ERR_PUSH_SIZE);
    vch1.insert(vch1.end(), vch2.begin(), vch2.end());
    stack.pop_back();
}
break;

We check that there are sufficient elements on the stack:

  if (stack.size() < 2)
   return set_error(serror, SCRIPT_ERR_INVALID_STACK_OPERATION);

Then we check that the CAT operation will not create a stack element larger than the max stack element:

  if (vch1.size() + vch2.size() > MAX_SCRIPT_ELEMENT_SIZE)
    return set_error(serror, SCRIPT_ERR_PUSH_SIZE);

MAX_SCRIPT_ELEMENT_SIZE is defined as 520 (bytes) in Bitcoin Core:

// Maximum number of bytes pushable to the stack
static const unsigned int MAX_SCRIPT_ELEMENT_SIZE = 520;

OP_CAT is neutral with regard to the total size of the stack because it joins two elements that were already on the stack. That is, an OP_CAT operation should never increase or decrease the total size of the stack in bytes.

For running code see our implementation of OP_CAT in bitcoin-inquisition.

Is it enforced by tapscript?

Both script and tapscript enforce the stack element size limit and the total stack size limit when a push occurs. OP_CAT, like any other opcode that pushes data onto the stack, must also ensure that it does not violate these limits, which is why it returns an error if it would create an element greater than the maximum stack element size (520 bytes).

Rationale

We do provide a rationale for why we do not alter Bitcoin consensus to increase the stack element size limit in the rationale section of the BIP.

Did that answer your question?

EthanHeilman avatar Apr 26 '24 14:04 EthanHeilman

I put the BIP number in the title, I hope you don’t mind. It seems to me that there are still a couple open review comments, will swing by again later to check whether it’s ready for the next round of review.

murchandamus avatar Apr 30 '24 14:04 murchandamus

@luke-jr had previously requested changes:

Missing a section for backward compatibility

which I consider to have been addressed. I will abstain from merging this PR for some time, to give other editors a chance to make their own assessment.

murchandamus avatar May 01 '24 17:05 murchandamus

It seems useful to use the ACM DOI URI for that paper mentioned above with the nonfunctional link: https://dl.acm.org/doi/10.1145/2810103.2813686

It is very likely to still work for ten more years. Although it is paywalled, people can find free versions online.

weikengchen avatar May 02 '24 00:05 weikengchen

The build check was failing, because the BIP had not been added to the table in the README yet, so I added a commit to add it to the table.

murchandamus avatar May 03 '24 02:05 murchandamus

It seems useful to use the ACM DOI URI for that paper mentioned above with the nonfunctional link: dl.acm.org/doi/10.1145/2810103.2813686

It is very likely to still work for ten more years. Although it is paywalled, people can find free versions online.

Yep, sorry, we weren't allowed to upload it to eprint.iacr.org or arXiv due to copyright reasons. (The ACM is evil...)

But it's lawfully available under https://publications.cispa.saarland/565/1/penalizing.pdf (Web Archive: https://web.archive.org/web/20221023121048/https://publications.cispa.saarland/565/1/penalizing.pdf), and I've also just made it available under https://raw.githubusercontent.com/real-or-random/accas/master/paper.pdf (Web Archive: https://web.archive.org/web/20240506104315/https://raw.githubusercontent.com/real-or-random/accas/master/paper.pdf).

real-or-random avatar May 06 '24 10:05 real-or-random

@real-or-random Just added a stable URL for Liar, Liar, Coins on Fire! citation.

EthanHeilman avatar May 06 '24 17:05 EthanHeilman