Alternative bytecode_pattern approach

Open A60AB5450353F40E opened this issue 1 year ago • 3 comments

I've been thinking about how to store the redeem script. How about splitting it into 3 fields:

  • redeem script pattern: computed by replacing each sequence of consecutive pushes with the number of pushes, encoded as a script number push
  • sequence of push sizes: the size of each push, encoded as script number pushes
  • sequence of pushes: the pushes themselves

One could then use the _eq operator on the most general pattern, which should perform better than a regex. The result could be narrowed down further with a regex on the push sizes or the pushes, and those regexes would run only on rows that already matched the general pattern. The redeem script can also be reconstructed exactly from these three fields.
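
To make the two-stage idea concrete, here is a minimal in-memory sketch in Python. It is not Chaingraph's schema or query layer: the `Row` type, its field names and `find_contracts` are hypothetical, and all three fields are assumed to be stored as hex strings. The equality test stands in for `_eq`, and the regex is only evaluated for rows that pass it.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Row:
    # Hypothetical field names; all values are hex strings.
    pattern: str      # pushes replaced by push counts
    push_sizes: str   # push sizes as script number pushes
    pushes: str       # the pushes themselves


def find_contracts(rows: Iterable[Row], pattern: str, push_sizes_regex: str) -> List[Row]:
    """Exact match on the general pattern first, regex on push sizes second."""
    rx = re.compile(push_sizes_regex)
    # `and` short-circuits, so the regex runs only on rows whose pattern matched.
    return [r for r in rows if r.pattern == pattern and rx.fullmatch(r.push_sizes)]


EXAMPLE_ROWS = [
    Row(pattern="56", push_sizes="01406001406051025401", pushes=""),
    Row(pattern="56", push_sizes="515151515151", pushes=""),
    Row(pattern="52", push_sizes="01406001406051025401", pushes=""),
]
```

For example, `find_contracts(EXAMPLE_ROWS, "56", r"(014060){2}51.*")` keeps only the first row: the second row fails the regex, and the third is rejected by the equality test before the regex is ever tried.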

We could even do some more parsing and add a function that filters on the exact value of the Nth push.
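
A rough sketch of what such a helper could look like, in Python. The names `split_pushes` and `nth_push` are made up for illustration, and the input is assumed to be the pushes-only field (no executable opcodes). Single-opcode pushes are returned as the stack item they would produce: empty for OP_0, one byte for OP_1..OP_16 and OP_1NEGATE.

```python
from typing import List


def split_pushes(pushes: bytes) -> List[bytes]:
    """Split a pushes-only byte string into the stack items it would push."""
    items: List[bytes] = []
    i = 0
    while i < len(pushes):
        op = pushes[i]
        if op == 0x4F:                       # OP_1NEGATE pushes -1
            items.append(b"\x81")
            i += 1
            continue
        if 0x51 <= op <= 0x60:               # OP_1..OP_16 push the number itself
            items.append(bytes([op - 0x50]))
            i += 1
            continue
        if op <= 0x4B:                       # OP_0 (empty item) or direct push
            width, size = 0, op
        elif op in (0x4C, 0x4D, 0x4E):       # PUSHDATA1 / PUSHDATA2 / PUSHDATA4
            width = 1 << (op - 0x4C)
            size = int.from_bytes(pushes[i + 1:i + 1 + width], "little")
        else:
            raise ValueError(f"not a push opcode: 0x{op:02x}")
        start = i + 1 + width
        if start + size > len(pushes):
            raise ValueError("truncated push")
        items.append(pushes[start:start + size])
        i = start + size
    return items


def nth_push(pushes: bytes, n: int) -> bytes:
    """Return the Nth pushed item (0-based)."""
    return split_pushes(pushes)[n]
```

A filter on the exact value of a push then reduces to comparing `nth_push(row_pushes, n)` against the wanted bytes.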

A60AB5450353F40E, Mar 10 '23 13:03

I made a little tool to experiment with this, using these modes:

  • STRIP_PUSHES - replace each successive sequence of pushes with the number of pushes, encoded as a script number
  • STRIP_PUSH_DATA - replace each push with its payload size, encoded as a script number
  • EXTRACT_PUSHES - ignore all executable bytes, extract full pushes
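
For readers who want to try the three modes, here is a minimal Python sketch. It is not the tool mentioned above: `tokenize`, `patternize` and `encode_script_number_push` are names invented for this sketch. One assumption is worth flagging: single-opcode pushes (OP_0, OP_1NEGATE, OP_1..OP_16) are counted as size 1 in STRIP_PUSH_DATA, which was chosen only because it agrees with the outputs quoted in this thread, where both 00 and 5c become 51.

```python
from typing import Iterator, Tuple


def encode_script_number_push(n: int) -> bytes:
    """Minimal push of a non-negative integer as a script number."""
    if n < 0:
        raise ValueError("only non-negative values are needed here")
    if n == 0:
        return b"\x00"                        # OP_0
    if n <= 16:
        return bytes([0x50 + n])              # OP_1..OP_16
    body = bytearray()
    while n:
        body.append(n & 0xFF)                 # little-endian magnitude
        n >>= 8
    if body[-1] & 0x80:
        body.append(0x00)                     # keep the sign bit clear
    return bytes([len(body)]) + bytes(body)


def tokenize(script: bytes) -> Iterator[Tuple[bool, bytes, int]]:
    """Yield (is_push, raw_bytes, reported_size) for each instruction."""
    i = 0
    while i < len(script):
        op = script[i]
        if 0x01 <= op <= 0x4B:                # direct push of `op` bytes
            header, size = 1, op
        elif op in (0x4C, 0x4D, 0x4E):        # PUSHDATA1 / 2 / 4
            header = 1 + (1 << (op - 0x4C))
            size = int.from_bytes(script[i + 1:i + header], "little")
        elif op in (0x00, 0x4F) or 0x51 <= op <= 0x60:
            yield True, script[i:i + 1], 1    # assumption: counted as size 1
            i += 1
            continue
        else:
            yield False, script[i:i + 1], 0   # executable opcode
            i += 1
            continue
        end = i + header + size
        if end > len(script):
            raise ValueError("truncated push")
        yield True, script[i:end], size
        i = end


def patternize(script: bytes, mode: str) -> bytes:
    if mode not in ("STRIP_PUSHES", "STRIP_PUSH_DATA", "EXTRACT_PUSHES"):
        raise ValueError(f"unknown mode: {mode}")
    out = bytearray()
    run = 0                                   # length of the current run of pushes
    for is_push, raw, size in tokenize(script):
        if mode == "STRIP_PUSHES":
            if is_push:
                run += 1
                continue
            if run:
                out += encode_script_number_push(run)
                run = 0
            out += raw
        elif mode == "STRIP_PUSH_DATA":
            out += encode_script_number_push(size) if is_push else raw
        elif is_push:                         # EXTRACT_PUSHES
            out += raw
    if run:                                   # script ended inside a run of pushes
        out += encode_script_number_push(run)
    return bytes(out)
```

As a small check, `patternize(bytes.fromhex("02aabb5176009c"), "STRIP_PUSHES").hex()` gives `5276519c`: the two leading pushes collapse to 52, and the lone 00 push collapses to 51.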

Example of patternizing an AnyHedge input script:

STRIP_PUSHES:    56; len=1
STRIP_PUSH_DATA: 01406001406051025401; len=10
EXTRACT_PUSHES:  40da2963cc172e7dccf9570ebd272c496d9df459f1b4d07f1961515642054db764f25c4aab947a4dbcf7793ca25bcc5a46faa83d93e7aeaa2e23a95376e386902d10020aa262ab330100863301002b45000040c0df593545220c40b8676d56388b58715b27cc0fc5de3640dfd49aa05d00e3efc5597b9d8dd783f055b0579630157599290d547e9e3c160d2170d2c300177a72103e0aa262ac3301008733010026450000514d5401043e0aa262041209a26203eab30202e53303bc6d390500743ba40b2102d3c1de9d4bc77d6c3608cbe44d10138c7488e592dc2b1e10a6cf0e92c2ecb0471976a91415d54e1b90d806548f34263afe71695a8a19716388ac1976a914ec4bebcc7842bdc802880e8692d83e9ec41b95d688ac51210374059eb4b0edb9052779a1ef93c76d1715473a9f3d6634135a3c03b82561a015210396dc6749e3bde2c230fb10bb66a444d83318521c449181a6be57123c1257a6575c79009c637b695c7a7cad5b7a7cad6d6d6d6d6d51675c7a519dc3519d5f7a5f795779bb5d7a5d79577abb5c79587f77547f75817600a0695c79587f77547f75818c9d5c7a547f75815b799f695b795c7f77817600a0695979a35879a45c7a547f7581765c7aa2695b7aa2785a7a8b5b7aa5919b6902220276587a537a96a47c577a527994a4c4529d00cc7b9d00cd557a8851cc9d51cd547a8777777768; len=508

Entering the redeem script, we get:

BYTECODE:        043e0aa262041209a26203eab30202e53303bc6d390500743ba40b2102d3c1de9d4bc77d6c3608cbe44d10138c7488e592dc2b1e10a6cf0e92c2ecb0471976a91415d54e1b90d806548f34263afe71695a8a19716388ac1976a914ec4bebcc7842bdc802880e8692d83e9ec41b95d688ac51210374059eb4b0edb9052779a1ef93c76d1715473a9f3d6634135a3c03b82561a015210396dc6749e3bde2c230fb10bb66a444d83318521c449181a6be57123c1257a6575c79009c637b695c7a7cad5b7a7cad6d6d6d6d6d51675c7a519dc3519d5f7a5f795779bb5d7a5d79577abb5c79587f77547f75817600a0695c79587f77547f75818c9d5c7a547f75815b799f695b795c7f77817600a0695979a35879a45c7a547f7581765c7aa2695b7aa2785a7a8b5b7aa5919b6902220276587a537a96a47c577a527994a4c4529d00cc7b9d00cd557a8851cc9d51cd547a8777777768; len=340
STRIP_PUSHES:    5d79519c637b69517a7cad517a7cad6d6d6d6d6d5167517a519dc3519d517a51795179bb517a5179517abb5179517f77517f75817651a0695179517f77517f75818c9d517a517f758151799f695179517f77817651a0695179a35179a4517a517f758176517aa269517aa278517a8b517aa5919b695176517a517a96a47c517a517994a4c4519d51cc7b9d51cd517a8851cc9d51cd517a8777777768; len=156
STRIP_PUSH_DATA: 54545352535501210119011951012101215179519c637b69517a7cad517a7cad6d6d6d6d6d5167517a519dc3519d517a51795179bb517a5179517abb5179517f77517f75817651a0695179517f77517f75818c9d517a517f758151799f695179517f77817651a0695179a35179a4517a517f758176517aa269517aa278517a8b517aa5919b695276517a517a96a47c517a517994a4c4519d51cc7b9d51cd517a8851cc9d51cd517a8777777768; len=173
EXTRACT_PUSHES:  043e0aa262041209a26203eab30202e53303bc6d390500743ba40b2102d3c1de9d4bc77d6c3608cbe44d10138c7488e592dc2b1e10a6cf0e92c2ecb0471976a91415d54e1b90d806548f34263afe71695a8a19716388ac1976a914ec4bebcc7842bdc802880e8692d83e9ec41b95d688ac51210374059eb4b0edb9052779a1ef93c76d1715473a9f3d6634135a3c03b82561a015210396dc6749e3bde2c230fb10bb66a444d83318521c449181a6be57123c1257a6575c005c5b515c51515f5f575d5d575c5854005c58545c545b5b5c0059585c545c5b5a5b0222025853575252000055515154; len=231

A60AB5450353F40E, Mar 14 '23 11:03

To better illustrate, here's an index (pattern, input_count) of contract fingerprints (STRIP_PUSHES mode) from blocks 0-780,000:

https://gist.github.com/A60AB5450353F40E/6b3e525d6e1220328217b9568968d6fc

A60AB5450353F40E, Mar 15 '23 10:03

Thanks for looking into this @A60AB5450353F40E!

This would be a great improvement for scanning contract patterns. I'd love to take a PR introducing this feature! I won't have bandwidth to work on this myself until I make some progress on https://github.com/bitauth/chaingraph/issues/29. (Otherwise, I'll try to implement the bytecode_pattern stuff this way when I'm working on the ClickHouse migration.)

bitjson, Nov 21 '23 19:11