chaingraph
Alternative bytecode_pattern approach
I was thinking about how to store the redeem script. How about splitting it into 3 fields:
- redeem script pattern: computed by replacing each sequence of consecutive pushes with the number of pushes, encoded as a script number push
- sequence of push sizes: the size of each push, encoded as script number pushes
- sequence of pushes: just the pushes themselves
One can then use the _eq operator on the most general pattern, which should perform better than a regex. The result could then be narrowed down further with a regex on the push sizes or the pushes, but those would only be executed on rows that already matched the general template. Also, the redeem script can be accurately reconstructed from these fields.
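To make the reconstruction claim concrete, here is a minimal Python sketch of a round trip. The function names (`instructions`, `split`, `reconstruct`) are hypothetical, and the sketch assumes the pushes field stores each push in full (opcode or length prefix plus payload) and that OP_0, OP_1NEGATE and OP_1..OP_16 count as pushes. It is an illustration under those assumptions, not the tool's actual code.

```python
from typing import List, Tuple


def instructions(script: bytes) -> List[bytes]:
    """Split a script into raw instructions (pushes keep prefix and payload)."""
    out, i = [], 0
    while i < len(script):
        op = script[i]
        if 0x01 <= op <= 0x4B:                      # direct push of `op` bytes
            end = i + 1 + op
        elif op in (0x4C, 0x4D, 0x4E):              # OP_PUSHDATA1/2/4
            width = 1 << (op - 0x4C)
            end = i + 1 + width + int.from_bytes(script[i + 1:i + 1 + width], "little")
        else:                                       # single-byte instruction
            end = i + 1
        if end > len(script):
            raise ValueError("truncated push")
        out.append(script[i:end])
        i = end
    return out


def is_push(ins: bytes) -> bool:
    # Assumption: OP_0, OP_1NEGATE and OP_1..OP_16 are treated as pushes.
    return ins[0] <= 0x4F or 0x51 <= ins[0] <= 0x60


def push_number(n: int) -> bytes:
    """Encode a non-negative count as a minimal script number push."""
    if n < 0:
        raise ValueError("only non-negative numbers are supported here")
    if n == 0:
        return b"\x00"
    if n <= 16:
        return bytes([0x50 + n])
    body = bytearray()
    while n:
        body.append(n & 0xFF)
        n >>= 8
    if body[-1] & 0x80:                             # keep the sign bit clear
        body.append(0x00)
    return bytes([len(body)]) + bytes(body)


def read_number(ins: bytes) -> int:
    """Decode a non-negative count written by push_number."""
    op = ins[0]
    if op == 0x00:
        return 0
    if 0x51 <= op <= 0x60:
        return op - 0x50
    return int.from_bytes(ins[1:], "little")


def split(script: bytes) -> Tuple[bytes, bytes]:
    """Return (pattern, pushes) for a script."""
    pattern, pushes, run = bytearray(), bytearray(), 0
    for ins in instructions(script):
        if is_push(ins):
            pushes += ins
            run += 1
            continue
        if run:
            pattern += push_number(run)
            run = 0
        pattern += ins
    if run:
        pattern += push_number(run)
    return bytes(pattern), bytes(pushes)


def reconstruct(pattern: bytes, pushes: bytes) -> bytes:
    """Rebuild the script by expanding each count in the pattern."""
    queue, pos, out = instructions(pushes), 0, bytearray()
    for ins in instructions(pattern):
        if is_push(ins):                            # every push in a pattern is a count
            n = read_number(ins)
            if pos + n > len(queue):
                raise ValueError("pattern expects more pushes than were stored")
            out += b"".join(queue[pos:pos + n])
            pos += n
        else:
            out += ins
    if pos != len(queue):
        raise ValueError("unused pushes left over")
    return bytes(out)


script = bytes.fromhex("043e0aa262041209a2625c79009c63")
pattern, pushes = split(script)
assert reconstruct(pattern, pushes) == script
```

Under these assumptions the pattern and the pushes are enough for the round trip, because each stored push carries its own length prefix. The push sizes field would then serve only as a filtering aid.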
One could even do some more parsing and provide a function that filters on the exact value of the Nth push, or something similar.
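A rough Python sketch of such a two-stage filter might look like the following. The row layout (`pattern` and `pushes` as hex strings) and the names `split_pushes` and `matches` are hypothetical, chosen only for illustration. The sketch also assumes that single-opcode pushes such as OP_1 are compared by their opcode byte rather than by the value they place on the stack.

```python
from typing import Dict, List


def split_pushes(pushes: bytes) -> List[bytes]:
    """Return the payload of each push stored in the pushes field."""
    payloads, i = [], 0
    while i < len(pushes):
        op = pushes[i]
        if 0x01 <= op <= 0x4B:                  # direct push of `op` bytes
            start, size = i + 1, op
        elif op in (0x4C, 0x4D, 0x4E):          # OP_PUSHDATA1/2/4
            width = 1 << (op - 0x4C)            # 1, 2 or 4 length bytes
            start = i + 1 + width
            size = int.from_bytes(pushes[i + 1:start], "little")
        else:
            # Assumption: a single-opcode push is reported as the opcode byte itself.
            start, size = i, 1
        if start + size > len(pushes):
            raise ValueError("truncated push")
        payloads.append(pushes[start:start + size])
        i = start + size
    return payloads


def matches(row: Dict[str, str], pattern_hex: str, n: int, expected_hex: str) -> bool:
    """Cheap equality on the pattern first, then inspect the Nth push (0-based)."""
    if row["pattern"] != pattern_hex:
        return False                            # most rows are rejected here
    payloads = split_pushes(bytes.fromhex(row["pushes"]))
    return n < len(payloads) and payloads[n] == bytes.fromhex(expected_hex)


row = {"pattern": "5379519c63", "pushes": "043e0aa262041209a2625c00"}
print(matches(row, "5379519c63", 1, "1209a262"))
```

The point of the ordering is that the push parsing only runs on rows whose pattern already matched, mirroring the idea of applying the expensive check only to positive matches of the general template.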
I made a little tool to experiment with this, using these modes:
- STRIP_PUSHES - replace each successive sequence of pushes with the number of pushes, encoded as a script number
- STRIP_PUSH_DATA - replace each push with payload size, encoded as a script number
- EXTRACT_PUSHES - ignore all executable bytes, extract full pushes
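The three modes can be approximated with a short Python sketch. This is not the tool itself: the function names are hypothetical, and treating OP_0, OP_1NEGATE and OP_1..OP_16 as pushes of size 1 is an assumption chosen to be consistent with the sample output in this post.

```python
from typing import Iterator, Tuple

PUSHDATA_WIDTH = {0x4C: 1, 0x4D: 2, 0x4E: 4}    # OP_PUSHDATA1/2/4 length-field widths


def parse(script: bytes) -> Iterator[Tuple[bool, bytes, int]]:
    """Yield (is_push, raw_instruction, payload_size) for each instruction."""
    i = 0
    while i < len(script):
        op = script[i]
        if 0x01 <= op <= 0x4B:                  # direct push of `op` bytes
            start, size = i + 1, op
        elif op in PUSHDATA_WIDTH:              # length-prefixed push
            start = i + 1 + PUSHDATA_WIDTH[op]
            size = int.from_bytes(script[i + 1:start], "little")
        else:                                   # single-byte instruction
            # Assumption: OP_0, OP_1NEGATE, OP_1..OP_16 are pushes of size 1.
            is_push = op == 0x00 or op == 0x4F or 0x51 <= op <= 0x60
            yield is_push, script[i:i + 1], 1 if is_push else 0
            i += 1
            continue
        end = start + size
        if end > len(script):
            raise ValueError("truncated push")
        yield True, script[i:end], size
        i = end


def push_number(n: int) -> bytes:
    """Encode a non-negative integer as a minimal script number push."""
    if n < 0:
        raise ValueError("only non-negative numbers are supported here")
    if n == 0:
        return b"\x00"                          # OP_0
    if n <= 16:
        return bytes([0x50 + n])                # OP_1..OP_16
    body = bytearray()
    while n:
        body.append(n & 0xFF)
        n >>= 8
    if body[-1] & 0x80:                         # keep the sign bit clear
        body.append(0x00)
    return bytes([len(body)]) + bytes(body)


def strip_pushes(script: bytes) -> bytes:
    """STRIP_PUSHES: replace each run of pushes with the number of pushes."""
    out, run = bytearray(), 0
    for is_push, raw, _ in parse(script):
        if is_push:
            run += 1
            continue
        if run:
            out += push_number(run)
            run = 0
        out += raw
    if run:
        out += push_number(run)
    return bytes(out)


def strip_push_data(script: bytes) -> bytes:
    """STRIP_PUSH_DATA: replace each push with its payload size."""
    return b"".join(push_number(size) if is_push else raw
                    for is_push, raw, size in parse(script))


def extract_pushes(script: bytes) -> bytes:
    """EXTRACT_PUSHES: keep only the full pushes, dropping executable bytes."""
    return b"".join(raw for is_push, raw, _ in parse(script) if is_push)
```

For a script with the same shape as the AnyHedge input script (pushes of 64, 16, 64 and 16 bytes, then OP_1, then a 340-byte OP_PUSHDATA2 push), this sketch gives `56` for STRIP_PUSHES and `01406001406051025401` for STRIP_PUSH_DATA.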
Example of patternizing an AnyHedge input script:
STRIP_PUSHES: 56; len=1
STRIP_PUSH_DATA: 01406001406051025401; len=10
EXTRACT_PUSHES: 40da2963cc172e7dccf9570ebd272c496d9df459f1b4d07f1961515642054db764f25c4aab947a4dbcf7793ca25bcc5a46faa83d93e7aeaa2e23a95376e386902d10020aa262ab330100863301002b45000040c0df593545220c40b8676d56388b58715b27cc0fc5de3640dfd49aa05d00e3efc5597b9d8dd783f055b0579630157599290d547e9e3c160d2170d2c300177a72103e0aa262ac3301008733010026450000514d5401043e0aa262041209a26203eab30202e53303bc6d390500743ba40b2102d3c1de9d4bc77d6c3608cbe44d10138c7488e592dc2b1e10a6cf0e92c2ecb0471976a91415d54e1b90d806548f34263afe71695a8a19716388ac1976a914ec4bebcc7842bdc802880e8692d83e9ec41b95d688ac51210374059eb4b0edb9052779a1ef93c76d1715473a9f3d6634135a3c03b82561a015210396dc6749e3bde2c230fb10bb66a444d83318521c449181a6be57123c1257a6575c79009c637b695c7a7cad5b7a7cad6d6d6d6d6d51675c7a519dc3519d5f7a5f795779bb5d7a5d79577abb5c79587f77547f75817600a0695c79587f77547f75818c9d5c7a547f75815b799f695b795c7f77817600a0695979a35879a45c7a547f7581765c7aa2695b7aa2785a7a8b5b7aa5919b6902220276587a537a96a47c577a527994a4c4529d00cc7b9d00cd557a8851cc9d51cd547a8777777768; len=508
Entering the redeem script, we get:
BYTECODE: 043e0aa262041209a26203eab30202e53303bc6d390500743ba40b2102d3c1de9d4bc77d6c3608cbe44d10138c7488e592dc2b1e10a6cf0e92c2ecb0471976a91415d54e1b90d806548f34263afe71695a8a19716388ac1976a914ec4bebcc7842bdc802880e8692d83e9ec41b95d688ac51210374059eb4b0edb9052779a1ef93c76d1715473a9f3d6634135a3c03b82561a015210396dc6749e3bde2c230fb10bb66a444d83318521c449181a6be57123c1257a6575c79009c637b695c7a7cad5b7a7cad6d6d6d6d6d51675c7a519dc3519d5f7a5f795779bb5d7a5d79577abb5c79587f77547f75817600a0695c79587f77547f75818c9d5c7a547f75815b799f695b795c7f77817600a0695979a35879a45c7a547f7581765c7aa2695b7aa2785a7a8b5b7aa5919b6902220276587a537a96a47c577a527994a4c4529d00cc7b9d00cd557a8851cc9d51cd547a8777777768; len=340
STRIP_PUSHES: 5d79519c637b69517a7cad517a7cad6d6d6d6d6d5167517a519dc3519d517a51795179bb517a5179517abb5179517f77517f75817651a0695179517f77517f75818c9d517a517f758151799f695179517f77817651a0695179a35179a4517a517f758176517aa269517aa278517a8b517aa5919b695176517a517a96a47c517a517994a4c4519d51cc7b9d51cd517a8851cc9d51cd517a8777777768; len=156
STRIP_PUSH_DATA: 54545352535501210119011951012101215179519c637b69517a7cad517a7cad6d6d6d6d6d5167517a519dc3519d517a51795179bb517a5179517abb5179517f77517f75817651a0695179517f77517f75818c9d517a517f758151799f695179517f77817651a0695179a35179a4517a517f758176517aa269517aa278517a8b517aa5919b695276517a517a96a47c517a517994a4c4519d51cc7b9d51cd517a8851cc9d51cd517a8777777768; len=173
EXTRACT_PUSHES: 043e0aa262041209a26203eab30202e53303bc6d390500743ba40b2102d3c1de9d4bc77d6c3608cbe44d10138c7488e592dc2b1e10a6cf0e92c2ecb0471976a91415d54e1b90d806548f34263afe71695a8a19716388ac1976a914ec4bebcc7842bdc802880e8692d83e9ec41b95d688ac51210374059eb4b0edb9052779a1ef93c76d1715473a9f3d6634135a3c03b82561a015210396dc6749e3bde2c230fb10bb66a444d83318521c449181a6be57123c1257a6575c005c5b515c51515f5f575d5d575c5854005c58545c545b5b5c0059585c545c5b5a5b0222025853575252000055515154; len=231
To better illustrate, here's an index (pattern, input_count) of contract fingerprints (STRIP_PUSHES mode) from blocks 0-780,000:
https://gist.github.com/A60AB5450353F40E/6b3e525d6e1220328217b9568968d6fc
Thanks for looking into this @A60AB5450353F40E!
This would be a great improvement for scanning contract patterns. I'd love to take a PR introducing this feature! I won't have bandwidth to work on this myself until I make some progress on https://github.com/bitauth/chaingraph/issues/29. (Otherwise, I'll try to implement the bytecode_pattern stuff this way when I'm working on the ClickHouse migration.)