go-ethereum core/vm: implement full EOF suite

(wip)

This PR builds on the Ipsilon team's (@gumb0, @chfast, and @axic) earlier work implementing EIP-3540 and EIP-3670. I'm extending it to support the "full" EOF suite which is:

[x] EIP-3540: EVM Object Format (EOF) v1
[x] EIP-3670: EOF - Code Validation
[x] EIP-4200: Static relative jumps
[x] EIP-4750: EOF - Functions
[x] EIP-5450: EOF - Stack Validation

Nov 07 '22 23:11 lightclient

@hugo-dc is working on EIP-4200 (static relative jumps) implementation: https://github.com/ipsilon/go-ethereum/tree/eip-4200.

Nov 08 '22 10:11 chfast

I wanted to do some cross-client fuzzing, using https://github.com/holiman/txparse/#eof-parser-eofparse as a fuzzing target. However, in order to invoke the parsing from an external package, the external caller needs an instruction set. But none of our instruction sets are public. I suggest somehow either

a) making the instructionsets public, or b) making the instructionset constructors public, or c) making a public eof parser method which internally is able to lookup/create an instruction set for given parameters, or d) making a constructor for the 'LatestTesting' instruction set.

Locally, I went with option d.

diff --git a/core/vm/jump_table.go b/core/vm/jump_table.go
index 120fb56ae2..1aa3fc4703 100644
--- a/core/vm/jump_table.go
+++ b/core/vm/jump_table.go
@@ -63,6 +63,12 @@ var (
 	shanghaiInstructionSet         = newShanghaiInstructionSet()
 )
 
+// NewLatestInstructionSetForTesting returns the 'latest' instruction set, for use in testing.
+// This method will change over time, and cannot be relied upon to stay consistent across releases.
+func NewLatestInstructionSetForTesting() JumpTable {
+	return newShanghaiInstructionSet()
+}
+
 // JumpTable contains the EVM opcodes supported at a given fork.
 type JumpTable [256]*operation

Nov 15 '22 09:11 holiman

Now that the masterbranch is go1.19 and up, we could also import this test into this PR: https://github.com/holiman/txparse/blob/main/eofparse/eofparse_test.go

Nov 15 '22 09:11 holiman

To generate ethereum tests with this implementation (with t8n tool), it seems we need either Shanghai definition in https://github.com/lightclient/go-ethereum/blob/ec36b2dd02ff45b842c56c8c36771027f62ceff7/tests/init.go#L28 or an activator for it as an "extra EIP" in https://github.com/lightclient/go-ethereum/blob/ec36b2dd02ff45b842c56c8c36771027f62ceff7/core/vm/eips.go#L28

Dec 22 '22 11:12 gumb0

IMO let's add it to shanghai definition

Dec 22 '22 11:12 holiman

EIP-3540:

If CREATE or CREATE2 instruction is executed in an EOF1 code the instruction’s initcode MUST be valid EOF1 code (i.e. EOF1 contracts MUST NOT produce legacy code).

I think this restriction is not yet implemented.

Dec 29 '22 12:12 gumb0

EIP-3540:

If CREATE or CREATE2 instruction is executed in an EOF1 code the instruction’s initcode MUST be valid EOF1 code (i.e. EOF1 contracts MUST NOT produce legacy code).

That's ambiguous. It is saying that valid EOF1 code === not legacy code. But what if we have EOF2 in the future? Would it be allowed for EOF1 to create EOF2?

Dec 29 '22 12:12 holiman

I upated eofparse to use this latest code. It found a crasher. Please check if I'm validating it correctly: https://github.com/holiman/txparse/blob/main/eofparse/main.go#L29 .

Here's the crasher:

$ echo "EF00010100020F0004006000AABBCCDD" | ./eofparse
panic: runtime error: index out of range [1] with length 1

goroutine 1 [running]:
github.com/ethereum/go-ethereum/core/vm.parseUint16(...)
        /home/user/go/src/github.com/ethereum/go-ethereum/core/vm/eof.go:219
github.com/ethereum/go-ethereum/core/vm.parseList({0xc0000bc320, 0x44ffbe?, 0x20}, 0x471e1f?)
        /home/user/go/src/github.com/ethereum/go-ethereum/core/vm/eof.go:211 +0x132
github.com/ethereum/go-ethereum/core/vm.parseSectionList(...)
        /home/user/go/src/github.com/ethereum/go-ethereum/core/vm/eof.go:203
github.com/ethereum/go-ethereum/core/vm.(*Container).UnmarshalBinary(0xc00019e698, {0xc0000bc320, 0x10, 0x20})
        /home/user/go/src/github.com/ethereum/go-ethereum/core/vm/eof.go:123 +0x112
main.work()
        /home/user/go/src/github.com/holiman/txparse/eofparse/main.go:24 +0x152
main.main()
        /home/user/go/src/github.com/holiman/txparse/eofparse/main.go:13 +0x17

Dec 30 '22 06:12 holiman

Ah. Yeah your parsers are a bit too trustful.

Dec 30 '22 06:12 holiman

I pushed some potential fixes to https://github.com/holiman/go-ethereum/tree/eof_fixes . I did not handle the errors in some places, like validateControlFlow, because I suspect the lengths are already checked here. And if not, that I would catch it with some fuzzing.

Dec 30 '22 07:12 holiman

With my changes to the parser, I now hit a different crasher in the analysis:

$ echo "ef000101000c020003000300050001030000000000000130300030303000306030005e3030303030" | ./eofparse 
panic: runtime error: index out of range [5] with length 5

goroutine 1 [running]:
github.com/ethereum/go-ethereum/core/vm.bitvec.set16(...)
        /home/user/go/src/github.com/ethereum/go-ethereum/core/vm/analysis.go:54
github.com/ethereum/go-ethereum/core/vm.eofCodeBitmapInternal({0xc0000da4d2?, 0x3?, 0x31?}, {0xc000146890?, 0xc0000c1de0?, 0x3?})
        /home/user/go/src/github.com/ethereum/go-ethereum/core/vm/analysis.go:154 +0x4b1
github.com/ethereum/go-ethereum/core/vm.eofCodeBitmap(...)
        /home/user/go/src/github.com/ethereum/go-ethereum/core/vm/analysis.go:126
github.com/ethereum/go-ethereum/core/vm.validateCode({0xc0000da4d2, 0x5, 0x2e}, 0x1, {0xc0000c1de0?, 0x3, 0x4}, 0xc00019ef60)
        /home/user/go/src/github.com/ethereum/go-ethereum/core/vm/validate.go:59 +0x16d
github.com/ethereum/go-ethereum/core/vm.(*Container).ValidateCode(0xc00019e698, 0xc0000da4b0?)
        /home/user/go/src/github.com/ethereum/go-ethereum/core/vm/eof.go:198 +0x93
main.work()
        /home/user/go/src/github.com/holiman/txparse/eofparse/main.go:28 +0x1c5
main.main()
        /home/user/go/src/github.com/holiman/txparse/eofparse/main.go:13 +0x17

Dec 30 '22 08:12 holiman

It would probably be a good idea to lift this test into this PR: https://github.com/holiman/txparse/blob/main/eofparse/eofparse_test.go . We haven't switched over all fuzzing tests to 'native' format yet, but that shouldn't prevent us from making new fuzzers in the new style.

Dec 30 '22 08:12 holiman

This diff would "fix" it, but perhaps not the optimal way to do it:

diff --git a/core/vm/analysis.go b/core/vm/analysis.go
index b96a493522..9eb00a1627 100644
--- a/core/vm/analysis.go
+++ b/core/vm/analysis.go
@@ -16,6 +16,8 @@
 
 package vm
 
+import "errors"
+
 const (
 	set2BitsMask = uint16(0b11)
 	set3BitsMask = uint16(0b111)
@@ -118,7 +120,7 @@ func codeBitmapInternal(code, bits bitvec) bitvec {
 }
 
 // eofCodeBitmap collects data locations in code.
-func eofCodeBitmap(code []byte) bitvec {
+func eofCodeBitmap(code []byte) (bitvec, error) {
 	// The bitmap is 4 bytes longer than necessary, in case the code
 	// ends with a PUSH32, the algorithm will push zeroes onto the
 	// bitvector outside the bounds of the actual code.
@@ -128,7 +130,7 @@ func eofCodeBitmap(code []byte) bitvec {
 
 // eofCodeBitmapInternal is the internal implementation of codeBitmap for EOF
 // code validation.
-func eofCodeBitmapInternal(code, bits bitvec) bitvec {
+func eofCodeBitmapInternal(code, bits bitvec) (bitvec, error) {
 	for pc := uint64(0); pc < uint64(len(code)); {
 		op := OpCode(code[pc])
 		pc++
@@ -149,6 +151,9 @@ func eofCodeBitmapInternal(code, bits bitvec) bitvec {
 			// If not PUSH (the int8(op) > int(PUSH32) is always false).
 			continue
 		}
+		if uint64(numbits)+pc >= uint64(len(code)) {
+			return nil, errors.New("invalid relative jump")
+		}
 		if numbits >= 8 {
 			for ; numbits >= 16; numbits -= 16 {
 				bits.set16(pc)
@@ -183,5 +188,5 @@ func eofCodeBitmapInternal(code, bits bitvec) bitvec {
 			pc += 7
 		}
 	}
-	return bits
+	return bits, nil
 }
diff --git a/core/vm/analysis_test.go b/core/vm/analysis_test.go
index a50b360e5e..b955d42d3f 100644
--- a/core/vm/analysis_test.go
+++ b/core/vm/analysis_test.go
@@ -57,7 +57,7 @@ func TestJumpDestAnalysis(t *testing.T) {
 		if ret[test.which] != test.exp {
 			t.Fatalf("test %d: expected %x, got %02x", i, test.exp, ret[test.which])
 		}
-		ret = eofCodeBitmap(test.code)
+		ret, _ = eofCodeBitmap(test.code)
 		if ret[test.which] != test.exp {
 			t.Fatalf("eof test %d: expected %x, got %02x", i, test.exp, ret[test.which])
 		}
@@ -75,7 +75,7 @@ func TestEOFAnalysis(t *testing.T) {
 		{[]byte{byte(RJUMPV), 0x02, byte(RJUMP), 0x00, byte(RJUMPI), 0x00}, 0b0011_1110, 0},
 	}
 	for i, test := range tests {
-		ret := eofCodeBitmap(test.code)
+		ret, _ := eofCodeBitmap(test.code)
 		if ret[test.which] != test.exp {
 			t.Fatalf("test %d: expected %x, got %02x", i, test.exp, ret[test.which])
 		}
diff --git a/core/vm/validate.go b/core/vm/validate.go
index f62750e232..1df9e09081 100644
--- a/core/vm/validate.go
+++ b/core/vm/validate.go
@@ -45,7 +45,10 @@ func validateCode(code []byte, section int, metadata []*FunctionMetadata, jt *Ju
 			i += size
 		case op == RJUMP || op == RJUMPI:
 			if analysis == nil {
-				tmp := eofCodeBitmap(code)
+				tmp, err := eofCodeBitmap(code)
+				if err != nil {
+					return err
+				}
 				analysis = &tmp
 			}
 			// Verify that the relative jump offset points to a
@@ -56,8 +59,11 @@ func validateCode(code []byte, section int, metadata []*FunctionMetadata, jt *Ju
 			i += 2
 		case op == RJUMPV:
 			if analysis == nil {
-				tmp := eofCodeBitmap(code)
-				analysis = &tmp
+				tmp, err := eofCodeBitmap(code)
+				if err != nil {
+					return err
+					analysis = &tmp
+				}
 			}
 			// Verify each branch in the jump table points to a
 			// destination in-bounds.

... but it fixed it sufficiently so that I can fuzz it a bit further

Dec 31 '22 08:12 holiman

New crasher:

$ echo "0xef000101000c020003000300050001030000000000000130300030303000306030005e8030303030" | go run ./main.go
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x56cb98]

goroutine 1 [running]:
github.com/ethereum/go-ethereum/core/vm.validateCode({0xc00014a482, 0x5, 0x2e}, 0x1, {0xc000133de0?, 0x3, 0x4}, 0xc00021ef60)
        /home/user/go/src/github.com/ethereum/go-ethereum/core/vm/validate.go:78 +0x858
github.com/ethereum/go-ethereum/core/vm.(*Container).ValidateCode(0xc00021e698, 0xc00014a460?)
        /home/user/go/src/github.com/ethereum/go-ethereum/core/vm/eof.go:198 +0x93
main.work()
        /home/user/go/src/github.com/holiman/txparse/eofparse/main.go:28 +0x1c5
main.main()

Dec 31 '22 09:12 holiman

I think I broke this. I tried to create code with this EOF1 code: ef000101000802000200040001030001000000000000000000b0000100b1ff

And it failed with:

Error: ERROR OCCURED FILLING TESTS: DebugVMTrace parse error: Error in DataObject: 
 key: '' type: 'object'
 assert: count(_key) _key=time (DataObject::atKey)
{
    "output" : "ef000101000802000200040001030001000000000000000000b0000100b1ff",
    "gasUsed" : "0x4c3e31c",
    "error" : "invalid eof: unreachable code"
}
 (stEOF/ori/EOF1ValidInvalid, fork: Shanghai, TrInfo: d: 35, g: 0, v: 0)

But I don't see any unreachable code:

ef0001 - magic and version 010008 - two code sections 020002 - two code sections 0004 - code section 0, 4 bytes 0001 - code section 1, 1 byte 030001 - data section length: 1 00 - end of header 00000000 - no inputs, no outputs, no stack 00000000 - no inputs, no outputs, no stack code section 0: b0000100 - CALLF code section 1 code section 1: b1 - RETF ff - data section

Jan 02 '23 14:01 qbzzt

Wrong error message:

Error: ERROR OCCURED FILLING TESTS: DebugVMTrace parse error: Error in DataObject: 
 key: '' type: 'object'
 assert: count(_key) _key=time (DataObject::atKey)
{
    "output" : "ef00010100040200010005030001000000000130b0000000ff",
    "gasUsed" : "0x4c3e340",
    "error" : "invalid eof: stack underflow"
}

But this is a data stack overflow

ef0001 - magic and version 010004 - 1 code section 020001 - 1 code section 0005 - sec0: 5 bytes 030001 - data section: 1 byte 00 - end of header sec0: 00 00 0001 - no inputs, no outputs, datastack of one sec0: 30 - ADDRESS (push one value) b00000 - CALLF 0 00 - STOP data section: ff

Jan 02 '23 14:01 qbzzt

Thanks for the comments @qbzzt. I resolved the first issue in 1650a733bfcf313291a30724cfaf7af55f0da72e. I think there is a problem in the second issue, but I think the code should actually pass. I had the comparison operator flipped which caused it to think it was a stack underflow. Can you elaborate why you think the error should be stack overflow rather than no error?

Jan 02 '23 17:01 lightclient

Looks valid to me too, stack overflow will not be detected at validation time, it only will fail at runtime.

Jan 02 '23 19:01 gumb0

Looks valid to me too, stack overflow will not be detected at validation time, it only will fail at runtime.

You're right, my mistake.

Jan 02 '23 20:01 qbzzt

"error" : "invalid code: legacy initcode may not deploy EOF code"

Where is this specified? https://eips.ethereum.org/EIPS/eip-3540#changes-to-contract-creation-semantics says that EOF initcode can't deploy legacy code, but not the reverse.

Jan 02 '23 20:01 qbzzt

@qbzzt I misinterpreted one of the tests in the ethereum/tests PR - removing the requirement.

Jan 02 '23 22:01 lightclient

@qbzzt I misinterpreted one of the tests in the ethereum/tests PR - removing the requirement.

Thank you. I could rewrite my tester to use an EOF1 deployer, but I'd rather not spend time on that today.

Jan 02 '23 22:01 qbzzt

@qbzzt I misinterpreted one of the tests in the ethereum/tests PR - removing the requirement.

Thank you, removing this now is saving me time.

Jan 03 '23 00:01 qbzzt

Bug report. My code fails with stack underflow:

Error: ERROR OCCURED FILLING TESTS: DebugVMTrace parse error: Error in DataObject: 
 key: '' type: 'object'
 assert: count(_key) _key=time (DataObject::atKey)
{
    "output" : "ef000101000802000200050003030001000000000100010001b00001500060ffb1ff",
    "gasUsed" : "0x4c3e2c8",
    "error" : "invalid eof: stack underflow"
}

And I don't understand why.

ef0001 - magic and version 010008 - two code sections 020002 - two code sections 0005 - section 0 is 5 bytes 0003 - section 1 is 3 bytes 030001 - data section is one byte 00 - end of header

types

section 0

0000 - no inputs, no outputs (section 0, so that's a requirement) 0001 - max stack height 1

section 1

0001 - no inputs, one return values 0001 - max stack height 1

code

section 0

B00001 - CALLF section 1 (which leaves an output on the stack) 50 - POP the value from section 1 00 - STOP

section 1

60FF - PUSH1 0xFF B1 - RETF

data

ff

Jan 03 '23 00:01 qbzzt

@qbzzt

$ echo "ef000101000802000200050003030001000000000100010001b00001500060ffb1ff" | /home/user/workspace/besu/ethereum/evmtool/build/install/evmtool/bin/evm code-validate
OK b000015000,60ffb1
$ echo "ef000101000802000200050003030001000000000100010001b00001500060ffb1ff" | /home/user/workspace/EOF-Header-Parser/bin/release/net6.0/linux-x64/EofParser
OK b000015000,60ffb1
user@debian-work:~/workspace$ echo "ef000101000802000200050003030001000000000100010001b00001500060ffb1ff" | /home/user/workspace/txparse/eofparse/eofparse
OK b000015000,60ffb1

You're right, all agree that your example is valid (even geth)

Jan 04 '23 07:01 holiman

This PR is really stale at this point, I think when we need to implement EOF we can use this for inspiration, but will need to do some significant refactoring.

Jan 26 '24 16:01 lightclient

go-ethereum go-ethereum copied to clipboard

core/vm: implement full EOF suite

types

section 0

section 1

code

section 0

section 1

data

go-ethereum
go-ethereum copied to clipboard