Proposal: Hardfork selection via flag

Open gzanitti opened this issue 2 years ago • 10 comments

This PR adds the capability to utilize the operations specific to each hardfork in ETK.

By employing a flag (e.g., --hardfork cancun), users can designate the hardfork to be used during the parsing and assembly processes. A primary benefit of this enhancement is its streamlined approach to managing compatibility across various EVM versions. This update allows for more straightforward handling of operation deprecation and aliasing.

The core alteration involves encapsulating the operations specific to each Hardfork within an enum responsible for managing different versions.
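
As a rough illustration of the idea (not the code in the PR), the sketch below models hardforks as an ordered enum and asks whether a mnemonic is available in a given fork. The fork list is abbreviated, and `INTRODUCED_IN` and `is_available` are hypothetical names chosen for this example.

```python
from enum import IntEnum


class Hardfork(IntEnum):
    """Abbreviated, chronologically ordered fork list (illustrative only)."""
    FRONTIER = 0
    HOMESTEAD = 1
    BYZANTIUM = 2
    CONSTANTINOPLE = 3
    ISTANBUL = 4
    BERLIN = 5
    LONDON = 6
    SHANGHAI = 7
    CANCUN = 8


# Fork in which each mnemonic first became available (small sample, not a full table).
INTRODUCED_IN = {
    "add": Hardfork.FRONTIER,
    "jump": Hardfork.FRONTIER,
    "delegatecall": Hardfork.HOMESTEAD,
    "revert": Hardfork.BYZANTIUM,
    "create2": Hardfork.CONSTANTINOPLE,
    "chainid": Hardfork.ISTANBUL,
    "basefee": Hardfork.LONDON,
    "push0": Hardfork.SHANGHAI,
    "tload": Hardfork.CANCUN,
}


def is_available(mnemonic: str, fork: Hardfork) -> bool:
    """Return True if `mnemonic` exists in `fork`; raises KeyError if unknown."""
    return INTRODUCED_IN[mnemonic] <= fork
```

Because the enum is ordered, "available in this fork" reduces to a single comparison. Deprecations and aliases would need extra data that this sketch leaves out.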

This revision builds upon PR #133. However, it can be modified if PR #133 is not approved. It is still in draft while I tidy up the code and review and adapt the code beyond the assembler.

I appreciate your insights and feedback.

gzanitti avatar Oct 20 '23 13:10 gzanitti

I haven't looked at the PR yet, so consider this feedback on the stated design.

Should this be a command-line option, something embedded in the file itself, or maybe different eas binaries altogether?

Auto-detect

Is it possible to infer the fork with some simple rule, like the oldest fork that supports explicitly used instructions?

Benefits

  • Would probably Just Work™ for simple programs.

Drawbacks

  • If you wanted %push to use a push0, you'd need to use a push0 elsewhere in your code to activate it.
  • No warnings if you're trying to target an older EVM (perhaps an L2) and use a newer instruction.
  • Too magical.
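
To make the auto-detect rule concrete, here is a minimal sketch of "the oldest fork that supports every explicitly used instruction", assuming an abbreviated fork list and a small sample table. `FORKS`, `INTRODUCED_IN` and `infer_fork` are hypothetical names, not part of ETK.

```python
# Abbreviated, chronologically ordered fork names (illustrative only).
FORKS = ["frontier", "homestead", "byzantium", "constantinople",
         "istanbul", "berlin", "london", "shanghai", "cancun"]

# Fork in which each mnemonic first became available (small sample).
INTRODUCED_IN = {
    "push1": "frontier",
    "jump": "frontier",
    "delegatecall": "homestead",
    "revert": "byzantium",
    "push0": "shanghai",
    "tload": "cancun",
}


def infer_fork(mnemonics):
    """Return the oldest fork that supports every mnemonic in `mnemonics`."""
    newest = 0  # index into FORKS; an empty program needs only the first fork
    for mnemonic in mnemonics:
        newest = max(newest, FORKS.index(INTRODUCED_IN[mnemonic]))
    return FORKS[newest]
```

Under this rule a program that only uses `push1` and `revert` is inferred as byzantium, so `%push` would have no reason to emit `push0` unless `push0` appears somewhere explicitly, which is the first drawback listed above.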

Command-Line Option

eas --hardfork cancun ...

Benefits

  • Unambiguous: the whole execution of eas uses one hardfork.
  • Simple and explicit.
  • It's how gcc works (e.g. -std=c99).

Drawbacks

  • Requires the person running the command to know what the source is supposed to be targeting.
  • Unlike gcc, eas considers the whole program at once, instead of individual files. Different files wouldn't be able to have different hardforks.

Instruction Macro-like

%hardfork("berlin")

Benefits

  • The hardfork choice is closest to where it matters.
  • Lets you mix different forks in the same program (e.g. a hypothetical old library could use suicide while the newer main program could use selfdestruct).
  • Person doing the compiling doesn't need to know what fork to target.

Drawbacks

  • Lots of questions. What happens if %hardfork is in the middle of a file? In an %import? In an %include? In a macro?
  • Abuses the same syntax (% followed by identifier) for multiple different purposes: configuration and macros.

I think auto-detecting is out of the question. It goes against the "no surprises" principle I want to uphold with ETK.

Something in the file is likely my preferred option, but I think I've talked myself out of using the %hardfork("...") syntax.

So yeah, I think command-line flag is fine. What are your thoughts?

SamWilsn avatar Oct 22 '23 03:10 SamWilsn

I don't like the option to auto-detect the hardfork either, I think it would bring more problems than solutions.

Personally, I hadn't considered defining it through a macro, and it sounds really interesting. However, I have doubts about the possible use cases. As I understand it, all code, no matter how it was compiled, ends up being executed against a single version of the EVM. If the changes between forks are simple renames, mixing forks makes sense. But if a fork changes the semantics of opcodes (as EIP-4750 does, for example, by changing JUMPDEST to NOP), I think the idea no longer makes sense. But maybe I'm missing something.

gzanitti avatar Oct 22 '23 22:10 gzanitti

EIP-4750 actually brings up a good argument in favour of per-file hardforks, or at least supported version ranges.

Say I write a library that defines several EIP-4750 subroutines. As I understand EIP-4750, there is no jumpdest marking a valid jump destination, so one of my subroutines might look like this:

bail:   push1 0
        push1 0
        revert

If eas is invoked with a fork including EIP-4750, this works great. You can callf bail no problem.

On the other hand, if eas is invoked with an older fork, you'd get an exceptional halt if you push32 bail; jump.

If we have per-file annotations, eas could warn in this situation.

SamWilsn avatar Oct 23 '23 01:10 SamWilsn

But wouldn't that be a problem in the opposite situation?

Let's say you have a library that uses %hardfork(pre-4750) and you want to use it in your %hardfork(post-4750) code. eas will accept both as valid, but the final bytecode won't make sense because, for example, you could have opcodes 0x56 (JUMP) and 0x57 (JUMPI) imported from the library that would no longer be valid. Does this make sense or am I missing something?

Don't get me wrong, I like the idea, I'm just trying to make sure we're both on the same page :smiley:

gzanitti avatar Oct 23 '23 19:10 gzanitti

Yeah, you're absolutely right. I think we actually need two pieces for sound hardfork selection: the target hardfork eas is assembling for, and the range of forks supported by each file.

Ideally, we want eas to error out whenever the actual behaviour doesn't match the expected behaviour. The case I'm particularly concerned about is specifically using a post-4750 library in a pre-4750 contract. Without good hardfork selection, eas will happily compile a post-4750 library (no jumpdest) into a contract that uses jump, and it'll only fail at runtime.

So here's a quick idea:

Invoking eas

$ eas --hardfork shanghai main.etk

main.etk

%hardfork(">=homestead,<=shanghai")
%import("other.etk")
jump bail

other.etk

%hardfork("cancun")

bail:   push1 0
        push1 0
        revert

Syntax to be bikeshedded.
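
A minimal sketch of how a range such as ">=homestead,<=shanghai" could be checked against the target fork. It assumes an abbreviated fork list and treats a bare name as an exact match. `FORKS` and `supports` are hypothetical names, and the real syntax is still undecided.

```python
import operator

# Abbreviated, chronologically ordered fork names (illustrative only).
FORKS = ["frontier", "homestead", "byzantium", "constantinople",
         "istanbul", "berlin", "london", "shanghai", "cancun"]

# Two-character operators come first so ">=" is not mistaken for ">".
_OPERATORS = [(">=", operator.ge), ("<=", operator.le),
              (">", operator.gt), ("<", operator.lt)]


def _rank(name):
    """Position of a fork in FORKS; raises ValueError for unknown names."""
    return FORKS.index(name.strip().lower())


def supports(spec, target):
    """Return True if `target` satisfies every comma-separated clause in `spec`."""
    target_rank = _rank(target)
    for clause in spec.split(","):
        clause = clause.strip()
        for symbol, compare in _OPERATORS:
            if clause.startswith(symbol):
                ok = compare(target_rank, _rank(clause[len(symbol):]))
                break
        else:
            # No operator prefix: assume the clause names exactly one fork.
            ok = target_rank == _rank(clause)
        if not ok:
            return False
    return True
```

With the example above, `supports(">=homestead,<=shanghai", "shanghai")` holds for main.etk, while `supports("cancun", "shanghai")` fails for other.etk, so eas could stop instead of producing a contract that only fails at runtime.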

SamWilsn avatar Oct 23 '23 19:10 SamWilsn

Perfect. Now it is much clearer. I will work on a first draft of this idea.

Regarding the syntax, I particularly think we could use something like this to separate macros from directives:

directive hardfork(">=homestead,<=shanghai")

But it's just an idea. More inspiration here

On the other hand, I think the correct approach would be to require this directive to be mandatory at the beginning of the root file passed to the compiler and then only in each include, so as not to break the semantics of imports being interpreted as instructions "as if they were typed here".

gzanitti avatar Oct 23 '23 21:10 gzanitti

But it's just an idea. More inspiration here

Even more inspiration: https://www.nasm.us/xdoc/2.13.03/html/nasmdoc6.html

NASM seems to use square brackets for directives, wrapped in macros for ease of use.

For us, it might look like:

[HARDFORK >homestead,<=cancun]

Not saying this is my preference, just throwing it out there.

SamWilsn avatar Oct 23 '23 21:10 SamWilsn

I think the correct approach would be to require this directive to be mandatory at the beginning of the root file passed to the compiler and then only in each include, so as not to break the semantics of imports being interpreted as instructions "as if they were typed here".

I think a missing hardfork directive should only be a warning-level message. Conflicting directives should definitely be an error though.

You do bring up another good point. If these directives are file-scoped, that does break the "as if they had been typed here" analogy from the Book.

How about we do something really dumb: every time the hardfork directive is encountered, it's checked against the target hardfork?

So the actual changes to ETK would be:

  1. Accept a command-line option choosing the target hardfork.
  2. Introduce a directive that checks its argument against the target hardfork and errors if they don't match.
  3. Warn if a file has any meaningful contents before the hardfork directive.

I think this even works for instruction macros.
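
The three changes could be sketched roughly as follows. This assumes that `#` starts a comment, that each directive sits on its own line, and that the matching rule is passed in as a function, so range syntax stays out of the picture. `check_source`, `HardforkMismatch` and `DIRECTIVE` are hypothetical names for illustration only.

```python
import re

# Matches a line consisting solely of %hardfork("...") and captures the argument.
DIRECTIVE = re.compile(r'^%hardfork\("([^"]*)"\)$')


class HardforkMismatch(Exception):
    """Raised when a directive does not admit the target hardfork."""


def check_source(lines, target, matches):
    """Check every %hardfork directive in `lines` against `target`.

    `matches(spec, target)` decides whether a directive admits the target.
    Returns a list of warnings; raises HardforkMismatch on the first conflict.
    """
    warnings = []
    seen_directive = False
    for number, line in enumerate(lines, start=1):
        code = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not code:
            continue
        found = DIRECTIVE.match(code)
        if found:
            seen_directive = True
            spec = found.group(1)
            if not matches(spec, target):
                raise HardforkMismatch(
                    f'line {number}: %hardfork("{spec}") does not admit target {target}'
                )
        elif not seen_directive and not warnings:
            # Meaningful content appeared before any directive: warn once.
            warnings.append(f"line {number}: content before %hardfork directive")
    return warnings
```

Since every directive is checked at the point where it appears, the same routine would apply to text pulled in by an include, which keeps the "as if they had been typed here" reading intact.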

SamWilsn avatar Oct 23 '23 21:10 SamWilsn

I don't even think we need new syntax for this then. %import(...) and %include(...) already error if they can't find the file, so %hardfork(...) fits right in.

SamWilsn avatar Oct 23 '23 21:10 SamWilsn

Hey @SamWilsn, I'm moving forward with the changes we had pending now that the new backend was finally merged. I hope you don't mind having to keep reviewing my code haha.

This modification includes the changes we were discussing.

  • A new %hardfork macro that allows you to define a hardfork or a range of hardforks.
  • A flag that specifies which hardfork to compile for (default: Cancun for now).

Whenever the range is invalid, or the flag does not match any macro definition, compilation stops.

gzanitti avatar Jan 29 '24 20:01 gzanitti