Q: data tables
Is there an equivalent to Huff's data tables in geas?
You have this CODECOPY example https://github.com/fjl/geas?tab=readme-ov-file#assemble, but is there any way to put a raw byte sequence between labels? Something like:
.data_start:
0x12345678 ;; raw data for use with CODECOPY
.data_end:
This is basically issue #7. I have some ideas for syntax to that end. I'm curious what you want to use it for. Is it just for strings, do you want to include external files?
I primarily need this to build larger calldata where some words are static, e.g. balanceOf(constant address). It would also be nice to control where the raw bytes are placed in the bytecode.
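To make the use case concrete: the calldata for such a call is a 4-byte selector followed by one 32-byte word, and both are known at assembly time. Below is a minimal Python sketch of the bytes that would have to be embedded. The holder address is a made-up placeholder, and 0x70a08231 is the well-known selector for balanceOf(address).

```python
# Static calldata for balanceOf(address): selector plus one ABI word.
SELECTOR = bytes.fromhex("70a08231")  # first 4 bytes of keccak256("balanceOf(address)")

# Hypothetical constant address (20 bytes), used only for illustration.
HOLDER = bytes(16) + bytes.fromhex("deadbeef")


def balance_of_calldata(holder: bytes) -> bytes:
    """Return the 36-byte calldata: selector, then the address left-padded to 32 bytes."""
    if len(holder) != 20:
        raise ValueError("address must be 20 bytes")
    return SELECTOR + holder.rjust(32, b"\x00")


calldata = balance_of_calldata(HOLDER)
print(calldata.hex())
```

All 36 bytes are constant here, so they could live in the code as one raw block and be copied to memory with a single CODECOPY.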
A possible design I thought about would be a function .raw that can optionally be named.
;; just raw data:
.raw(0x12345678)
;; or labeled raw data:
.my_data: .raw(0x12345678)
And then the raw bytes could be accessed using:
push .size(@my_data) ;; size
push @my_data ;; offset size
push 0 ;; destOffset offset size
codecopy
The .size function could be avoided with another .data_end label, but that would not be very ergonomic.
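As a rough model of what the label and the size helper would have to resolve to, here is a small Python sketch of CODECOPY semantics. The program layout and the variable names are assumptions for illustration only, not geas output.

```python
def codecopy(memory: bytearray, code: bytes, dest_offset: int, offset: int, size: int) -> None:
    """Model of CODECOPY: copy code[offset:offset+size] into memory at dest_offset.

    Reads past the end of the code yield zero bytes, as on the EVM.
    """
    chunk = code[offset:offset + size]
    chunk = chunk + bytes(size - len(chunk))  # zero-pad out-of-range reads
    end = dest_offset + size
    if len(memory) < end:
        memory.extend(bytes(end - len(memory)))  # grow memory as needed
    memory[dest_offset:end] = chunk


# Hypothetical layout: some instructions, then the raw data block.
program = bytes.fromhex("6000")        # stand-in for the preceding code (PUSH1 0)
raw_data = bytes.fromhex("12345678")
code = program + raw_data

my_data_offset = len(program)   # what the label would resolve to
my_data_size = len(raw_data)    # what a size helper would resolve to

memory = bytearray()
codecopy(memory, code, 0, my_data_offset, my_data_size)
print(memory.hex())
```

The offset comes from the position of the block in the final bytecode and the size from the block itself, which is why a second end label could stand in for a size helper.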
Hmm. That doesn't really work because the design of expression macros has a few constraints.
- macro calls can only be used where an expression is expected: as a PUSH argument or in the #define of another macro
- result values of macros are bigints, so it's a bit weird to use them for binary data
I think we should go ahead and implement my proposal from #7, with a directive to include raw bytes and the literal syntax:
#bytecode {
0x01020304 ;; hex bytes supported using number literal
"string" ;; can use string literals for text
4: myMacro() ;; numeric label defines byte size of following expression
}
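A Python sketch of how the three item forms in that block might turn into bytes. This is only my reading of the proposal, not the assembler's actual behavior: it assumes hex literals keep their leading zeros, strings are emitted as raw UTF-8, and a sized item is a big-endian integer of exactly that many bytes. The value 0x2A stands in for the result of the hypothetical myMacro().

```python
def hex_literal(text: str) -> bytes:
    """'0x01020304' -> 4 bytes. Assumes the digit count determines the length."""
    digits = text[2:] if text.lower().startswith("0x") else text
    if len(digits) % 2:
        digits = "0" + digits  # assumption: odd digit counts are left-padded
    return bytes.fromhex(digits)


def string_literal(text: str) -> bytes:
    """Raw text bytes, with no length prefix and no padding."""
    return text.encode("utf-8")


def sized_value(size: int, value: int) -> bytes:
    """'4: expr' -> the value as exactly `size` big-endian bytes."""
    return value.to_bytes(size, "big")  # raises OverflowError if it does not fit


block = hex_literal("0x01020304") + string_literal("string") + sized_value(4, 0x2A)
print(block.hex())
```

The sized form is what makes expression results usable as data: a macro result is an integer with no inherent width, so the width has to be stated.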
This syntax can also be used for jump tables:
.jt: #bytecode {
2: @targetOne
2: @targetTwo
2: @targetThree
}
;; here we use 'value' from the stack to determine the jump location
;; by offsetting into the table
dup1 ; [value, value]
push 2 ; [2, value, value]
lt ; [2<value, value]
jumpi @outOfBounds ; [value]
push 2 ; [2, value] each table entry is 2 bytes wide
mul ; [value*2]
push @.jt ; [@.jt, value*2]
add ; [offset]
push 2 ; [size, offset]
swap1 ; [offset, size]
push 0 ; [dstOffset, offset, size]
codecopy ; []
push 0 ; [0]
mload ; [word] entry is in the top 2 bytes of the word
push 256-16 ; [240, word]
shr ; [label]
jump ; []
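For reference, the arithmetic a lookup into a table of 2-byte entries has to perform can be modeled in Python as follows. The targets and the code prefix are made-up values, and the model assumes memory at offset 0 is otherwise zero.

```python
ENTRY_SIZE = 2  # each table entry is a 2-byte big-endian code offset


def build_table(targets):
    """Concatenate the jump targets as fixed-width big-endian entries."""
    return b"".join(t.to_bytes(ENTRY_SIZE, "big") for t in targets)


def lookup(code: bytes, table_offset: int, index: int, entries: int) -> int:
    """Return the jump target stored at `index`, mirroring CODECOPY + MLOAD."""
    if index < 0 or index >= entries:
        raise IndexError("out of bounds")
    start = table_offset + index * ENTRY_SIZE  # scale the index by the entry width
    # CODECOPY writes 2 bytes to memory[0:2]; MLOAD(0) then reads 32 bytes.
    memory = code[start:start + ENTRY_SIZE].ljust(32, b"\x00")
    word = int.from_bytes(memory, "big")
    # The entry sits in the most significant bytes, so shift it down.
    return word >> (256 - 8 * ENTRY_SIZE)


targets = [0x0010, 0x0123, 0x0200]   # hypothetical jump destinations
prefix = bytes(5)                    # stand-in for code before the table
code = prefix + build_table(targets)

print([hex(lookup(code, len(prefix), i, len(targets))) for i in range(len(targets))])
```

Two details are easy to miss: the index has to be multiplied by the entry width, and the loaded word has to be shifted right rather than masked, since MLOAD places the copied bytes at the high end of the word.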
Using string literals in "#bytecode" would be nice, although it might be a bit complicated to get the logic right, e.g. should they be ABI-encoded? Another way would be to put this functionality in a built-in function like .selector. This would give more flexibility, as there could be one function to abi-encode and another to abi-encode-packed.
What is a use case for macros in #bytecode?
Jump tables are gas-inefficient. I wouldn't bother.
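To illustrate why the encoding question matters for string literals, here is a Python sketch comparing the two options for a single string argument. It is written by hand from the ABI layout rules and does not use any library; the function names are my own.

```python
def encode_packed_string(text: str) -> bytes:
    """Packed encoding of a string: just the raw UTF-8 bytes."""
    return text.encode("utf-8")


def abi_encode_string(text: str) -> bytes:
    """Standard ABI encoding of one string argument: offset, length, padded data."""
    data = text.encode("utf-8")
    padded_len = (len(data) + 31) // 32 * 32   # round up to a multiple of 32
    head = (32).to_bytes(32, "big")            # offset of the dynamic data
    length = len(data).to_bytes(32, "big")     # byte length of the string
    return head + length + data.ljust(padded_len, b"\x00")


packed = encode_packed_string("string")
full = abi_encode_string("string")
print(len(packed), len(full))
```

For the six-character literal the packed form is 6 bytes while the standard form is 96 bytes, so the choice changes both the size and the layout of what ends up in the bytecode.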
Built-in ABI encoding is not a goal for me personally. The bytes syntax should just be for writing out raw bytes. We can add a special macro to encode basic types maybe. But I don't want to create an encoder for arbitrary data types. And there are no arrays/lists/structs in the geas typing model anyway, so you wouldn't be able to express complex objects.
I have implemented the idea from https://github.com/fjl/geas/issues/13#issuecomment-2587384644 in #16. Would be nice to get some feedback about the macro. I called it bytesSize and it specifically checks that the instruction being accessed is #bytes.
Still thinking about alternatives. For some reason, the approach in #16 doesn't feel 100% right to me.
One alternative could be allowing an optional label inside of #bytes. This would define the name as a label, but also make the value of the bytes available as a macro, like a dual definition.
push .len(name)
push @name
push 0
codecopy
#bytes name: 0x010203
This somehow feels more aligned with the language, but it has some other problems. It might be confusing since it's pretty easy to write this out of order as:
name: #bytes 0x010203
and the error reported by the assembler will be that macro name is undefined. The compiler could have special logic to detect this and give a more meaningful error.
Could also be
#bytes name = 0x010203
as a shorthand for
#define name = 0x010203
.name: #bytes name
I've added the #bytes name: 0x010203 syntax in commit 7c9be99a003bb06cf2d75e9e4432be74da2944f2.