Fix bytecode printing mistaking the data section for actual instructions
Consider
fn main() -> b256 {
0x3333333333333333333333333333333333333333333333333333333333333333
}
The generated bytecode, accroding to forc parse-bytecode is
half-word byte op raw notes
0 0 JI(4) 90 00 00 04 jump to byte 16
1 4 NOOP 47 00 00 00
2 8 Undefined 00 00 00 00 data section offset lo (0)
3 12 Undefined 00 00 00 28 data section offset hi (40)
4 16 LW(63, 12, 1) 5d fc c0 01
5 20 ADD(63, 63, 12) 10 ff f3 00
6 24 LW(17, 63, 5) 5d 47 f0 05
7 28 ADD(17, 17, 12) 10 45 13 00
8 32 LW(16, 63, 4) 5d 43 f0 04
9 36 RETD(17, 16) 25 45 00 00
10 40 LOG(12, 51, 12, 51) 33 33 33 33
11 44 LOG(12, 51, 12, 51) 33 33 33 33
12 48 LOG(12, 51, 12, 51) 33 33 33 33
13 52 LOG(12, 51, 12, 51) 33 33 33 33
14 56 LOG(12, 51, 12, 51) 33 33 33 33
15 60 LOG(12, 51, 12, 51) 33 33 33 33
16 64 LOG(12, 51, 12, 51) 33 33 33 33
17 68 LOG(12, 51, 12, 51) 33 33 33 33
18 72 Undefined 00 00 00 00
19 76 Undefined 00 00 00 20
20 80 Undefined 00 00 00 00
21 84 Undefined 00 00 00 28
Note that we're mistaking the data section (the 33s starting byte 40) as actual instructions (LOG), which is misleading.
To know where the data section is, is non-trivial. The best we can do is assume that the word at byte offset 8 is the offset to it, which is what the disassembly does in comments above. But that is entirely by convention and could change in the future. forc-dis actually tracks what the value of $ds is to know where the data section is, which is itself slightly by convention, since $ds is reserved by the compiler and not the VM.
To get around the blown out data section problem in #3161 I was considering putting some values in a local data section, per function, rather than having a single monolithic data section. We're also considering putting const-ish data at the start of the binary for load-type constants. These are more potential examples of where the simple output from parse-bytecode would get confused.
@otrho why don't we port forc-dis and make it an official plugin in the Sway repository, and have it replace forc parse-bytecode? I mean if it's much better and more accurate, might as well deprecate the old approach and not have to worry about fixing this. forc-dis is probably a better name too 🤷
Sure. We need to add finish it off, there's still some todos in there and it needs tests, or at least checks to make sure it's up to date. Occasionally we still update fuel-asm (though I guess not so much after beta4) and forc-dis can get out of date.
Moving this to the Sway project so that we can actually give it some attention.