[Feature request] Use the bytes of an opcode in numeric expressions

Open Rangi42 opened this issue 4 years ago • 13 comments

LOAD blocks exist for whole chunks of RAM code. However, self-modifying code usually needs more fine-grained control, writing and rewriting individual bytes, not just copying N bytes from ROM to RAM. Here's an example:

; Build a function to write pixels in hAppendVWFText.
; - nothing: or [hl] / ld [hld], a / ld [hl], a / ret
; - invert: xor [hl] / ld [hld], a / ld [hl], a / ret
; - opaque: or [hl] / ld [hld], a / ret
; - invert+opaque: xor [hl] / ld [hld], a / ret
	ld hl, hAppendVWFText
	bit VWF_INVERT_F, b
	ld a, $ae ; xor [hl]
	jr nz, .invert
	ld a, $b6 ; or [hl]
.invert
	ld [hli], a
	ld a, $32 ; ld [hld], a
	ld [hli], a
	bit VWF_OPAQUE_F, b
	jr nz, .opaque
	ld a, $77 ; ld [hl], a
	ld [hli], a
.opaque
	ld [hl], $c9 ; ret

And another:

LD_A_FFXX_OP EQU $f0
JR_C_OP      EQU $38
JP_C_OP      EQU $da
LD_B_XX_OP   EQU $06
RET_OP       EQU $c9
RET_C_OP     EQU $d8

DEC_C_OP     EQU $0d
JR_NZ_OP     EQU $20
LD_A_HLI_OP  EQU $2a
LD_C_XX_OP   EQU $0e
ADD_A_OP     EQU $87

CopyBitreeCode:
	ld a, DEC_C_OP
	ld [hli], a
	ld a, JR_NZ_OP
	ld [hli], a
	ld a, 3
	ld [hli], a
	ld a, LD_A_HLI_OP
	ld [hli], a
	ld a, LD_C_XX_OP
	ld [hli], a
	ld a, 8
	ld [hli], a
	ld a, ADD_A_OP
	ld [hli], a
	ret

rgbasm can already encode instructions as bytes, so it would be convenient to have a syntax allowing those bytes as part of usual numeric expressions. A few ideas:

  • <[ nop ]> (inspired by HTML CDATA)
  • 'nop' (not currently in use)
  • OPCODE(nop) (like HIGH(bc) or DEF(Symbol))

(I think OPCODE would be clearest and fit in best with existing rgbasm syntax; too much meaningful punctuation ends up like Perl.)

Some tricky details:

  • How to handle multi-byte operations. rl h acts like db $CB, $14; this could be pretty easily handled with HIGH and LOW, like ld a, LOW(OPCODE(rl h)) / ld [hli], a / ld [hl], HIGH(OPCODE(rl h)). (Little-endian order to be consistent, so dw OPCODE(rl h) would act like plain rl h.) Three-byte ones would still be feasible, if less convenient: do def op = OPCODE(ld hl, $abcd), then work with LOW(op) ($21), HIGH(op) ($cd), and op>>16 ($ab).
  • How to handle relative jumps. Evaluating the relative position of a label would be confusing and, I expect, not useful. People are more likely to want raw jump offsets, which currently have to be expressed in terms of @: e.g. jumping ahead 5 bytes without a label is written as jr (@ + 2) + 5. Here, OPCODE(jr 5) could evaluate to $0518, or OPCODE(jr z, $ff) to $ff28 (aka the notional "rst z, $38"). Or just disallow the destination here, so OPCODE(jr) == $18 and OPCODE(jr z) == $28.
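
To make the byte ordering concrete, here is a minimal sketch that uses only constants and assertions rgbasm already accepts. OPCODE is not implemented, so the DEF lines are hypothetical stand-ins (the OP_* names are made up for illustration) for the values OPCODE(...) would return under the little-endian packing described above.

```asm
; Hypothetical stand-ins for the proposed OPCODE(...) results.
; The first encoded byte sits in the lowest 8 bits (little-endian).
DEF OP_RL_H     EQU $14cb   ; rl h         -> db $cb, $14
DEF OP_LD_HL_NN EQU $abcd21 ; ld hl, $abcd -> db $21, $cd, $ab
DEF OP_JR_5     EQU $0518   ; jr, offset 5 -> db $18, $05

; Two-byte instruction: LOW is the prefix byte, HIGH is the second byte.
ASSERT LOW(OP_RL_H) == $cb
ASSERT HIGH(OP_RL_H) == $14

; Three-byte instruction: the third byte needs an explicit shift.
ASSERT LOW(OP_LD_HL_NN) == $21
ASSERT HIGH(OP_LD_HL_NN) == $cd
ASSERT (OP_LD_HL_NN >> 16) == $ab

; Relative jump: opcode in the low byte, raw offset in the next byte.
ASSERT LOW(OP_JR_5) == $18
ASSERT HIGH(OP_JR_5) == $05
```

With this packing, dw applied to a two-byte value would emit the same bytes as the instruction itself, which is the consistency argument made for rl h above.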

However this is done, it should reuse the parser's regular opcode handling, without needing two separate paths. I don't expect that to be a problem.

Rangi42, Apr 02 '21 20:04

This was discussed previously in Discord #rgbds: https://discord.com/channels/303217943234215948/661193788802203688/803226228655521824

Rangi42, Apr 02 '21 20:04

Here's a tentative implementation:

  • (Rename cpu_command to cpu_instr, grrr)
  • Have cpu_instr return the bytes/patches to be written instead of directly outputting them to the object file
  • Have plain_directive handle outputting those bytes and the patches
  • Add T_OPCODE '(' cpu_command ')' to relocexpr, which produces a 32-bit value from the (little-endian) bytes returned by that cpu_command
  • Allow converting relocexprs to consts if they don't contain any patches

The main problem is how to handle the patches, though.

  • Computing a RPN expression that handles the patches correctly: what about jr?
  • Some sort of hybrid storage: a lot more complexity

I think that this would require the lazy expression evaluator (#663) anyway, though we may want to design it with this in mind..?

ISSOtm, Apr 19 '21 09:04

If we go with a T_OPCODE '(' cpu_command ')' implementation, I'd like to also allow T_OPCODE '(' T_Z80_JR ')', T_OPCODE '(' T_Z80_JP ccode ')', T_OPCODE '(' T_Z80_CALL ccode ')', etc for encoding the first byte without a target word.

Rangi42, May 30 '21 17:05

> If we go with a T_OPCODE '(' cpu_command ')' implementation, I'd like to also allow T_OPCODE '(' T_Z80_JR ')', T_OPCODE '(' T_Z80_JP ccode ')', T_OPCODE '(' T_Z80_CALL ccode ')', etc for encoding the first byte without a target word.

That would be trivial with a dummy argument (OPCODE(jr nz, @)).
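
A sketch of why a dummy argument is enough, assuming the little-endian packing proposed earlier in the thread: the target only affects the upper byte, so LOW(...) yields the bare opcode whatever the target is. The constants are hypothetical stand-ins for what OPCODE would return (the OP_* names are made up), and since the thread does not settle how @ would resolve inside OPCODE, the offset bytes are illustrative only.

```asm
; Hypothetical OPCODE(jr nz, <target>) results for two different targets.
DEF OP_JR_NZ_BACK2 EQU $fe20 ; jr nz, offset -2 -> db $20, $fe
DEF OP_JR_NZ_FWD3  EQU $0320 ; jr nz, offset +3 -> db $20, $03

; The low byte is the opcode in both cases; only the upper byte differs.
ASSERT LOW(OP_JR_NZ_BACK2) == $20
ASSERT LOW(OP_JR_NZ_FWD3) == $20
ASSERT HIGH(OP_JR_NZ_BACK2) != HIGH(OP_JR_NZ_FWD3)
```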

aaaaaa123456789, May 30 '21 17:05

I think OPCODE(jr nz) would be more readable than LOW(OPCODE(jr nz, @)), and worth the extra parser implementation.

Rangi42, May 30 '21 17:05

Doubt it, it's not very DRY, especially if some syntax regarding expressions is changed. (Shortcuts, or whatever.) I prefer the dummy argument solution.

ISSOtm, May 30 '21 17:05

Examples using OPCODE as proposed above:

	ld hl, hAppendVWFText
	bit VWF_INVERT_F, b
	ld a, OPCODE(xor [hl])
	jr nz, .invert
	ld a, OPCODE(or [hl])
.invert
	ld [hli], a
	ld a, OPCODE(ld [hld], a)
	ld [hli], a
	bit VWF_OPAQUE_F, b
	jr nz, .opaque
	ld a, OPCODE(ld [hl], a)
	ld [hli], a
.opaque
	ld [hl], OPCODE(ret)

CopyBitreeCode:
	ld a, OPCODE(dec c)
	ld [hli], a
	ld a, LOW(OPCODE(jr nz, @))
	ld [hli], a
	ld a, 3 ; skips to 'add a'
	ld [hli], a
	ld a, OPCODE(ld a, [hli])
	ld [hli], a
	ld a, LOW(OPCODE(ld c, 8))
	ld [hli], a
	ld a, HIGH(OPCODE(ld c, 8))
	ld [hli], a
	ld a, OPCODE(add a)
	ld [hli], a
	ret

Rangi42, Jul 18 '22 17:07