rgbds
[Feature Request] Literals (inline section fragments)
This is gonna be the weirdest feature request, and I don't mind if you don't think it worth adding. But I've dabbled with some old Mainframe assemblers from back in the 70s, and one of them had an interesting feature they called "Literals".
Basically, you could put code in square brackets, and it would be assembled in a block at the end of your code (you could also relocate this block to other places, and I think you could redefine it several times through your code), and the bracketed expression would be replaced with the address where it ended up being assembled.
So you could type things like:
ld hl, [.db "HELLO WORLD"]
call Print
and the string would get assembled somewhere, with the bracketed literal replaced by its address, so hl ends up pointing to the string. You could also use it to call a short block of code that returns, and things like that.
Like I said, while useful, this is not exactly a standard feature, and I don't mind if you don't think it worth adding.
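To make the requested behaviour concrete, here is a minimal Python sketch of the rewrite such a feature would perform. It is a toy preprocessor, not RGBDS code: the function name `hoist_literals`, the generated `.literalN` labels, and the choice to emit the pool right after the last input line are all assumptions made for illustration. It only handles single-line, non-nested literals that start with `db` and contain no brackets inside the string.

```python
import re

# Matches a bracketed literal whose body starts with a `db` directive,
# e.g. [db "HELLO WORLD"]. Plain memory accesses such as [hl] do not match.
LITERAL_RE = re.compile(r'\[\s*(db\b[^\[\]]*)\]')


def hoist_literals(lines):
    """Replace each [db ...] literal with a generated label and
    append the literal bodies, in order, after the original lines."""
    pool = []

    def replace(match):
        label = f".literal{len(pool)}"  # hypothetical label naming scheme
        pool.append(f"{label}: {match.group(1).strip()}")
        return label

    rewritten = [LITERAL_RE.sub(replace, line) for line in lines]
    return rewritten + pool


source = [
    'ld hl, [db "HELLO WORLD"]',
    'call Print',
    'ld a, [hl]',
]
result = hoist_literals(source)
print("\n".join(result))
```

The literal is replaced by a label and its body moves to the end, while the ordinary memory access `[hl]` is left alone. Where the pool should really go is exactly the open question in this thread.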
The real issue is, where to put it? "Some place" isn't too descriptive. Perhaps at the end of the current section?
That being said, it seems like an interesting way to do jump tables:
ld hl, .table - 2
.loop
inc hl
inc hl
cp [hl]
inc hl
jr nc, .loop
ld a, [hli]
ld h, [hl]
ld l, a
jp hl
.table
dbw 50, [
ld hl, [db "Try harder!", 0]
jp ShowMessage
]
dbw 120, [
ld hl, [db "Good work.", 0]
call ShowMessage
ld hl, SND_CHIME
jp PlaySound
]
dbw 255, [
ld hl, [db "Excellent work!", 0]
call ShowMessage
ld hl, SND_FANFARE
jp PlaySound
]
It can be an explicitly marked constant pool, similar to ARM assemblers.
The one I saw defaulted to the end of the assembly, after your code, but there was a command to relocate it wherever else you wanted.
Note that square brackets collide with memory accesses. In fact, this would mean that parsing is no longer context-free:
ld hl, [niladic_macro_or_a_label_who_knows]
I'd suggest using curly braces instead.
dbw 120, {
ld hl, {db "Good work.", 0}
call ShowMessage
ld hl, SND_CHIME
jp PlaySound
}
Note that this kind of delimiter is already used in quite a large family of languages ;)
Unresolved question: how would label scoping work here?
Unresolved question: how would label scoping work here?
I assume locals before a global would be local to the braces and globals would work as always.
I saw this feature in the rare MIDAS assembler for the PDP-10. You'd think it would be hard to get working or to find docs for, but there is actually an easy self-installing emulation package, along with extensive documentation for MIDAS and its OS, in the ITS repository on GitHub.
Quick comment: parens, brackets and braces cannot work for this. Parens are used for expressions, brackets by memory accesses, and braces by symbol expansions (including outside of strings). So something else would be needed.
Other than that, I'm not really sure about how useful this is; my main concern with this is code mixed in the middle of data, which I'm afraid would hurt readability in cases more trivial than in the OP.
Implementation doesn't sound too complicated, but I'm seeing major pain points, such as specifying the location at which the code should be stored, or the fact that it looks a lot like LOAD, which is already very fragile. From my point of view, this doesn't add anything significant (prove me wrong though), so I'm not sure if it's worth the additional complexity.
After further discussion, I think this might be worth it though in fairly minor ways (compared to current solutions), so I won't close this, but this is low priority as far as I'm concerned. Anyone else willing to take a stab at it is welcome, however.
It can be an explicitly marked constant pool, similar to ARM assemblers.
The proposition here is quite different from ARM assemblers. The literal pools there are typically used for numeric immediates that cannot be represented in 8 bits (combined with some number of rotates, as per the spec). They are also used for syntax like LDR R0, =label where label is an address label, not a storage directive as the OP desires. A literal pool would then be set up which contains the address of label.
tl;dr I don't think this is a good idea, not least for the combining code/data argument. Also, I can't imagine a use case where such a syntax would be appropriate.
Quick comment: parens, brackets and braces cannot work for this. Parens are used for expressions, brackets by memory accesses, and braces by symbol expansions (including outside of strings). So something else would be needed.
In this case, double brackets would work perfectly. This might be complicated to parse, though, since the lexer would have to know to parse [[ as a single token instead of reading it as two [ symbols. But this is a long-term feature anyway...
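One common way to get a lexer to treat [[ as a single token is longest-match ("maximal munch") scanning, where two-character tokens are tried before one-character ones. The Python sketch below illustrates only that idea; it is not how the RGBDS lexer is written, `tokenize` and `PUNCTUATION` are hypothetical names, and strings and comments are deliberately not handled.

```python
# Candidate punctuation, ordered so two-character tokens are tried first.
PUNCTUATION = ["[[", "]]", "[", "]", ","]


def tokenize(text):
    """Split `text` into tokens, preferring the longest punctuation match."""
    tokens = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        for punct in PUNCTUATION:
            if text.startswith(punct, i):
                tokens.append(punct)
                i += len(punct)
                break
        else:
            # Not punctuation: read a run of ordinary characters as one word.
            start = i
            while i < len(text) and not text[i].isspace() and text[i] not in "[],":
                i += 1
            tokens.append(text[start:i])
    return tokens


print(tokenize("call [[ ret ]]"))
print(tokenize("ld a, [hl]"))
```

With this ordering, `[[` never splits into two `[` tokens, which also means a memory access whose operand itself begins with a bracket would need whitespace or grammar support to disambiguate.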
If #244 were implemented, allowing two sections to be in the same bank even if they're floating, this could make use of it. The parser could allow reloc_16bit or reloc_16bit_no_str to be inline_code, which would parse as '[[' lines ']]'. The [[ would push a new anonymous section in the same bank as the current one (also backing up the current nListCountEmpty and nPCOffset values); the ]] would pop the section, restore nListCountEmpty and nPCOffset, and evaluate the whole inline code block to that section's address (with some new rpn_AddrSection function like rpn_BankSection).
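The push/pop scheme described above can be modelled in a few lines of Python. This is a rough sketch under stated assumptions, not the RGBDS implementation: the `Assembler` and `Section` classes and the `open_inline`/`close_inline` methods are hypothetical, the saved state is reduced to the current section (the real proposal also saves nListCountEmpty and nPCOffset), and the returned tuple merely stands in for the RPN expression that a function like rpn_AddrSection would build.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    name: str
    bank: str
    data: list = field(default_factory=list)


class Assembler:
    """Toy model that only tracks sections and a stack of suspended ones."""

    def __init__(self, name, bank):
        self.sections = [Section(name, bank)]
        self.current = self.sections[0]
        self.stack = []
        self.anon_count = 0

    def emit(self, item):
        self.current.data.append(item)

    def open_inline(self):
        """'[[': suspend the current section, start an anonymous one in the same bank."""
        self.stack.append(self.current)
        self.anon_count += 1
        inline = Section(f"<inline {self.anon_count}>", self.current.bank)
        self.sections.append(inline)
        self.current = inline

    def close_inline(self):
        """']]': resume the suspended section, return a symbolic address operand."""
        inline = self.current
        self.current = self.stack.pop()
        return ("addr_of_section", inline.name)


asm = Assembler("Main", "ROM0")
asm.emit("ld hl,")
asm.open_inline()
asm.emit('db "HELLO WORLD"')
operand = asm.close_inline()
asm.emit(operand)
asm.emit("call Print")
print(asm.sections)
```

Because the suspended sections live on a stack, nested inline blocks fall out naturally: each ]] resumes whichever section was active when the matching [[ was seen.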
The problem of implementing #244 is that there's no way to adapt the bin packing algorithm to work with the additional constraints.
As you said in Discord, section fragments would actually be fine here. Without this "literal" feature, people would manually put code blocks in the same section anyway, so it's fine if each [[ block ]] just becomes a FRAGMENT of its current SECTION and they all stay contiguous. It will, however, have to wait for #712 to merge, since the "literal" fragments won't have any alignment constraints but the actual section might. (If a [[ literal ]] is created, it would update the current section to have the fragment modifier.)
Note that square brackets collide with memory accesses.
I don't think this is the case. Building #716 with T_LBRACK/T_RBRACK instead of T_RBRACK/T_2RBRACK does not produce any shift/reduce or reduce/reduce conflicts, and editing the test cases to use single brackets works as expected.
This is because that PR only allows inline fragments in n16 values, not just any numeric values. So call [fragment] and jp z, [fragment] and dw [fragment] are all valid, and ld a, [hl] is valid, but there are no instructions like "ld hl, [n16]" to cause ambiguity.
Should single brackets be used instead?
Note that single brackets would make these equivalent:
ld a, [[db 42]]
ld a, [.liff]
.liff: db 42
The MIDAS assembler's feature like this was called "constants".
From http://www.bitsavers.org/pdf/mit/rle_pdp1/memos/PDP-1_MIDAS.pdf:
The constant word and surrounding parens are treated as a single syllable whose value is the address of a register containing the constant word. Constants may be used in constants. The following two program fragments are equivalent:
add (add (20)-lio-(30
...
constants

add a
...
a, add b-lio-c
b, 20
c, 30
Note that besides the PDP-10's MIDAS assembler, ASMotor itself has had these since 2019:
Speaking of strings, code and data literals can be used to reduce clutter and improve readability. To load the register a0 with the address of a string, you might do
lea { DC.B "This is a string",0 },a0
or to produce the address of a chunk of code
jsr { moveq #0,d0 rts }
(It doesn't allow {interpolation} outside of strings, so curly braces were available for that.)