rgbds [Feature request] Detect the type of a given piece of code (basic reflection or pattern matching)

(I'm still exploring what this feature would involve; maybe it's not practical. But the problems from lacking something like this are real, even if there's a better solution than this.)

A problem that repeatedly comes up when writing macros is: how to tell what type an argument is?

Maybe you want to take a number or a string, and act differently based on which one it is. Maybe you want to take an 8- or 16-bit register. Maybe you want to take a label or a numeric literal.

Workarounds common in current asm macros are basically "string inspection". Want to know if it's a number or string? Check STRFIND("\1", "\"") == 0... but that only works if \1 was a literal quoted string, not e.g. STRCAT("hel", "lo"). Want to know if it's an 8-bit register? Check if STRLWR("\1") is "a", "b", "c", etc... but that fails for LOW(bc). Want to know if it's a condition code? Check against "c", "nc", "z", "nz"... but that fails for !!!z, or for cy if you did DEF cy EQUS "c".
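
As a minimal sketch of the string-inspection workaround described above (the macro name `emit` and the section name are made up for illustration, and this assumes an RGBASM version that provides STRFIND with 0-based indexes):

```asm
; Hypothetical macro: emit a terminated string or a 16-bit number,
; guessing the argument's type from its raw text.
MACRO emit
    IF STRFIND("\1", "\"") == 0
        db \1, 0    ; argument text starts with a quote: treat it as a string
    ELSE
        dw \1       ; anything else is assumed to be numeric
    ENDC
ENDM

SECTION "Datatype demo", ROM0
    emit "hello"                ; takes the db branch
    emit 42                     ; takes the dw branch
    emit STRCAT("hel", "lo")    ; also takes the dw branch, although it is a string
```

The last call shows the limitation: the check only sees the argument's text, which begins with `S` rather than a quote, so a string-valued expression is handled as if it were a number.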

Basically I'd like a DATATYPE() function that takes pretty much any argument and returns something indicating its type. EQUS expansion would occur inside the argument, which solves problems like DEF cy EQUS "c", or string-detection issues. (I'm writing up this request because I ran into a case where #name was not an appropriate substitute for "{name}" -- when a macro is inspecting it for a leading ".)

I think DATATYPE returning a string would be clear to the user and easy for us to extend in future, like JS typeof. As for what types it should recognize:

"directive": SECTION, assert, MACRO, etc
"instruction": jp, call, Halt, etc
"rl": rl... it can be a directive or an instruction
"reg8": A, LOW(de), etc
"reg16": bc, DE, hl, sp, Af
"condition": z, !nc, !!!z, etc
"c": c... it can be a register or a condition
- Note that this would only apply to literal c (or EQUS expanding out to c). LOW(bc) is not usable as a condition code in call/jp/jr.
"number": 42, $beef, %1001, BANK(x), etc
"string": "hello", """world""", STRCAT("a", "b"), etc
"identifier": Foo, bar, ., etc
"local": .foo, Global.local, .., etc
"other": Anything not matching a well-defined pattern (*, SECTION UNION, z + 2, etc)
- I'm not sure whether too-wild things should be supported and detected as "other", or just be syntax errors. The actual real-world use cases don't need such flexibility, so we'd be fine making them syntax errors and maybe in future allowing broader input.

Jun 21 '25 16:06 Rangi42

I've been wanting this for... well, since before the pandemic. It goes along with OPCODE.

Jun 21 '25 17:06 aaaaaa123456789

It's possible that we'd run into fundamental edge-case issues by actually implementing it, but I expect parser.y could just do something like this:

| OP_DATATYPE LPAREN number RPAREN { $$ = "number"; }
| OP_DATATYPE LPAREN string_literal RPAREN { $$ = "string"; }
| OP_DATATYPE LPAREN ccode RPAREN { $$ = "condition"; }
etc

That said, c and rl feel like real warts to this feature.

Jun 21 '25 17:06 Rangi42

A companion to this I'm imagining is something like DATAEQUAL(x, y), which returns 1 if x and y are "equal" in a broader sense than == or strcmp, and 0 otherwise. For example LOW(bc) is equal to c; cy is equal to c if you have DEF cy EQUS "c"; 5+5 is equal to $A; "hello" is equal to STRLWR(STRCAT("HE","LLO")); macro is equal to MACRO; Foo is equal to Foo; and so on.

Jun 21 '25 17:06 Rangi42

NASM has something like this: %ifid, %ifnum, and %ifstr directives for testing token types. (It generally uses more directives where we'd use functions, e.g. %ifdef x instead of if DEF(x), %ifidn x, y instead of if !STRCMP(x, y), %ifn x instead of if !x, etc. We probably don't want to define separate IS*() functions for every kind of token, so I'd prefer a solution like the one outlined here.)

Jun 22 '25 05:06 Rangi42

I don't like this, because it solidifies a lot of internal details, it has a high maintenance cost, and it's too general.

This would be one huge grammar rule in our syntax file, which would have to be kept in sync with changes we make to other grammar rules—keep in mind that we have some syntax conflicts, such as strings clashing with expressions, and they are not encoded in a reusable way.

I don't see any practical use cases that would outweigh the implementation and maintenance costs.

Jun 22 '25 08:06 ISSOtm

@ISSOtm What about something more like NASM's? "Numbers", "strings", and "identifiers" -- or even just "numbers", "strings", and "neither" -- are pretty common to want to distinguish. And although it's true that strings can be used in numeric contexts, there's never any ambiguity or internal-detail-dependence on "is this expression a number or string".

Jun 22 '25 13:06 Rangi42

The downsides you list are real, but I'll probably see if this can be implemented while avoiding them. (For instance, I don't think its parser rule will have to be very long if it doesn't have keyword/instruction detection -- which is a much less common need than number/string -- and it should definitely expose fewer internal details particular to RGBASM than ISCONST did.)

Jun 28 '25 02:06 Rangi42

Postponing this until 2.0. ISNUM/ISSTR, as ISSOtm and ax6 have pointed out, could reasonably distinguish even non-value input like registers, instructions, keywords, etc. (As opposed to ISCONST, which more clearly pertains only to values.) I'd rather wait until we have lazy evaluation, and see how this can work together with OPCODE (#823).

Jul 09 '25 16:07 Rangi42

WLA-DX has a solution for this as \?N, which evaluates to the type of macro arg \N. (It has values to check for like ARG_NUMBER, ARG_STRING, ARG_LABEL, etc.)

Jul 15 '25 02:07 Rangi42