rsgbds
rsgbds copied to clipboard
Suggest corrections when invalid code is provided
- [ ] When truncating an expression to 8-bit, suggest using
LOW() - [ ] When trying to
ldtwo 16-bit regs, suggest using two 8-bitlds - [ ] When trying to ALU over a reg that isn't
a..? - [ ] When trying to
ldhwith an address in 8-bit range, suggest adding $FF00
Some suggestions are more complex. Perhaps they would warrant linking to a curated "snippets" page?
The only common mistake I can think of that you didn't list is attempting to use [de] or [bc] the way you'd use [hl]. I'm sure I'll think of more, though...
Given the current RGBDS parser design, we would need to refactor the rules and add more of them to support these sort of errors. I feel like there's tension between that and removing redundant rules (e.g. support for ld [c], a, jp [hl], ldi a, hl, etc). Even the checklist here lists "When trying to ldh with an address in 8-bit range, suggest adding $FF00", which was a feature we explicitly deprecated and will remove in current RGBDS.
If we extended the parser to validly parse invalid code like add b, c, inc [bc], etc, I would expect it to also give corrections/helpful errors for code that was once valid but now is not. But then if the parser is dedicating lines to them anyway, it increases the argument for just supporting them as valid, since it would arguably be simpler to just output the user's intended bytes rather than print an explanation of what to do instead.
Maybe the Rust parser will avoid that tension by having an architecture that makes it easier/simpler/shorter to just handle these cases individually instead of making them catch-all syntax errors?
(To be specific: given the above checklist, I would also want to add a correction message for jp [hl] to suggest jp hl instead, a correction for ld [c], a to suggest ldh [c], a instead, etc. But then, instead of a correction, just doing [the Rust equivalent of] sect_ConstByte(0xE9);, etc would be shorter and arguably more helpful for the user in a "do what I mean" sense.)
There are at least three categories of invalid code:
- Things that would syntactically make sense, but are not supported (
add b, c,ld de, hl) - Syntax bikeshedding (
jp [hl],ldh [$42], a) - Things supported elsewhere (
ex de, hl)
Strictly speaking, we have no obligation to provide anything beyond “syntax error” for any of these categories. I think, however, that providing more detailed error messages, especially with a help: did you mean to... addendum, is uncontroversial.
Thus, we will want some infrastructure for reporting such suggestions anyway. That means that the cost of dealing with the latter two categories becomes lower (though, again for sure, with a lower priority), and may feel more consistent.
The argument for keeping syntax bikeshedding as errors is the same as ever: consistency in what's accepted. jp [hl] is not ambiguous one bit grammar-wise, but that changes nothing to the fact that it doesn't make sense. Yes, that's being opinionated. Deal with it, accepting or rejecting specific syntax a choice that can't be escaped. I would further contend that someone who feels bothered by “but it knows what I mean!” would also be bothered by “syntax error” without any further guidance; but at least with the former, they can quickly move on.
As for “things supported elsewhere”, that's even more niche. It is, like the second category, intended to help new developers, who may be lacking 1. experience, 2. knowledge, 3. confidence to ask, about the error. I think we should weigh such changes on that basis.
I think trimming the grammar tree as much as possible is not what we need, since the instruction syntax is overall very stable, and does not conflict with the rest of the grammar. I think focusing on consistency of the overall syntax is a more useful aspect; sure, that would be more lines of code, but it's a part we would hardly ever touch, except perhaps when refactoring the parser... but that's a huge effort either way, I'm not sure this would change much.
Lastly, once the initial infrastructure has been set up for the first category, using it for the other two should be a low-enough-effort task that it can be done when one of us feels a little idle, or for some new contributor. I think we can defer that bikeshedding to when crossing that bridge, which includes after building it. ;)
Thanks, that makes sense.
Re: "Things that would syntactically make sense, but are not supported", there are a lot of register mistakes that could qualify. Like jp de, or push sp, or ld [wAddr], hl (since ld [wAddr], sp works...), or push a...
Maybe we could just parse any instruction keyword followed by comma-separated arguments (whether they're registers, condition codes, [dereferences], or expressions), and then handle instructions case-by-case. (As opposed to the current parser.y design where every instruction gets its own validity-limited parse rule.)
Hmm, sounds interesting. But what about ld hl, sp + 2?
And, beyond that, what would we do with that parse? If what you have in mind is to report a more precise “invalid/unknown instruction” error instead of “syntax error”, then I think that's a nice addition too, but doesn't require special handling (at least under the model I have in mind).
Yeah, I was expecting ld hl, sp + 2 would be a special case handled separately, but am not sure if that would be a shift/reduce conflict.
I don't understand what you mean by "that's a nice addition too, but doesn't require special handling (at least under the model I have in mind)". Yes, my goal would be "make anything instruction-y a valid parse so we can examine it and provide arbitrary warning/error messages. Maybe that implementation would be overkill compared to "just add some more parse rules", but I wouldn't want to keep trying to guess more ways to write plausible but invalid instructions. (add a, b, c? dec nz, b?)
I think we have the same (general) idea in mind, yes.