customasm icon indicating copy to clipboard operation
customasm copied to clipboard

Disambiguate instructions with whitespace differences

Open wgaylord opened this issue 3 years ago • 9 comments

An example can be seen when you try and assemble the following.

#subruledef register
{
	J => 0xB
	IJ => 0xA
}

#ruledef
{

    ld {r: register} ,{addr: u16} => 0x01 @ r`8 @ addr`16
    ldi {r: register}, {value:u8} => 0x02 @ r`8 @ value`8 @ 0x00
 
}
ldi J,0

This appears to be interpreting the instruction as either "ldi J,0" OR "ld IJ,0" causing the message about multiple matches.

wgaylord avatar Oct 22 '21 19:10 wgaylord

It looks like customasm is ignoring whitespace, for example:

#ruledef
{
	rl a => 0x0000
	rla => 0x01
}

rl a
rla

produces 01 01, not 00 00 01. I've been trying to implement the z80 instruction set in customasm, and have run into a few instances like above.

agrif avatar Jan 05 '22 00:01 agrif

Yeah, it's ignoring whitespace. It was the easier route to take since the assembler uses a weird mix of lexical tokens and raw characters for instruction matching. I've gotta find a way to make whitespace count again without breaking backward-compatibility!

hlorenzi avatar Jan 05 '22 00:01 hlorenzi

I've been poking at the source tonight, and I'm not sure this is fixable without breaking at least this test.

It seems reasonable to me that the rule should only match r0000 and friends, without the space, but I'm not sure why the behavior is there in the first place.

agrif avatar Jan 05 '22 02:01 agrif

It was specifically for "glued" rules that the token/character duality arose in the parser. That test reflects a use-case where you may need to supply a complex expression to a glued argument -- the space is needed for the expression parser to work.

How did you go about solving the issue in general? I think we can add a small exception for that test, so all would be good!

hlorenzi avatar Jan 05 '22 03:01 hlorenzi

I haven't implemented it yet, but I was planning on adding a whitespace variant to the rule parts. Looking at this more closely, I'm not sure that would work. I might try it anyway, and see what damage it causes.

agrif avatar Jan 05 '22 03:01 agrif

I've played a bit with whitespaces rules, and it seems it'll be difficult to add it without breaking something in the way! It seems to break asm blocks, unfortunately, which might be a tough fix.

I've also tried simplifying this whole token/character duality thing to just using tokens directly, but that breaks a few other tests, and I'm not sure whether anyone is relying on that behavior. But this seems to be the most promising path, I think.

(This is all on the rule_whitespace branch!)

hlorenzi avatar Jan 07 '22 03:01 hlorenzi

I've done some exploring as well, attaching an after_ignorable flag to all tokens and using this later to introduce a Space pattern to rules. Originally this flag was in the parser, but I also ran into an issue with asm blocks!

This passes all tests, but has the unusual side effect that rules written as op {a}, {b} now require the space after the comma -- basically, any space present in the rule must be present to match, but extra spaces are ignored. Unfortunately this means it doesn't solve my rl a / rla problem, and I don't know any way around that.

It's hard to track what spaces are important across parameter boundaries! The spaces between symbols are probably always important, but how can you tell if the space in the rule op {a} is between symbols?

agrif avatar Jan 07 '22 05:01 agrif

I found out why asm blocks were erroring out, and now I can pass all the tests! (But I had to edit some of the more contrived tests.) It's on the rule_whitespace branch again.

As you said, rules written as op {a}, {b} will require the space after the comma. We can bet someone is relying on the old behavior, so this would be a breaking change... I wonder if that would be acceptable?

hlorenzi avatar Jan 14 '22 21:01 hlorenzi

I think so, but I'm obviously pretty biased here. It's unfortunate that these two tests broke, though. Back-compatibility would be a little bit easier to bear if a rule like op {a},{b} also accepted things like op a, b.

agrif avatar Jan 15 '22 00:01 agrif

I was finally able to get all the tests to pass under the new whitespace rules, and it's now available on v0.12! It's a breaking change and there might be unexpected issues, so let's see if it's worth it.

hlorenzi avatar Feb 20 '23 01:02 hlorenzi