Can't use dot in PowerPC assembler
I'm running 559233e. I can't assemble this file for PowerPC:
.sect .text
.data4 sz
.sect .rom
be:
.asciz "A string"
sz = . - be
The command ack -mlinuxppc -c.o file.s fails with "file.s", line 7: expr syntax err. The commands ack -mlinux386 -c.o file.s and ack -mlinux68k -c.o file.s both succeed.
The problem is with the dot . in the expression . - be. The assembler defines "." as a DOT token in mach/proto/as/comm3.c, but the PowerPC assembler defines "." as a C token in mach/powerpc/as/mach3.c, so "." appears twice in the keytab. When the assembler searches the keytab, it seems to always find the C "." and never the DOT ".".
C and DOT need to be different tokens, or the yacc grammar would be ambiguous. To fix this problem, one might need to hack the lexer to decide whether each "." is a C or a DOT.
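To illustrate the duplicate entry, here is a minimal C sketch of a keyword table that is searched by spelling alone. It is not the ACK's code: the names keytab, lookup, TOK_C and TOK_DOT are hypothetical, the linear search is a simplification, and the order of the two entries is an assumption chosen to match the behaviour described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes; the real ones come from the yacc grammar. */
enum token { TOK_NONE = 0, TOK_C, TOK_DOT };

struct keyword {
    const char *name;
    enum token  tok;
};

/* Assumption: the machine-specific "." ends up ahead of the generic one. */
static const struct keyword keytab[] = {
    { ".", TOK_C   },   /* stand-in for the entry in mach/powerpc/as/mach3.c */
    { ".", TOK_DOT },   /* stand-in for the entry in mach/proto/as/comm3.c */
};

/* Look a name up by spelling only; the first matching entry wins. */
static enum token lookup(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof keytab / sizeof keytab[0]; i++) {
        if (strcmp(keytab[i].name, name) == 0)
            return keytab[i].tok;
    }
    return TOK_NONE;
}
```

Because the lookup only sees the spelling, nothing in it can ever select the second "." entry, whatever the parser expects at that point.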
Hmm. That's nasty. A lexer hack would be evil --- you'd need some way to tell it whether the parser was trying to read an instruction or not, and I'm not sure if there's a reliable way to do that.
It might be more plausible to add a configuration option to allow DOT to become some other symbol on the PowerPC, such as $ (which I think I've seen in other assemblers). The ACK assembler is pretty non-standard anyway, and it's probably more important to use standard mnemonics.
A more expressive grammar which knows the difference between identifiers and mnemonics would be nice, too.
I can work around the problem by using a label:
.sect .text
.data4 sz
.sect .rom
be:
.asciz "A string"
1:
sz = 1b - be
> A lexer hack would be evil --- you'd need some way to tell it whether the parser was trying to read an instruction or not, and I'm not sure if there's a reliable way to do that.
In Matz's Ruby, the lexer and parser have a variable lexer_state, telling the lexer how to read the next token. Given the Ruby code obj.class, the lexer must return an identifier class, not the keyword class. The lexer state for this seems to be EXPR_DOT.
With this kind of trick, one might teach our assembler that add is a label in bl add, or that b is a label in b: nop, or that beq+ is an instruction name in beq+ label.
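A rough C sketch of such a lexer state, to make the idea concrete. This is neither Ruby's code nor the ACK's: lex_state, classify and the mnemonic list are hypothetical names, and the sketch assumes the lexer can peek at the character that follows a word. It covers bl add and b: nop only; it does nothing for beq+.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lexer states and word classes. */
enum lex_state { AT_STATEMENT_START, IN_OPERANDS };
enum word_kind { WORD_MNEMONIC, WORD_IDENT };

/* Hypothetical mnemonic list; the real assembler has many more. */
static const char *const mnemonics[] = { "add", "b", "bl", "nop" };

static int is_mnemonic(const char *word)
{
    size_t i;

    for (i = 0; i < sizeof mnemonics / sizeof mnemonics[0]; i++) {
        if (strcmp(mnemonics[i], word) == 0)
            return 1;
    }
    return 0;
}

/*
 * Classify one word. next_char is the character right after the word.
 * A word is a mnemonic only at the start of a statement and only when
 * it is not followed by ':' (which would make it a label definition).
 * After a mnemonic, every further word is an identifier until the
 * caller resets the state at the end of the statement.
 */
static enum word_kind classify(enum lex_state *state, const char *word,
                               int next_char)
{
    assert(state != NULL && word != NULL);
    if (*state == AT_STATEMENT_START && next_char != ':' && is_mnemonic(word)) {
        *state = IN_OPERANDS;
        return WORD_MNEMONIC;
    }
    return WORD_IDENT;
}
```

With this, add in bl add is classified as an identifier because the state is already IN_OPERANDS, and b in b: nop is an identifier because of the colon, which leaves the state unchanged so that nop is still read as a mnemonic.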
(I never added beq+ to our assembler, partly because it can't lex it, but also because I don't want static branch hints in my programs, and because there are two different ways to encode static branch hints.)
A simpler trick might allow dot . in expressions. We now lex .or as one token, but or. as two tokens: a keyword or identifier may begin with a dot, but may not end with one. The trick takes two hacks. The first would allow a dot at the end of any identifier, so or. would be lexed as one token. The second would then split or. back into two tokens, to fit the current grammar. But this simpler trick would never work with beq+: we can't allow a plus at the end of any identifier, because label+4 would stop working.
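The two hacks could look roughly like this in C. This is a sketch under assumptions, not a patch: scan_word, lex_word, the token names and the mnemonic list are all hypothetical. It also assumes that once the C token comes only from splitting a word like or., a "." standing on its own can be returned as DOT.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical results for one scanned word. */
enum word_token { TOK_WORD, TOK_DOT, TOK_MNEMONIC_THEN_C };

/* Hypothetical mnemonic list. */
static const char *const mnemonics[] = { "add", "and", "or" };

static int is_mnemonic(const char *s, size_t len)
{
    size_t i;

    for (i = 0; i < sizeof mnemonics / sizeof mnemonics[0]; i++) {
        if (strlen(mnemonics[i]) == len && strncmp(mnemonics[i], s, len) == 0)
            return 1;
    }
    return 0;
}

/* First hack: a word may contain dots anywhere, including at the end. */
static size_t scan_word(const char *s)
{
    size_t n = 0;

    while (isalnum((unsigned char)s[n]) || s[n] == '_' || s[n] == '.')
        n++;
    return n;
}

/*
 * Second hack: decide what the scanned word means.
 * - a lone "." is the location counter (DOT);
 * - a mnemonic plus a trailing dot is reported so that the caller can
 *   hand the grammar two tokens, the mnemonic and then C;
 * - anything else is an ordinary word.
 * *len receives the number of characters consumed.
 */
static enum word_token lex_word(const char *s, size_t *len)
{
    size_t n;

    assert(s != NULL && len != NULL);
    n = scan_word(s);
    *len = n;
    if (n == 1 && s[0] == '.')
        return TOK_DOT;
    if (n > 1 && s[n - 1] == '.' && is_mnemonic(s, n - 1))
        return TOK_MNEMONIC_THEN_C;
    return TOK_WORD;
}
```

Since '+' is not a word character here, label+4 still stops at label, which is also why the same approach cannot recognise beq+.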