language-c

GCC preprocessor output generated in non-ASCII locales cannot be processed

Open · arsdragonfly opened this issue 5 years ago · 2 comments

See this issue.

arsdragonfly · May 05 '20 22:05

Hopefully this is just a simple change in the lexer. PRs welcome!

expipiplus1 · Sep 07 '20 04:09

So, I looked into this and I think I've found the fix, but Alex may need a bug-fix release first.

I saved the sample from the linked issue as a UTF-8 file:

```c
# 1 "test.c"
# 1 "<built-in>"
# 1 "<命令行>"
# 31 "<命令行>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 32 "<命令行>" 2
# 1 "test.c"
int main()
{
 return 0;
}
```

And sure enough, I got `Prelude.head: empty list`. The error comes from the second use of `head` at this location and is triggered by the first non-ASCII line, `# 1 "<命令行>"` (命令行 is Chinese for "command line", i.e. GCC's localized `<command-line>`).

Basically, the problem is that Alex assumes the input ByteString is UTF-8, but the `InputStream` is a byte-by-byte abstraction (effectively Latin-1). In these lines:

```
\#$space*@digits$space*(\"($infname|@charesc)*\"$space*)?(@int$space*)*\r?$eol
  { \pos len str -> setPos (adjustLineDirective len (takeChars len str) pos) >> lexToken' False }
```

Alex passes 12 as `len`, which is the correct length in Unicode code points of `# 1 "<命令行>"` plus the trailing newline. But `takeChars` then takes 12 *bytes* off the ByteString (the line is 18 bytes in UTF-8, because each of the three CJK characters takes 3 bytes), so `adjustLineDirective` receives a truncated string that is missing the closing double quote.
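To make the mismatch concrete, here is a small standalone Haskell sketch. It assumes only the `bytestring` and `text` packages, and the names `directive`, `codePointLen`, `latin1Chars` and `truncated` are illustrative, not language-c code. It compares the code-point length of the directive with what a byte-by-byte (Latin-1) view sees, then takes 12 bytes the way a byte-wise `takeChars len` would.

```haskell
-- Sketch only: illustrative names, not taken from language-c.
-- Assumes the boot packages `bytestring` and `text`.
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- The first non-ASCII line directive from the sample, with its newline.
directive :: T.Text
directive = T.pack "# 1 \"<命令行>\"\n"

-- The raw UTF-8 bytes, which is what the byte-level input stream scans.
directiveBytes :: B.ByteString
directiveBytes = TE.encodeUtf8 directive

-- A UTF-8-aware count: one unit per code point (the `len` Alex reports).
codePointLen :: Int
codePointLen = T.length directive

-- A byte-by-byte (effectively Latin-1) view: every byte becomes its own Char.
latin1Chars :: String
latin1Chars = BC.unpack directiveBytes

-- What a byte-wise take of `codePointLen` leaves behind.
truncated :: B.ByteString
truncated = B.take codePointLen directiveBytes

main :: IO ()
main = do
  print codePointLen              -- 12
  print (length latin1Chars)      -- 18: each CJK character is 3 bytes
  -- Only the opening quote survives; the closing one was cut off.
  print (BC.count '"' truncated)  -- 1
  -- `show` on Text escapes non-ASCII, so this prints ASCII only.
  print (TE.decodeUtf8 truncated)
```

The 12-byte prefix happens to end on a character boundary (right after 令), so it still decodes cleanly; it simply stops inside the filename, before `行>"`.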

The correct fix is to put Alex back into Latin-1 mode (my impression is that this used to be the default and was switched in Alex 3.0). This is done with the `%encoding "latin1"` directive (added in Alex 3.1.7). However, that alone still doesn't work, because a remaining bug in Alex's character counting made it pass the too-short length anyway. This was fixed in https://github.com/simonmar/alex/pull/156, but even though that PR was merged a year ago, it doesn't appear to have made it into the recent Alex 3.2.6 release. I'll ping it to see when it can be released.
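The following is a hypothetical Haskell sketch of why counting bytes (Latin-1 mode) makes the rule consistent: once `len` is the byte length, a byte-wise take returns the whole directive, closing quote included. `parseDirective` is an illustrative stand-in, not language-c's `adjustLineDirective`. Decoding the quoted filename from UTF-8 afterwards is an added assumption about how a caller might recover the original name; it is not discussed in this thread.

```haskell
-- Hypothetical sketch: `parseDirective` is NOT language-c's adjustLineDirective.
-- Assumes the boot packages `bytestring` and `text`.
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- The directive exactly as it appears in the UTF-8 preprocessor output.
sampleBytes :: B.ByteString
sampleBytes = TE.encodeUtf8 (T.pack "# 1 \"<命令行>\"\n")

-- Parse `# <line> "<file>"`, requiring the closing quote to be present.
-- Working on bytes is safe here: UTF-8 multi-byte sequences never
-- contain the ASCII bytes for '#', ' ', digits or '"'.
parseDirective :: B.ByteString -> Maybe (Int, T.Text)
parseDirective bs = do
  afterHash <- B.stripPrefix (BC.pack "#") bs
  (lineNo, afterLine) <- BC.readInt (BC.dropWhile (== ' ') afterHash)
  afterOpen <- B.stripPrefix (BC.pack "\"") (BC.dropWhile (== ' ') afterLine)
  let (fname, afterName) = BC.break (== '"') afterOpen
  _ <- B.stripPrefix (BC.pack "\"") afterName
  -- Assumption: filename bytes are UTF-8, so decode them only at the end.
  pure (lineNo, TE.decodeUtf8 fname)

main :: IO ()
main = do
  -- Latin-1 mode: the match length is the byte length (18 here).
  let byteLen = B.length sampleBytes
  print (parseDirective (B.take byteLen sampleBytes))
  -- The old code-point length (12) still cuts the line short.
  print (parseDirective (B.take 12 sampleBytes))  -- Nothing
```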

mtolly · Jan 10 '21 03:01