fortran-src
Lexing fails for empty lines before a continuation line with DOS line endings
The following fails to lex properly when the file has DOS line endings of \r\n
program main
integer x,
+ z
end program main
Giving the error
ProgramFilefortran-src: fortran-src: user error (parsing-bug.f, 3:1: parsing-bug.f: parsing failed.
Last parsed token: TNewline (2:20)-(2:21).)
But it does parse correctly with Unix line endings of \n.
I believe it's something to do with the logic for continuation lines which should be dealing with this, but I've not been able to pinpoint what needs fixing.
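Since the page collapses whitespace, the snippet above no longer shows its layout. The following is a minimal sketch for regenerating both variants of the file. It assumes a standard fixed-form layout (statements from column 7, the continuation marker in column 6) and the empty line before the continuation that the title describes. The exact columns of the attached file may differ, and the output file names are made up for illustration.

```haskell
module Main where

import System.IO (IOMode (WriteMode), hPutStr, withBinaryFile)

-- Assumed layout of the reproducer (not copied from the attached file):
-- statements start in column 7, the continuation marker '+' is in column 6,
-- and an empty line sits between the declaration and its continuation.
reproLines :: [String]
reproLines =
  [ "      program main"
  , "      integer x,"
  , ""
  , "     + z"
  , "      end program main"
  ]

-- Terminate every line with the given line ending ("\n" or "\r\n").
withEol :: String -> [String] -> String
withEol eol = concatMap (++ eol)

-- Write in binary mode so the runtime does not translate newlines.
writeRaw :: FilePath -> String -> IO ()
writeRaw path contents = withBinaryFile path WriteMode (`hPutStr` contents)

main :: IO ()
main = do
  writeRaw "parsing-bug-unix.f" (withEol "\n" reproLines)   -- hypothetical file name
  writeRaw "parsing-bug-dos.f" (withEol "\r\n" reproLines)  -- hypothetical file name
```

Running fortran-src on each of the two files should then show whether the failure depends only on the line endings.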
Interesting. What parser/Fortran version are you using for this? I can't replicate this with fortran-src --typecheck, I get a TNewline lex error on both versions.
I can see why this might happen, since the continuation handling works via taking X characters (Lexer.FixedForm):
isContinuation :: AlexInput -> Bool
isContinuation ai =
  take 6 _next7 == "\n     " && not (last _next7 `elem` [' ', '0', '\n', '\r'])
  where
    _next7 = takeNChars 7 ai
It looks like it works by not consuming the newline. If, for \r\n, it doesn't consume just the \r, then it'll never match with "\n     ". Looking at line 87, it does seem to get rid of any rogue \rs:
<0,st,keyword,iif,assn,doo> \r ;
but I don't know the lexer well enough to confirm. Could you attach the example files you're using? GitHub swallows at least the newline formatting.
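To make the suspicion concrete, here is a small String-based model of the check. It is only a sketch: the real function works on AlexInput via takeNChars, and both names below are invented for illustration. It shows that a 7-character window starting at \r cannot match, and what a variant that tolerates a leading \r might look like.

```haskell
module Main where

import Data.List (stripPrefix)
import Data.Maybe (fromMaybe)

-- Simplified stand-in for the lexer's check, operating on the remaining
-- input as a plain String (the real code uses AlexInput and takeNChars).
isContinuationModel :: String -> Bool
isContinuationModel rest =
  take 6 next7 == "\n     " && last next7 `notElem` [' ', '0', '\n', '\r']
  where
    next7 = take 7 rest

-- Hypothetical CRLF-tolerant variant: drop one leading '\r' before checking.
isContinuationCrlf :: String -> Bool
isContinuationCrlf rest =
  isContinuationModel (fromMaybe rest (stripPrefix "\r" rest))

main :: IO ()
main = do
  print (isContinuationModel "\n     + z\n")      -- Unix ending: True
  print (isContinuationModel "\r\n     + z\r\n")  -- DOS ending: False
  print (isContinuationCrlf "\r\n     + z\r\n")   -- DOS ending, tolerant: True
```

Whether the real lexer ever reaches this check while still positioned on the \r is exactly the open question, so this model illustrates the hypothesis rather than confirming it.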
This is on master, and I get that error code with or without --typecheck and on different Fortran versions (66, 77, 90, 95).
The initial ProgramFile is a leftover from some printf debugging I was trying to do, but otherwise I think you should be getting it.
Attached the file here; it seems to have kept the line formatting: parsing-bug.txt.
And yeah, I thought it would be somewhere along those lines. The continuation line logic is separate from the token regex logic, so I don't think line 87 will be used there, but I tried changing the continuation line logic in various places to deal with \r\n instead of just \n and I still got the same issue.
Thanks. Can't figure out why I continue to get the same parse error on both Unix and DOS style, I'm building from master 72b6347. You're right, it should handle them both (gfortran does). I'll look into it further.
I wonder if my trouble reproducing could be related to platform. Are you on Mac or Linux?
Huh that is odd, I definitely get it to work with standard unix line endings.
This is on Mac. I just tried on a Linux machine and it also parses successfully with Unix endings but not DOS, although strangely it says lexing failed at the end of the first line, rather than parsing failed as it does on Mac.
Are you running Windows then? I'm also building from the same commit.
I'm on Linux. It keeps failing on the TNewline before the continuation! Throwing a test case at the CI reproduces my error: https://github.com/camfort/fortran-src/pull/142/checks?check_run_id=2278287638 but doesn't help much.
By removing the middle empty line, I've changed my error to a parsing error. I noticed that the end of the snippet gets lexed as [TEnd, TId "progra", TId "mmain"] instead of [TEndProgram, TId "main"]! I have no clue why. But it seems related, since I can get past lexing now.
OK, it looks like all our parsers have varying behaviour on what to do with end program, which is interfering with reproducing this. I think your team has used Fortran77Legacy in the past. Using that parser I've reproduced it!
The issue lies in the empty line. Remove it and both parse. My initial thought is:
- with \n Unix-style line endings, the line is successfully skipped and doesn't add a newline
- with \r\n, the continuation is parsed, but something occurs so that a TNewline is inserted, and this breaks the parsing
I wonder if the line provided in the error was due to the inserted newline not having its position set correctly? Feels like it should have errored on line 3. I'll see if I can print the token list before it errors out to see if this fits what is happening.
Yes sorry should have said, we use legacy 77 for everything so that's what I default to when running fortran-src :)
This isn't hugely problematic btw, I've only encountered one file with this. I thought it should be an easy enough fix, but something odd is going on.
It's a bug (and the other issue related to parser versions probably is too). I swear I skimmed something in the lexer that seemed to attempt to skip adding a newline token in certain situations. Adding more empty lines before the continuation still works on Unix-style, indicating to me that it skips them all. Adding support there for skipping Win-style newlines too should fix this.
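As a rough illustration of what "skipping Win-style newlines too" could mean, here is a String-based sketch of a lookahead that skips any number of empty lines, terminated by either \n or \r\n, before testing for a continuation line. The helper names are hypothetical and this is not how the lexer is actually structured; it only models the intended behaviour.

```haskell
module Main where

-- Drop one line terminator ("\n" or "\r\n") from the front, if present.
dropEol :: String -> Maybe String
dropEol ('\r' : '\n' : rest) = Just rest
dropEol ('\n' : rest)        = Just rest
dropEol _                    = Nothing

-- Hypothetical helper: starting at the end of a line, does a fixed-form
-- continuation line follow once any empty lines have been skipped?
-- A continuation line has five blanks, then a column-6 character that is
-- neither ' ' nor '0'.
continuationAhead :: String -> Bool
continuationAhead input = maybe False go (dropEol input)
  where
    go s = case dropEol s of
      Just s' -> go s'        -- empty line: skip it and keep looking
      Nothing -> isContLine s
    isContLine s = case splitAt 5 s of
      ("     ", c : _) -> c `notElem` [' ', '0', '\n', '\r']
      _                -> False

main :: IO ()
main = do
  print (continuationAhead "\n\n     + z\n")          -- Unix, one empty line
  print (continuationAhead "\r\n\r\n     + z\r\n")    -- DOS, one empty line
  print (continuationAhead "\n      end program\n")   -- ordinary next line
```

In this model both line-ending styles give the same answer for the first two inputs, which is the behaviour the fix would need to provide.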
This feels related to #45 (I think fixed in #58). The provided example:
SUBROUTINE lex
IMPLICIT NONE
INTEGER X, Y,
c break
& ZZ
END
works and lexes identical output for DOS and Unix line endings (using -v 77l). Clearing the comment line (leaving an empty line remaining) breaks lexing for DOS line endings.
Perhaps lexing empty lines before a continuation wasn't directly intended, and working for empty \n lines was a happy coincidence? @madgen would you have any ideas? Why might Raoul's original example lex for \n, but not \r\n?