fortran-src Lexing fails for empty lines before a continuation line with DOS line endings

The following fails to lex properly when the file has DOS line endings of \r\n

      program main
      integer x,

     + z
      end program main

Giving the error

ProgramFilefortran-src: fortran-src: user error (parsing-bug.f, 3:1: parsing-bug.f: parsing failed. 
Last parsed token: TNewline (2:20)-(2:21).)

But it does parse correctly with unix line endings of \n.

I believe it's something to do with the logic for continuation lines which should be dealing with this, but I've not been able to pinpoint what needs fixing.
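
Because the two variants differ only in invisible bytes, here is a minimal sketch for generating both files from the snippet above. The output file names and helper names are hypothetical, and the handle is opened in binary mode so that no line-ending translation happens on write.

```haskell
import System.IO (IOMode (WriteMode), hPutStr, withBinaryFile)

-- The fixed-form program from this report, one entry per line.
-- The continuation character '+' sits in column 6.
exampleLines :: [String]
exampleLines =
  [ "      program main"
  , "      integer x,"
  , ""
  , "     + z"
  , "      end program main"
  ]

-- Join the lines, terminating each one with the given line ending.
render :: String -> [String] -> String
render eol = concatMap (++ eol)

-- Write in binary mode so the runtime does not translate line endings.
writeExample :: FilePath -> String -> IO ()
writeExample path eol =
  withBinaryFile path WriteMode (\h -> hPutStr h (render eol exampleLines))

-- Hypothetical output names: one file per line-ending style.
writeBoth :: IO ()
writeBoth = do
  writeExample "parsing-bug-unix.f" "\n"
  writeExample "parsing-bug-dos.f" "\r\n"
```

Running `writeBoth` from GHCi should leave two files that differ only in their line endings, which can then be passed to fortran-src one at a time.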

Apr 01 '21 17:04 RaoulHC

Interesting. What parser/Fortran version are you using for this? I can't replicate this with fortran-src --typecheck: I get a TNewline lex error with both line endings.

I can see why this might happen, since the continuation handling works via taking X characters (Lexer.FixedForm):

isContinuation :: AlexInput -> Bool
isContinuation ai =
  take 6 _next7 == "\n     " && not (last _next7 `elem` [' ', '0', '\n', '\r'])
  where
    _next7 = takeNChars 7 ai

It looks like it works by not consuming the newline. If for \r\n it doesn't first consume just the \r, then the lookahead will never match "\n     ". Looking at line 87, it does seem to get rid of any rogue \rs:

  <0,st,keyword,iif,assn,doo> \r              ;

but I don't know the lexer well enough to confirm. Could you attach the example files you're using? GitHub swallows at least the newline formatting.
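
To make the suspected mismatch concrete, here is a simplified model of the check quoted above. It works on a plain String instead of the lexer's AlexInput, and the names isContinuationModel and dropRogueCR are hypothetical, so treat it as a sketch of the idea rather than the real lexer behaviour.

```haskell
-- Simplified model of isContinuation: same comparison as the quoted code,
-- but on a plain String (an assumption made for illustration).
isContinuationModel :: String -> Bool
isContinuationModel input =
  take 6 next7 == "\n     "
    && last next7 `notElem` [' ', '0', '\n', '\r']
  where
    next7 = take 7 input

-- Mimics a rule that silently discards carriage returns at the current
-- position before the lookahead is attempted.
dropRogueCR :: String -> String
dropRogueCR = dropWhile (== '\r')
```

In GHCi, `isContinuationModel "\n     + z"` is True, while `isContinuationModel "\r\n     + z"` is False because the first of the seven characters is the \r. Applying `dropRogueCR` first makes the second case True as well, which is why it matters whether the \r rule runs before this check.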

Apr 02 '21 12:04 raehik

This is on master, and I get that error with or without --typecheck and on different Fortran versions (66, 77, 90, 95). The initial ProgramFile is a leftover from some printf debugging I was trying to do, but otherwise I think you should be getting the same output.

Attached the file here: parsing-bug.txt. It seems to have kept the line endings.

And yeah I thought it would be somewhere along those lines. The continuation line logic is separate from the token regex logic, so I don't think line 87 will be used there, but I tried changing the continuation line logic in various places to deal with \r\n instead of just \n and I still got the same issue.

Apr 06 '21 09:04 RaoulHC

Thanks. Can't figure out why I continue to get the same parse error on both Unix and DOS style, I'm building from master 72b6347. You're right, it should handle them both (gfortran does). I'll look into it further.

I wonder if my trouble reproducing this could be related to platform. Are you on Mac or Linux?

Apr 06 '21 12:04 raehik

Huh that is odd, I definitely get it to work with standard unix line endings.

This is on Mac. I just tried on a Linux machine and it also parses successfully with Unix endings but not DOS, although strangely it says lexing failed at the end of the first line, rather than parsing failed as it does on Mac.

Are you running Windows then? I'm also building from the same commit.

Apr 06 '21 12:04 RaoulHC

I'm on Linux. It keeps failing on the TNewline before the continuation! Throwing a test case at the CI reproduces my error: https://github.com/camfort/fortran-src/pull/142/checks?check_run_id=2278287638 but doesn't help much.

By removing the middle empty line, I've changed my error to a parsing error. I noticed that the end of the snippet gets lexed as [TEnd, TId "progra", TId "mmain"] instead of [TEndProgram, TId "main"]! I have no clue why. But seems related, since I can get past lexing now.

Apr 06 '21 13:04 raehik

OK, it looks like all our parsers have varying behaviour on what to do with end program, which is interfering with reproducing this. I think your team has used Fortran77Legacy in the past. Using that parser I've reproduced it!

The issue lies in the empty line. Remove it and both parse. My initial thought is:

with \n Unix-style line endings, the line is successfully skipped and doesn't add a newline
with \r\n, the continuation is parsed, but something occurs so that a TNewline is inserted, and this breaks the parsing

I wonder if the line provided in the error was due to the inserted newline not having its position set correctly? Feels like it should have errored on line 3. I'll see if I can print the token list before it errors out to see if this fits what is happening.

Apr 06 '21 14:04 raehik

Yes sorry should have said, we use legacy 77 for everything so that's what I default to when running fortran-src :)

This isn't hugely problematic btw, I've only encountered one file with this. I thought it should be an easy enough fix, but something odd is going on.

Apr 06 '21 14:04 RaoulHC

It's a bug (and the other issue related to parser versions probably is too). I swear I skimmed something in the lexer that seemed to attempt to skip adding a newline token in certain situations. Adding more empty lines before the continuation still works with Unix-style endings, which indicates to me that it skips them all. Adding support there for skipping Windows-style newlines too should fix this.
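
As a rough sketch of what skipping both newline styles could look like, here is a standalone lookahead over a plain String. None of these helpers exist in fortran-src and the names are hypothetical. It only illustrates treating "\n" and "\r\n" the same way when skipping empty lines ahead of a continuation.

```haskell
-- Strip one line ending, accepting both Unix and DOS styles.
stripEol :: String -> Maybe String
stripEol ('\r' : '\n' : rest) = Just rest
stripEol ('\n' : rest)        = Just rest
stripEol _                    = Nothing

-- True when the input begins a fixed-form continuation line:
-- five blanks followed by a continuation character in column 6.
startsContinuation :: String -> Bool
startsContinuation s = case splitAt 5 s of
  ("     ", c : _) -> c `notElem` [' ', '0', '\n', '\r']
  _                -> False

-- Starting at a line ending, skip it and any empty lines that follow,
-- then report whether the next non-empty line is a continuation.
continuationAfterBlanks :: String -> Bool
continuationAfterBlanks input = maybe False go (stripEol input)
  where
    go s = case stripEol s of
      Just s' -> go s'                 -- empty line: keep skipping
      Nothing -> startsContinuation s  -- first non-empty line
```

With this model, `continuationAfterBlanks "\n\n     + z\n"` and `continuationAfterBlanks "\r\n\r\n     + z\r\n"` are both True, while `continuationAfterBlanks "\n      end program main\n"` is False because column 6 is blank.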

Apr 06 '21 14:04 raehik

This feels related to #45 (I think fixed in #58). The provided example:

      SUBROUTINE lex
      IMPLICIT NONE
      INTEGER X, Y,
c break
     &   ZZ
      
      END

works and lexes to identical output for DOS and Unix line endings (using -v 77l). Clearing the comment line (leaving an empty line) breaks lexing for DOS line endings.

Perhaps lexing empty lines before a continuation wasn't directly intended, and working for empty \n lines was a happy coincidence? @madgen would you have any ideas? Why might Raoul's original example lex for \n, but not \r\n?

Apr 12 '21 16:04 raehik
