hask icon indicating copy to clipboard operation
hask copied to clipboard

Implement source parser

Open TheMatten opened this issue 5 years ago • 7 comments

Once #4 is finished, we want to parse it - let's track related problems here:

  • Moving relevant question from #1 here - which library do we want to use for parsing? megaparsec and alex+happy sound like a good fit for our purposes - personally I may be more inclined to the former, though both options have their pros and cons.
  • What about error reporting? I guess we want to be consistent across the compiler, so we'll probably want custom error printer instead of e.g. one in megaparsec, which is specifically tied to ParsecT state. This may actually be question worth a separate issue - I'm interested in style similar to rustc:
    error[E0391]: cycle detected when computing the supertraits of `main::FirstTrait`
     --> src/main.rs:3:20
      |
    3 | trait FirstTrait : SecondTrait {
      |                    ^^^^^^^^^^^
      |
    note: ...which requires computing the supertraits of `main::SecondTrait`...
     --> src/main.rs:7:21
      |
    7 | trait SecondTrait : FirstTrait {
      |                     ^^^^^^^^^^
      = note: ...which again requires computing the supertraits of `main::FirstTrait`, completing the cycle
    note: cycle used when collecting item types in top-level module
     --> src/main.rs:3:1
      |
    3 | trait FirstTrait : SecondTrait {
      | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    
    error: aborting due to previous error
    
    Related to this, we want to somehow print the relevant source in later phases after parsing. Instead of pretty printing AST back, I would propose storing parsed lines of source as they are and storing line indexes along the position in line in spans in AST - we would then print the line itself (maybe truncated if needed), underlined in place of actual error span.

TheMatten avatar Oct 27 '20 09:10 TheMatten

I'm quite keen generally on separating lexing from parsing and so wondered if we could do alex + megaparsec. The main difficulty is I'm not sure how you lex Haskell style layout in a pleasing fashion (mostly because I have never tried it, I don't think there is a fundamental limitation). Maybe looking at how GHC does it would help.

Boarders avatar Oct 27 '20 18:10 Boarders

I'm quite keen generally on separating lexing from parsing and so wondered if we could do alex + megaparsec.

When it comes to lexing, I'm worried about whitespace sensitivity - it doesn't really appear in Haskell98, but does a lot with extensions. It's not a fundamental limitation, but it will make things less nice.

The main difficulty is I'm not sure how you lex Haskell style layout in a pleasing fashion (mostly because I have never tried it, I don't think there is a fundamental limitation)

Looking at GHC's parser to see how they handle indentation vs braces, they seem to manually recognize both cases. https://gitlab.haskell.org/ghc/ghc/-/blob/master/compiler/GHC/Parser.y#L95 sounds horrifying BTW :sweat_smile: Wouldn't we avoid equivalent of shift/reduce conflicts when using combinators? (You just have to replace recursion with fold on list from time to time)

TheMatten avatar Oct 27 '20 20:10 TheMatten

I wonder what to do about operator parsing - would there be any issues with parsing source that uses them as list of tokens and using separate phase to turn those into trees? It sounds more efficient and elegant than reordering left-associative tree as done in GHC.

TheMatten avatar Oct 28 '20 19:10 TheMatten

What is the advantage of doing a separate lexing step if using parser combinators?

solomon-b avatar Oct 29 '20 17:10 solomon-b

Possibly less space for mistakes and no need to handle whitespace in parser I guess - but Haskell is space-sensitive, so I wonder whether it won't be more convenient to do together.

TheMatten avatar Oct 30 '20 07:10 TheMatten

Is there anything I can do to get the ball rolling on the parser?

solomon-b avatar Nov 06 '20 17:11 solomon-b

Feel free to start - I'm slightly busy ATM, so I didn't really start anything myself. BTW, I'm sort of inclined to try multiple approaches (megaparsec/megaparsec+alex/happy+alex), so I guess you can just pick one and we can later sort it out. Nice thing about doing toy project is that there's no reason to hurry :smile:

TheMatten avatar Nov 06 '20 18:11 TheMatten