Ast

Open leissa opened this issue 10 months ago • 1 comments

This PR adds a proper AST that the parser constructs before emitting actual Thorin code.

While this adds some boilerplate, it also removes a lot of weird hacks/workarounds in the parser: we can now do one thing at a time instead of doing everything from left to right, and we can traverse the AST as many times as we want and in any order we want. Finally, in a future PR I want to add a proper Thorin -> Thorin-AST decompilation for proper output.

Right now, the pipeline looks like this:

  • lex/parse to AST
  • bind: associates ast::IdExpr with its ast::Decl and does some other minor semantic checks
  • emit: AST -> Thorin graph
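
As a rough illustration of why separating these stages helps, here is a toy C++ sketch. The types `Decl`, `IdExpr`, and `Ast` and the functions `bind_names` and `emit_graph` are hypothetical stand-ins, not the actual thorin::ast classes. It only shows the idea that binding is a separate pass over the whole AST, so an identifier may refer to a declaration regardless of where it appears:

```cpp
#include <map>
#include <string>
#include <vector>

// Toy stand-ins; the names Decl/IdExpr mirror the PR text, but these are
// hypothetical types, not the real thorin::ast classes.
struct Decl {
    std::string name;
    int value; // placeholder for whatever the declaration denotes
};

struct IdExpr {
    std::string name;
    const Decl* decl; // nullptr until the bind pass resolves it
};

struct Ast {
    std::vector<Decl> decls;
    std::vector<IdExpr> ids;
};

// bind: associate every IdExpr with its Decl; returns the number of
// identifiers that could not be resolved.
int bind_names(Ast& ast) {
    std::map<std::string, const Decl*> scope;
    for (const Decl& d : ast.decls) scope[d.name] = &d;

    int errors = 0;
    for (IdExpr& id : ast.ids) {
        auto it = scope.find(id.name);
        if (it == scope.end())
            ++errors;
        else
            id.decl = it->second;
    }
    return errors;
}

// emit: walk the bound AST and produce the "graph" (here simply the values
// of all resolved identifiers, in order of use).
std::vector<int> emit_graph(const Ast& ast) {
    std::vector<int> out;
    for (const IdExpr& id : ast.ids)
        if (id.decl != nullptr) out.push_back(id.decl->value);
    return out;
}
```

Because `bind_names` sees all declarations before it resolves any use, the order in which nodes were parsed no longer matters for name resolution.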

Changes in the Frontend

  • Parentheses for filters are not mandatory anymore: fn (a: A)@(.tt): B -> fn (a: A)@.tt: B

  • Removed ! for specifying filters. There were only a handful of !s in our code base that we actually needed. Changing those few instances to @.tt isn't a big deal and having two ways for specifying filters just adds confusion and code complexity on our side.

  • Declaration expression: d* e. This makes block expressions superfluous, as you can simply use a (possibly parenthesized) declaration expression instead. For this reason, the parser emits a warning that block expressions are deprecated. On top of that, you can use declaration expressions in other useful places, like this:

     .fun .extern f(T: *,
                    U: .let F = %foo.F T; F,
                    V: F
                  )(W: F): F = /*...*/;
    
  • Where expression: e .where d* .end. This is a reverse declaration expression: the declarations are bound first (in reverse order), then the expression is bound. Example:

    .fun .extern main(mem: %mem.M, argc: .I32, argv: %mem.Ptr0 (%mem.Ptr0 .I8)): [%mem.M, .I32] =
      loop (mem, 0I32, 0I32)
      .where
          .con loop(mem: %mem.M, i: .I32, acc: .I32) =
              .let cond = %core.icmp.ul (i, argc);
              (exit, body)#cond mem
              .where
                  .con body m: %mem.M =
                      .let inc  = %core.wrap.add 0 (1I32, i);
                      .let `acc = %core.wrap.add 0 (i, acc);
                      loop (m, inc, acc);
                  .con exit(m: %mem.M) = return (m, acc);
              .end
      .end
    
  • .Pi F: T = Π /*...*/ -> .rec F: T = Π /*...*/

  • .Sigma S: T = [/*...*/] -> .rec S: T = [/*...*/]

  • In addition: .rec f = .lm /*...*/ is equivalent to .lam f /*...*/. The same holds for

    • .cn/.con
    • .fn/.fun
  • .and clause for mutual recursion:

    .lam is_even(i: .Nat): .Bool = (is_odd (%core.nat.sub (i, 1)), .tt)#(%core.ncmp.e (i, 0))
    .and
    .lam is_odd (i: .Nat): .Bool = (is_even(%core.nat.sub (i, 1)), .ff)#(%core.ncmp.e (i, 0));
    

    Or for mutually recursive types:

    .rec T = ... U ...;
    .and U = ... T ...;
    
  • No more forward decls for .lm/.lam, .cn/.con, .fn/.fun or types. Use .let, .where, .rec, .and instead.

  • For declaring C-Functions that you want to link later on use new .ccon/.cfun decl:

    .cfun print_str[%mem.M, %mem.Ptr0 «⊤:.Nat; .I8»]: %mem.M;
    

    Or more low-level:

    .ccon print_str[[%mem.M, %mem.Ptr0 «⊤:.Nat; .I8»], .Cn %mem.M];
    
  • New output: ./thorin --output-ast - that directly emits the AST representation. There is still work to do here, but I will address it in future PRs.
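
To make the .and clause from the list above more concrete: one common way to resolve a group of mutually recursive declarations without forward declarations is to bind in two phases. The C++ sketch below is an assumption about the general technique, not a description of Thorin's actual binder; `Decl` and `bind_group` are hypothetical names.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical declaration: a name plus the names its body refers to.
struct Decl {
    std::string name;
    std::vector<std::string> refs;
};

// Binds one group of declarations joined by `.and`.
// Returns every referenced name that could not be resolved.
std::vector<std::string> bind_group(const std::vector<Decl>& group,
                                    std::set<std::string> scope) {
    // Phase 1: make all names of the group visible before looking at bodies.
    for (const Decl& d : group) scope.insert(d.name);

    // Phase 2: resolve the references inside each body.
    std::vector<std::string> unresolved;
    for (const Decl& d : group)
        for (const std::string& r : d.refs)
            if (scope.count(r) == 0) unresolved.push_back(r);
    return unresolved;
}
```

With is_even and is_odd in the same group, both references resolve. Binding is_even on its own leaves is_odd unresolved, which is the situation a forward declaration used to paper over.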

Other Changes

  • thorin::fe -> thorin::ast
  • improved error messages
    • colored output
    • compilation proceeds on lexing, parsing, and bind (see above) errors
  • more streamlined way to set Loc of thorin Defs in emit
  • annex/bootstrapping magic moved to AST infrastructure
  • bug fixes + other minor refactoring here and there
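
The point about compilation proceeding on errors can be sketched as follows. This is a minimal C++ illustration of collecting diagnostics instead of aborting at the first one; `Diagnostics`, `check_names`, and `compile` are hypothetical and do not reflect Thorin's real error-handling API.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical diagnostics sink shared by all front-end stages.
struct Diagnostics {
    std::vector<std::string> errors;
    void error(const std::string& msg) { errors.push_back(msg); }
    bool ok() const { return errors.empty(); }
};

// Bind-like check: reports every unknown name instead of stopping at the
// first one, so the user sees all problems in a single run.
void check_names(const std::vector<std::string>& used,
                 const std::set<std::string>& known,
                 Diagnostics& diag) {
    for (const std::string& name : used)
        if (known.count(name) == 0)
            diag.error("unknown identifier '" + name + "'");
}

// Runs the check and only "emits" (returns true) if no errors were recorded.
bool compile(const std::vector<std::string>& used,
             const std::set<std::string>& known,
             Diagnostics& diag) {
    check_names(used, known, diag);
    return diag.ok(); // emit is skipped when any stage reported an error
}
```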

leissa, Mar 30 '24 15:03

@NeuralCoder3, @fodinabor: bump

leissa, May 07 '24 18:05

For declaring C-Functions that you want to link later on use new .ccon/.cfun decl:

Why is this necessary? If we no longer have/need forward declarations, why is not every function without a body an external declaration?

NeuralCoder3, May 15 '24 14:05

For declaring C-Functions that you want to link later on use new .ccon/.cfun decl:

Why is this necessary? If we no longer have/need forward declarations, why is not every function without a body an external declaration?

This way, we can better restrict and control what an external declaration is. E.g., we can't have a curried function or a function taking type variables as arguments as an external declaration. Note that the sole purpose of .ccon/.cfun is to establish an FFI to C/LLVM. I agree that this is a bit ugly at the moment, but right now it makes the front end simpler, and we can still migrate to a fancier solution once we have figured out how exactly we want to design our module system.

leissa, May 15 '24 20:05