BorrowScript Implementing the language

Has any work been done yet on actually writing an implementation compiler? I've had some minor experience with writing interpreters and would be interested in helping.

Oct 24 '21 13:10 Isaac-Leonard

In my opinion, get a sloppy prototype (or subset language) going, with another language as back-end (JS, C#, Java, whatever suits) - essentially neglecting performance and code quality. Then use that as a basis for building a proper, optimized, native, bootstrapped compiler in the language itself. 😎

Oct 25 '21 17:10 mindplay-dk

It makes the most sense for us to write this in Rust because of all the existing tooling: Official borrow checker: https://github.com/rust-lang/polonius TypeScript parser/linter: https://github.com/rslint/rslint

Obviously, having Rust be the host language will make it harder for some to participate until we're self-hosting, but I don't think anyone wants to go through the trouble of reimplementing the borrow checker in another language.

On Mon, Oct 25, 2021, 1:19 PM Rasmus Schultz @.***> wrote:

In my opinion, get a sloppy prototype (or subset language) going, with another language as back-end (JS, C#, Java, whatever suits) - essentially neglecting performance and code quality. Then use that as a basis for building a proper, optimized, native, bootstrapped compiler in the language itself. 😎

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/alshdavid/BorrowScript/issues/36#issuecomment-951137729, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGDQCMTPTSFKW453KAVDCJTUIWGRTANCNFSM5GTPSBWA .

Oct 25 '21 19:10 SuperSonicHub1

I'm looking at starting to write the language now. I've made a pull request with a sample EBNF gramma for the basics of the language although at the moment it is severely invalid EBNF syntax. I'm looking at implementing the parser using this library: https://docs.rs/chumsky/0.5.0/chumsky/ however I'm also considering using a parser generator that takes an gramma as input and outputs a ready to use parser however I feel like this would lose us control and could break things when adding features to the language. What does everyone think?

Nov 01 '21 05:11 Isaac-Leonard

I've began a sample language here: https://github.com/Isaac-Leonard/BorrowScriptInterpreter Its written in rust and I've done nothing more then a hello world in rust before this so the code quality isn't the best. Theres also no comments so.... It's currently interpreted however benefits from rusts borrow checker slightly. I don't think reusing existing tooling such as rusts borrow checker or typescript linting tools is going to be easy and probably not worth trying unless they're a lot more flexible than I imagine. I'm not really sure how I'd make this compiled beyond a general idea of converting to assembler and the concepts of memory allocation, I also only currently have a Mac so won't be able to develop much cross platform support into the compiler. Does anyone else have experience with assembler or compilers?

Nov 09 '21 15:11 Isaac-Leonard

Love your work Isaac! SuperSonicHub1 suggests we use Rust as a backend because it already has all of this tooling built into it - a sensible approach.

So if you imagine the following passes:

BorrowScript Source -> BS AST -> Rust Source -> Binary

This essentially follows the same compilation flow as TypeScript for Javascript. Transpile the source into JavaScript.

Generating AST and transpiling to Rust Source doesn't need to be written in Rust - it can be in another language. Eventually the transpiler can be port to BorrowScript

Nov 11 '21 02:11 alshdavid

You're thinking of doing transpilation? That's an interesting idea, but I think writing our own complier will give us more control over the language.

On Wed, Nov 10, 2021, 9:25 PM David Alsh @.***> wrote:

Love your work Isaac! SuperSonicHub1 suggests we use Rust as a backend because it already has all of this tooling built into it - a sensible approach.

So if you imagine the following passes:

BorrowScript Source -> BS AST -> Rust Source -> Binary

This essentially follows the same compilation flow as TypeScript for Javascript. Transpile the source into JavaScript.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/alshdavid/BorrowScript/issues/36#issuecomment-965933280, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGDQCMTQVEFVULU6UC3A6UTULMSTDANCNFSM5GTPSBWA .

Nov 11 '21 13:11 SuperSonicHub1

The issue with directly transpiling to rust or any other language is that to make use of that languages features like the borrow checker we would have to either let the user directly read that languages error messages or we would have to translate that languages error messages into a form useful for our language. Either option doesn't feel very good in my opinion. For example if we don't implement borrow checking directly and let rust do that for us how do we effectively communicate to the user what went wrong when the rust compiler gives an error that makes no sense in the borrow script program. Instead of tranpiling you could maybe fork the rust compiler directly and make the necessary changes in its parser and standard lib to make something better.

I'm currently planning to initially target WASM with the language I've been writing. I'm not currently sure if what I'm writing will be borrow script as I seem to have a slightly different vision to you but it will be a long time until any of that really matters.

I've now got static analysis working for most trivial uses and have basic unions working along with IO, while loops, if statements and some other bits and pieces but its all still currently interpreted.

Nov 13 '21 11:11 Isaac-Leonard

You're thinking of doing transpilation? That's an interesting idea, but I think writing our own complier will give us more control over the language.

Fully agree with you but I think it's a practical route initially.

Writing a complete compiler backend is quite the challenge and something I want to do for BS but in the absence of resources I feel a practical starting point is to do a code to code compilation (transpilation) - especially seeing as that requires a large chunk of the effort for a complete compiler anyway (in other words, not wasted effort).

That way we can prove the language semantics work (obviously they will as it's just simplified Rust). It's then easier to appeal for more resources as people would be interested in contributing to a working project.

We would need to build half a transpiler regardless of if we wrote our own backend because of the need to write a parser/AST generator. From there we can pipe the output into whatever.

BorrowScript Source --| Parser |--> BS AST

The parser validates the source code, produces compilation/type checking errors, handles imports and generates a single file (kind of like a js bundle) which is an easy to interpret intermediary format to hand over onto a compiler or transpiler.

# Transpile
BS AST --| Rust Transpiler |--> Rust Source -> Rust Compiler -> Binary
BS AST --| C Transpiler |--> C Source -> gcc -> Binary

# Our own compiler powered by LLVM
BS AST -> LLVM IR -> LLVM -> Binary

# Our own compiler
BS AST -> BS Compiler -> Binary

In all cases, we need to produce an AST. Writing a transpiler for a subset of the language at that point is a great launching point.

Generating AST requires parsing and is not trivial

const foo: string = 'foo'

Would produce something like

{
  "file": "main.bs"
  "contents": [
    {
       "token": "foo",
       "type": "string",
       "assignment": "constant",
       "value": "foo"
    }
  ]
}

You can see what I mean here: https://ts-ast-viewer.com/#code/MYewdgzgLgBAZiEAuG0BOBLMBzGBeGAcgREKA

So one application will read source files and produce an object format we can then pipe into another program which will use that structure to make something else.

The second program might be a compiler, or it might convert the object representation into another source code (like Rust).

It's worth noting that doing that means we will get borrow checking for free from the Rust compiler. We just need to parse our source and produce valid Rust.

Again, if it works, people will help out and then we can look at a roadmap where we write our own compiler

Nov 14 '21 23:11 alshdavid

either let the user directly read that languages error messages or we would have to translate that languages error messages into a form useful for our language.

This depends on how the parser is implemented.

                        parser
BorrowScript Source --/        \--> BS AST

You can have the parser be smart, implementing borrow checking and error message generation or you can just produce an AST from the source without caring too much (excluding exceptions) about how that AST is generated.

Nov 15 '21 00:11 alshdavid

We should agree on what our AST target will look like so we can share our efforts on writing a parser.

It's worth having a type contract for it so that if we write separate implementations, they will be interchangeable and we won't waste effort

I think being very clear on the segments involved in the compilation process and the expectations of what is generated allows us to produce a modular compiler that's easy to develop in pieces and allows for innovators to find better ways to generate it in the future.

So when we start building out our compiler we have several binaries that make up our main binary

# Generate AST
bsc-parser --output app.bs.ast main.bs

# Generate Rust
bsc-rust --output main.rs app.bs.ast

Then our main compiler does both of these actions

bsc main.bs
rustc main.rs

But this means we can add a module to our compiler that targets wasm

bsc-parser --output app.bs.ast main.bs
bsc-wasm --output main.wasm app.bs.ast

We can then write our own compiler

bsc-binary --target win --arch amd64 --output main.exe app.bs.ast

And each module knows what it's job is. bsc-parser targets the AST contract and bsc-rust consumes that contract. So too would bsc-wasm, bsc-llvm or bsc-binary and they can be written independently

Nov 15 '21 00:11 alshdavid

@alshdavid Silly but sincere question, do you plan to implement this, or is this repo only a specification for others to implement?

Jan 04 '22 20:01 NOP0

Its been a while sorry, I've started back at work and been working on other projects. As far as I'm aware I'm the only person currently attempting to create a language based off of this, I don't know yet however if I will create it according to the specs in this repo or if I will create a similar but different language. As of a couple of minutes ago I have got a working compiled version of my language completed although it is very limited, it also has an interpreter that is much more featured although I intend to bring the compiler up to the same spec ASAP. This can be found at https://github.com/Isaac-Leonard/BorrowScriptInterpreter The compiler currently just uses JIT provided by LLVM. I initially attempted to write a compiler to target WASM but found the clear resources for it lacking and didn't want to screw around with all the direct byte manipulation if it could be avoided.

For the comments above: I'm not particularly sure how you would go about building a transpiler to rust or another language and be able to use that's languages compiler to give the user errors. It would like be looking at errors from a minified javascript file minified produced after compiling typescript. The only way I really see to take advantage of the existing borrow checker is if it exists in some sort of library form in which case you might be able to target the same AST that rust uses. I still imagine this would be extremely difficult and not provide a good experience for the user however compared to just righting a direct compiler LLVM or WASM or something and implementing our own borrow checker, especially as we want to add rules that differ from both rust and typescripts models.

A shared AST is definitely a good idea eventually, what I've currently got is definitely not usable for it, I need to do a massive refactor soon I think that will change its structure entirely. I don't think its particularly useful however until we have more language rules formalised and there's no undefined behaviour left.

Jan 17 '22 14:01 Isaac-Leonard

Isn't borrow checking a function of the parser and the resulting AST? It should be statically determinable if an illegal borrow has occurred where it can be called out by the parser. So this wouldn't have to be something passed down to the underlying compiler (ie. rust to determine an illegal borrow happened). Anything that constitutes as valid syntax should provide guarantees that there are no illegal borrows. Part of the metadata of the AST should be what symbols or references are available moving into a given scope and what are left unused (not moved or borrowed) after moving out of that scope. The ones left are the ones that can be garbage collected.

This would mean it would be possible to write a transpiler for any language and still retain the guarantees provided by the borrow checker.

Jan 21 '22 01:01 johnrcui

In that case you're practically implementing your own borrow checker though so wouldn't be taking advantage of the existing one.

I've completed a lot more work on my current implementation. I'm hoping to have a first release soon once I clean up most of the bugs and add documentation. It's currently usable in a somewhat limited fashion if you would like to try it.

Feb 27 '22 14:02 Isaac-Leonard

@Isaac-Leonard increabile work! I have had lots of fun looking through your project's source code - would love to incorporate it into an official compiler.

As far as implementation is concerned, I am ratifying the last few bits of the language semantics and am keen to start working on a compiler.

After I have a very clear picture of the language, I will put up a projects board, set up github actions to automate releases and begin writing a compiler.

Isaac has been writing his implementation in Rust - I was thinking about using Go because it might be easier for contributors (and automating cross compiling the project releases), but Rust is also a great choice.

Apr 03 '22 23:04 alshdavid

Hey, thank you I'd advise against using rust actually, at least without using external libraries for graph data structures. Currently I've got scoping analysis implemented very hackily with RC and RefCells but I don't particularly like it. My goal is to make my implementation bootstrapped as fast as possible though so I've not explored alternative options. I initially only chose rust cause I hadn't used it before and wanted to learn it and I found an amazing parsing library in it that read extremely like ts which I was more use to. How clear did you find the code? I've currently only got a few comments and that's mainly just todos lol sorry. I've also not been following any specific algorithms or guides so a lot of stuff is custom and likely to have issues in the future and clarity might be questionable.

Apr 04 '22 14:04 Isaac-Leonard

Also just letting you know I plan to publish to crates.io in the next week or so. I want to get a LSP server working for it which means splitting it up and publishing I think though I don't intend to start spreading it until memory management is implemented and the bugs are worked out. Given that I'm not sure about continuing with the current borrow script specifications or branching off into my own direction do you have any input with this?

Apr 04 '22 16:04 Isaac-Leonard

Ok so I've spent a little while learning how LLVM works so I am pretty confident that it's an achievable compile target for the first release. I would like to keep the compiler architecture modular such that it will be open to different compiler backends (compile targets).

This means the compiler will be split into 3 pieces.

flowchart LR
    subgraph Front End
    A[AST Generator - from BS Source] -->|BS AST| B[LLVM IR Generator]
    end
    subgraph Back Ends
    B --> |LLVM IR| C[Generate Binary from LLVM IR - LLVM Wrapper]
    end

This keeps the door open for future projects where the AST generator is reused and alternative backends can be targeted. Might see cool community projects targeting esoteric hardware or producing more optimised binaries.

flowchart LR
    subgraph Front End
    A[AST Generator - from BS Source] -->|BS AST| B[LLVM IR Generator]
    A -->|BS AST| C[C Source Generator]
    A -->|BS AST| D[x86/ARM64 Assembly Generator]
    end
    subgraph Back Ends
    B --> |LLVM IR| E[Generate Binary from LLVM IR - LLVM Wrapper]
    C --> |C Source| F[Generate Binary from C code - GCC Wrapper]
    D --> |x86 Assembly| G[Generate Binary from Assembly - Assembler Wrapper]
    end

The compiler will be split into multiple binaries, each responsible for one step of the process. The parent binary will tie them all together.

./bsc
./bs-ast
./bs-front-llvm
./bs-back-llvm

Technically you can run the full sequence

./bs-ast --entry ./main.bs --output ./main.bs.ast
./bs-front-llvm --input ./main.bs.ast --output ./main.llvm
./bs-back-llvm --input ./main.llvm --output ./main --arch amd64 --platform linux

However the bsc executable will wrap the process, calling each step and orchestrate the chain of events.

bsc --input ./main.bs --output ./main --arch amd64 --platform linux

The reason I want to split this into separate executables is that it allows for clearer separation between the pieces and if one component doesn't work out well in the chosen language - then we can rewrite that component in a better suited language. It also makes it easier for newcomers to understand the flow and find a way to interject in the process.

I am thinking of writing this in Go as it's an easy language to work in and there are llvm bindings for it. In time it would be cool to rewrite the compiler in BorrowScript, once it can be used for that purpose.

My next step is to come up with an unsafe, manually memory managed variant of the language to create a minimal compiler with.

EDIT:

I am also thinking about changing the language name and adding a mascot. BorrowScript is cool but it people have associated it as being an interpreted language - rather than a compiled language.

Jul 31 '22 01:07 alshdavid

BorrowScript BorrowScript copied to clipboard

Implementing the language

BorrowScript
BorrowScript copied to clipboard