BorrowScript
BorrowScript copied to clipboard
Implementing the language
Has any work been done yet on actually writing an implementation compiler? I've had some minor experience with writing interpreters and would be interested in helping.
In my opinion, get a sloppy prototype (or subset language) going, with another language as back-end (JS, C#, Java, whatever suits) - essentially neglecting performance and code quality. Then use that as a basis for building a proper, optimized, native, bootstrapped compiler in the language itself. 😎
It makes the most sense for us to write this in Rust because of all the existing tooling: Official borrow checker: https://github.com/rust-lang/polonius TypeScript parser/linter: https://github.com/rslint/rslint
Obviously, having Rust be the host language will make it harder for some to participate until we're self-hosting, but I don't think anyone wants to go through the trouble of reimplementing the borrow checker in another language.
On Mon, Oct 25, 2021, 1:19 PM Rasmus Schultz @.***> wrote:
In my opinion, get a sloppy prototype (or subset language) going, with another language as back-end (JS, C#, Java, whatever suits) - essentially neglecting performance and code quality. Then use that as a basis for building a proper, optimized, native, bootstrapped compiler in the language itself. 😎
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/alshdavid/BorrowScript/issues/36#issuecomment-951137729, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGDQCMTPTSFKW453KAVDCJTUIWGRTANCNFSM5GTPSBWA .
I'm looking at starting to write the language now. I've made a pull request with a sample EBNF gramma for the basics of the language although at the moment it is severely invalid EBNF syntax. I'm looking at implementing the parser using this library: https://docs.rs/chumsky/0.5.0/chumsky/ however I'm also considering using a parser generator that takes an gramma as input and outputs a ready to use parser however I feel like this would lose us control and could break things when adding features to the language. What does everyone think?
I've began a sample language here: https://github.com/Isaac-Leonard/BorrowScriptInterpreter Its written in rust and I've done nothing more then a hello world in rust before this so the code quality isn't the best. Theres also no comments so.... It's currently interpreted however benefits from rusts borrow checker slightly. I don't think reusing existing tooling such as rusts borrow checker or typescript linting tools is going to be easy and probably not worth trying unless they're a lot more flexible than I imagine. I'm not really sure how I'd make this compiled beyond a general idea of converting to assembler and the concepts of memory allocation, I also only currently have a Mac so won't be able to develop much cross platform support into the compiler. Does anyone else have experience with assembler or compilers?
Love your work Isaac! SuperSonicHub1 suggests we use Rust as a backend because it already has all of this tooling built into it - a sensible approach.
So if you imagine the following passes:
BorrowScript Source -> BS AST -> Rust Source -> Binary
This essentially follows the same compilation flow as TypeScript for Javascript. Transpile the source into JavaScript.
Generating AST and transpiling to Rust Source doesn't need to be written in Rust - it can be in another language. Eventually the transpiler can be port to BorrowScript
You're thinking of doing transpilation? That's an interesting idea, but I think writing our own complier will give us more control over the language.
On Wed, Nov 10, 2021, 9:25 PM David Alsh @.***> wrote:
Love your work Isaac! SuperSonicHub1 suggests we use Rust as a backend because it already has all of this tooling built into it - a sensible approach.
So if you imagine the following passes:
BorrowScript Source -> BS AST -> Rust Source -> Binary
This essentially follows the same compilation flow as TypeScript for Javascript. Transpile the source into JavaScript.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/alshdavid/BorrowScript/issues/36#issuecomment-965933280, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGDQCMTQVEFVULU6UC3A6UTULMSTDANCNFSM5GTPSBWA .
The issue with directly transpiling to rust or any other language is that to make use of that languages features like the borrow checker we would have to either let the user directly read that languages error messages or we would have to translate that languages error messages into a form useful for our language. Either option doesn't feel very good in my opinion. For example if we don't implement borrow checking directly and let rust do that for us how do we effectively communicate to the user what went wrong when the rust compiler gives an error that makes no sense in the borrow script program. Instead of tranpiling you could maybe fork the rust compiler directly and make the necessary changes in its parser and standard lib to make something better.
I'm currently planning to initially target WASM with the language I've been writing. I'm not currently sure if what I'm writing will be borrow script as I seem to have a slightly different vision to you but it will be a long time until any of that really matters.
I've now got static analysis working for most trivial uses and have basic unions working along with IO, while loops, if statements and some other bits and pieces but its all still currently interpreted.
You're thinking of doing transpilation? That's an interesting idea, but I think writing our own complier will give us more control over the language.
Fully agree with you but I think it's a practical route initially.
Writing a complete compiler backend is quite the challenge and something I want to do for BS but in the absence of resources I feel a practical starting point is to do a code to code compilation (transpilation) - especially seeing as that requires a large chunk of the effort for a complete compiler anyway (in other words, not wasted effort).
That way we can prove the language semantics work (obviously they will as it's just simplified Rust). It's then easier to appeal for more resources as people would be interested in contributing to a working project.
We would need to build half a transpiler regardless of if we wrote our own backend because of the need to write a parser/AST generator. From there we can pipe the output into whatever.
BorrowScript Source --| Parser |--> BS AST
The parser validates the source code, produces compilation/type checking errors, handles imports and generates a single file (kind of like a js bundle) which is an easy to interpret intermediary format to hand over onto a compiler or transpiler.
# Transpile
BS AST --| Rust Transpiler |--> Rust Source -> Rust Compiler -> Binary
BS AST --| C Transpiler |--> C Source -> gcc -> Binary
# Our own compiler powered by LLVM
BS AST -> LLVM IR -> LLVM -> Binary
# Our own compiler
BS AST -> BS Compiler -> Binary
In all cases, we need to produce an AST. Writing a transpiler for a subset of the language at that point is a great launching point.
Generating AST requires parsing and is not trivial
const foo: string = 'foo'
Would produce something like
{
"file": "main.bs"
"contents": [
{
"token": "foo",
"type": "string",
"assignment": "constant",
"value": "foo"
}
]
}
You can see what I mean here: https://ts-ast-viewer.com/#code/MYewdgzgLgBAZiEAuG0BOBLMBzGBeGAcgREKA
So one application will read source files and produce an object format we can then pipe into another program which will use that structure to make something else.
The second program might be a compiler, or it might convert the object representation into another source code (like Rust).
It's worth noting that doing that means we will get borrow checking for free from the Rust compiler. We just need to parse our source and produce valid Rust.
Again, if it works, people will help out and then we can look at a roadmap where we write our own compiler
either let the user directly read that languages error messages or we would have to translate that languages error messages into a form useful for our language.
This depends on how the parser is implemented.
parser
BorrowScript Source --/ \--> BS AST
You can have the parser be smart, implementing borrow checking and error message generation or you can just produce an AST from the source without caring too much (excluding exceptions) about how that AST is generated.
We should agree on what our AST target will look like so we can share our efforts on writing a parser.
It's worth having a type contract for it so that if we write separate implementations, they will be interchangeable and we won't waste effort
I think being very clear on the segments involved in the compilation process and the expectations of what is generated allows us to produce a modular compiler that's easy to develop in pieces and allows for innovators to find better ways to generate it in the future.
So when we start building out our compiler we have several binaries that make up our main binary
# Generate AST
bsc-parser --output app.bs.ast main.bs
# Generate Rust
bsc-rust --output main.rs app.bs.ast
Then our main compiler does both of these actions
bsc main.bs
rustc main.rs
But this means we can add a module to our compiler that targets wasm
bsc-parser --output app.bs.ast main.bs
bsc-wasm --output main.wasm app.bs.ast
We can then write our own compiler
bsc-binary --target win --arch amd64 --output main.exe app.bs.ast
And each module knows what it's job is. bsc-parser
targets the AST contract and bsc-rust
consumes that contract. So too would bsc-wasm
, bsc-llvm
or bsc-binary
and they can be written independently
@alshdavid Silly but sincere question, do you plan to implement this, or is this repo only a specification for others to implement?
Its been a while sorry, I've started back at work and been working on other projects. As far as I'm aware I'm the only person currently attempting to create a language based off of this, I don't know yet however if I will create it according to the specs in this repo or if I will create a similar but different language. As of a couple of minutes ago I have got a working compiled version of my language completed although it is very limited, it also has an interpreter that is much more featured although I intend to bring the compiler up to the same spec ASAP. This can be found at https://github.com/Isaac-Leonard/BorrowScriptInterpreter The compiler currently just uses JIT provided by LLVM. I initially attempted to write a compiler to target WASM but found the clear resources for it lacking and didn't want to screw around with all the direct byte manipulation if it could be avoided.
For the comments above: I'm not particularly sure how you would go about building a transpiler to rust or another language and be able to use that's languages compiler to give the user errors. It would like be looking at errors from a minified javascript file minified produced after compiling typescript. The only way I really see to take advantage of the existing borrow checker is if it exists in some sort of library form in which case you might be able to target the same AST that rust uses. I still imagine this would be extremely difficult and not provide a good experience for the user however compared to just righting a direct compiler LLVM or WASM or something and implementing our own borrow checker, especially as we want to add rules that differ from both rust and typescripts models.
A shared AST is definitely a good idea eventually, what I've currently got is definitely not usable for it, I need to do a massive refactor soon I think that will change its structure entirely. I don't think its particularly useful however until we have more language rules formalised and there's no undefined behaviour left.
Isn't borrow checking a function of the parser and the resulting AST? It should be statically determinable if an illegal borrow has occurred where it can be called out by the parser. So this wouldn't have to be something passed down to the underlying compiler (ie. rust to determine an illegal borrow happened). Anything that constitutes as valid syntax should provide guarantees that there are no illegal borrows. Part of the metadata of the AST should be what symbols or references are available moving into a given scope and what are left unused (not moved or borrowed) after moving out of that scope. The ones left are the ones that can be garbage collected.
This would mean it would be possible to write a transpiler for any language and still retain the guarantees provided by the borrow checker.
In that case you're practically implementing your own borrow checker though so wouldn't be taking advantage of the existing one.
I've completed a lot more work on my current implementation. I'm hoping to have a first release soon once I clean up most of the bugs and add documentation. It's currently usable in a somewhat limited fashion if you would like to try it.
@Isaac-Leonard increabile work! I have had lots of fun looking through your project's source code - would love to incorporate it into an official compiler.
As far as implementation is concerned, I am ratifying the last few bits of the language semantics and am keen to start working on a compiler.
After I have a very clear picture of the language, I will put up a projects board, set up github actions to automate releases and begin writing a compiler.
Isaac has been writing his implementation in Rust - I was thinking about using Go because it might be easier for contributors (and automating cross compiling the project releases), but Rust is also a great choice.
Hey, thank you I'd advise against using rust actually, at least without using external libraries for graph data structures. Currently I've got scoping analysis implemented very hackily with RC and RefCells but I don't particularly like it. My goal is to make my implementation bootstrapped as fast as possible though so I've not explored alternative options. I initially only chose rust cause I hadn't used it before and wanted to learn it and I found an amazing parsing library in it that read extremely like ts which I was more use to. How clear did you find the code? I've currently only got a few comments and that's mainly just todos lol sorry. I've also not been following any specific algorithms or guides so a lot of stuff is custom and likely to have issues in the future and clarity might be questionable.
Also just letting you know I plan to publish to crates.io in the next week or so. I want to get a LSP server working for it which means splitting it up and publishing I think though I don't intend to start spreading it until memory management is implemented and the bugs are worked out. Given that I'm not sure about continuing with the current borrow script specifications or branching off into my own direction do you have any input with this?
Ok so I've spent a little while learning how LLVM works so I am pretty confident that it's an achievable compile target for the first release. I would like to keep the compiler architecture modular such that it will be open to different compiler backends (compile targets).
This means the compiler will be split into 3 pieces.
flowchart LR
subgraph Front End
A[AST Generator - from BS Source] -->|BS AST| B[LLVM IR Generator]
end
subgraph Back Ends
B --> |LLVM IR| C[Generate Binary from LLVM IR - LLVM Wrapper]
end
This keeps the door open for future projects where the AST generator is reused and alternative backends can be targeted. Might see cool community projects targeting esoteric hardware or producing more optimised binaries.
flowchart LR
subgraph Front End
A[AST Generator - from BS Source] -->|BS AST| B[LLVM IR Generator]
A -->|BS AST| C[C Source Generator]
A -->|BS AST| D[x86/ARM64 Assembly Generator]
end
subgraph Back Ends
B --> |LLVM IR| E[Generate Binary from LLVM IR - LLVM Wrapper]
C --> |C Source| F[Generate Binary from C code - GCC Wrapper]
D --> |x86 Assembly| G[Generate Binary from Assembly - Assembler Wrapper]
end
The compiler will be split into multiple binaries, each responsible for one step of the process. The parent binary will tie them all together.
./bsc
./bs-ast
./bs-front-llvm
./bs-back-llvm
Technically you can run the full sequence
./bs-ast --entry ./main.bs --output ./main.bs.ast
./bs-front-llvm --input ./main.bs.ast --output ./main.llvm
./bs-back-llvm --input ./main.llvm --output ./main --arch amd64 --platform linux
However the bsc
executable will wrap the process, calling each step and orchestrate the chain of events.
bsc --input ./main.bs --output ./main --arch amd64 --platform linux
The reason I want to split this into separate executables is that it allows for clearer separation between the pieces and if one component doesn't work out well in the chosen language - then we can rewrite that component in a better suited language. It also makes it easier for newcomers to understand the flow and find a way to interject in the process.
I am thinking of writing this in Go as it's an easy language to work in and there are llvm bindings for it. In time it would be cool to rewrite the compiler in BorrowScript, once it can be used for that purpose.
My next step is to come up with an unsafe, manually memory managed variant of the language to create a minimal compiler with.
EDIT:
I am also thinking about changing the language name and adding a mascot. BorrowScript is cool but it people have associated it as being an interpreted language - rather than a compiled language.