Rust exploration
Based on "Introduction - Writing Interpreters in Rust: a Guide", we explore whether it's fun to rewrite Mosh in Rust.
Goals
- We'll see if it's practically possible to rewrite Mosh in Rust with the current design (Compiler written in Scheme).
- We'll see if all basic building blocks work.
- Scheme object with tag bits.
- GC
- VM
- UTF-8
- File I/O
Non Goals
- Fully rewrite Mosh.
- Well designed Rust code.
Milestones
- M1: Just build the example
- M2: Define Fixnum and its predicate
- M3: Create simple VM to just evaluate constant
- M4: Update VM to run addition
- M5: Come up with a rough idea of how we implement GC.
- M6: Implement symbol and intern.
- M7: identify very small Scheme program, compile it in Mosh and execute it in the Rust VM.
- M8: Understand all the code we wrote except for GC and clean up.
- M9: Understand how GC works and trigger GC properly. Figure out how to test it.
- M10: Add tests for GC.
- M11: Compiler output converter from Scheme to Rust code.
- M12: Add many simple tests for call and gc.
M2 TODOs
- [x] define and print `usize` number.
- [x] print bitwise `and` for the number
- [x] define unsafe raw_pointer
- [x] add method to extract number value from the pointer
- [x] add predicate method for the raw_pointer
- https://github.com/higepon/mosh/blob/3bf4b57c6a72282a1f980fed22a048afbfb635e6/rmosh/src/main.rs
- [x] Revisit https://rust-hosted-langs.github.io/book/chapter-interp-tagged-ptrs.html
- [x] Define next steps
- [x] Define ScmObj which covers both Fixnum and object types located in heap.
- [x] Read the doc.
- [x] Define empty ScmObj class
- [x] Make it compatible with the current fixnum.
- [x] Understand Box
- [x] Understand how to use heap
- [x] Have one heap allocated class of ScmObj
- [x] Have one more heap allocated class of ScmObj
- https://github.com/higepon/mosh/blob/73ee27ce5be64ee564c8846c0406ce081e6bc674/rmosh/src/main.rs#L27
- [x] Decide if Symbol should have String, str, &String or &str.
- [x] Try to use non default allocator
- [x] https://rust-hosted-langs.github.io/book/chapter-simple-bump.html
- [x] Better understand GCs
- [x] https://manishearth.github.io/blog/2015/09/01/designing-a-gc-in-rust/
- [x] Use https://github.com/Manishearth/rust-gc
- [ ] Read https://github.com/alilleybrinker/langs-in-rust
- [ ] Make a list of GC implementations.
- [ ] https://ceronman.com/2021/07/22/my-experience-crafting-an-interpreter-with-rust/
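The fixnum items in the M2 list (a tagged word, a predicate, and a method to extract the number) can be sketched roughly as follows. This is only an illustration under assumptions: the name `ScmObj` comes from the list above, but the one-bit tag layout, the constants `FIXNUM_TAG` and `TAG_BITS`, and all method names are hypothetical and not taken from the linked main.rs.

```rust
/// Hypothetical tagged word: low bit 1 marks a fixnum, low bit 0 a heap pointer.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct ScmObj(usize);

const FIXNUM_TAG: usize = 1; // assumed tag value
const TAG_BITS: u32 = 1; // assumed tag width

impl ScmObj {
    /// Pack a signed integer into the upper bits and set the fixnum tag.
    pub fn from_fixnum(n: isize) -> ScmObj {
        ScmObj(((n as usize) << TAG_BITS) | FIXNUM_TAG)
    }

    /// Predicate: true when the low bit carries the fixnum tag.
    pub fn is_fixnum(self) -> bool {
        self.0 & FIXNUM_TAG == FIXNUM_TAG
    }

    /// Extract the integer; the arithmetic shift on isize restores the sign.
    pub fn to_fixnum(self) -> Option<isize> {
        if self.is_fixnum() {
            Some((self.0 as isize) >> TAG_BITS)
        } else {
            None
        }
    }

    /// Wrap a heap value. Box<u64> is 8-byte aligned, so bit 0 is always 0.
    /// The allocation is never freed in this sketch.
    pub fn from_boxed(b: Box<u64>) -> ScmObj {
        ScmObj(Box::into_raw(b) as usize)
    }
}
```

With this layout, `ScmObj::from_fixnum(42).to_fixnum()` gives back `Some(42)`, while a word built by `from_boxed` fails the `is_fixnum` predicate because its low bit is clear.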
M3 and M4
- [x] https://github.com/higepon/mosh/blob/dc716dc31b3e406f49161ea3cdfc9a8b4021e44d/rmosh/src/main.rs#L20
Need to read this. https://docs.rs/soroban-env-common/0.0.3/src/soroban_env_common/raw_val.rs.html#413
M5
- [x] Write a blog article about how rust_gc works. It should cover
- [x] how we use it in normal use cases.
- [x] how it works with supported objects. Both mark and sweep.
- [x] how we use it with raw pointer use cases.
- [x] how Gc<Foo> works as Foo when accessing their members.
- [x] how to write custom trace implementation.
- [x] how to write custom trace implementation for tag bits pointer
- [x] Implement the trace implementation for tag bits pointer.
- [x] Not doing this because it turned out no one is doing custom tag in Rust GC.
This is a good read, but we should not assume this implementation is production quality. https://github.com/ceronman/loxido
Looked into this https://higepon.hatenablog.com/entry/2022/12/03/160801.
Now I understand how enum in Rust works. I think it may be okay to implement the object system w/o tagged pointers. Some of my tweets about it. https://twitter.com/HigeponJa/status/1600265255317573637
M7: identify very small Scheme program, compile it in Mosh and execute it in the Rust VM
- [x] Just evaluate define and reference.
$ gosh vm.scm "compile" "(begin (define a 3) a)"
#(CONSTANT 3 DEFINE_GLOBAL a REFER_GLOBAL a HALT)
- [x] 1 let.
$ gosh vm.scm "compile" "(let ([a 3]) a)"
#(LET_FRAME 1 CONSTANT 3 PUSH ENTER 1 REFER_LOCAL 0 LEAVE 1 HALT)
- put closure with free variable
- [x]
$ gosh vm.scm "compile" "(let ([a 2]) (let ([b 1]) (+ a b)))"
#(LET_FRAME 3 CONSTANT 2 PUSH ENTER 1 LET_FRAME 2 REFER_LOCAL 0 PUSH DISPLAY 1 CONSTANT 1 PUSH ENTER 1 REFER_FREE 0 PUSH REFER_LOCAL 0 NUMBER_ADD LEAVE 1 LEAVE 1 HALT)
- [x] Use if
$ gosh vm.scm "compile" "(if 1 2 3)"
#(CONSTANT 1 TEST 5 CONSTANT 2 LOCAL_JMP 3 CONSTANT 3 HALT NOP NOP)
- [x] call closure
$ gosh vm.scm "compile" "((lambda (a) (+ a a)) 1)"
#(FRAME 21 CONSTANT 1 PUSH CLOSURE 14 1 #f 0 6 (((input string port) 1) lambda a) REFER_LOCAL 0 PUSH REFER_LOCAL 0 NUMBER_ADD RETURN 1 CALL 1 HALT NOP NOP)
- [x] call procedure written in Rust
https://github.com/higepon/mosh/blob/b07ad0a4c8add619bab7c58dae303369f69fbfca/rmosh/src/main.rs
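To make the first compiled sequence above concrete, here is a minimal accumulator-style VM sketch that can run `CONSTANT 3 DEFINE_GLOBAL a REFER_GLOBAL a HALT`. The instruction names follow the compiler output shown above; the Rust types (`Object`, `Op`, `Vm`), the choice of `&'static str` for global names, and leaving the accumulator unchanged after `DEFINE_GLOBAL` are assumptions for illustration and are not taken from the linked main.rs.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Object {
    Fixnum(isize),
    Unspecified,
}

/// A small subset of the instructions printed by vm.scm.
#[derive(Clone, Debug)]
pub enum Op {
    Constant(Object),
    DefineGlobal(&'static str),
    ReferGlobal(&'static str),
    Push,
    NumberAdd,
    Halt,
}

pub struct Vm {
    ac: Object, // accumulator register
    stack: Vec<Object>,
    globals: HashMap<&'static str, Object>,
}

impl Vm {
    pub fn new() -> Self {
        Vm { ac: Object::Unspecified, stack: Vec::new(), globals: HashMap::new() }
    }

    /// Executes the instructions in order and returns the accumulator at HALT.
    pub fn run(&mut self, ops: &[Op]) -> Object {
        for op in ops {
            match op {
                Op::Constant(c) => self.ac = *c,
                Op::DefineGlobal(name) => {
                    self.globals.insert(*name, self.ac);
                }
                Op::ReferGlobal(name) => {
                    self.ac = *self.globals.get(name).expect("unbound variable");
                }
                Op::Push => self.stack.push(self.ac),
                Op::NumberAdd => {
                    // Adds the value pushed earlier to the accumulator.
                    let lhs = self.stack.pop().expect("stack underflow");
                    self.ac = match (lhs, self.ac) {
                        (Object::Fixnum(a), Object::Fixnum(b)) => Object::Fixnum(a + b),
                        _ => panic!("+: number required"),
                    };
                }
                Op::Halt => break,
            }
        }
        self.ac
    }
}
```

Running the four instructions for `(begin (define a 3) a)` through `Vm::new().run(...)` leaves `Object::Fixnum(3)` in the accumulator. Jumps, frames and closures are left out on purpose, so the sketch does not need a program counter.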
M9: Understand how GC works and trigger GC properly. Figure out how to test it.
- [x] Mimic loxido print debug of gc
- [x] figure out how to turn on/off the debug print
M10: Run many VM tests based off all-tests.scm
- [x] Finish implementing all core instructions.
- [x] Stop using hand written instructions
- [x] Change vm.scm to compile-file and output it to stdout w/o optimization.
- [x] Write a Scheme program to rewrite it in Op:: style.
- [ ] Test it with the existing vm tests.
- [x] Run compiled all-tests.scm
- [x] Convert 1 test to Op::
- [x] Pass the test with the following.
- [x] Implement write for easier testing
- [x] Make the free var stub with a dummy lambda + display.
- [x] Enable args and argc for functions implemented in Rust
- [x] Add instructions needed.
- [x] Repeat this until we can run most of them.
- [ ] Update these milestones as we go.
- [x] Stop using hand written instructions
- [x] Make the free vars same as Mosh
- [x] Run compiled all-tests.scm
- Not feasible at this moment because it requires many procs and less important primitives. For example regex procs.
- [x] Run the compiler in the VM
- [x] Not feasible at this moment.
Some notes for next steps
- I was very confident with what I was doing because I kept adding tests.
- Running compiled tests was a very efficient way to implement VM instructions and procs.
- The next big milestones are
- Being able to run the compiler written in Scheme.
- Parser (This should be done after the compiler).
- To be able to run the compiler we need
- Being able to run long program
- Compiled instructions of the compiler.
- Base library written in Scheme.
- Base library written in Rust.
- Running the compiler is still very far away. What can we do next? Ideas?
- Run as many test-data.scm tests as possible.
M11 Run as many tests in test-data.scm as possible
- [x] Implements important primitives such as bytevector, vector.
- [x] Add a way to load a small compiled library
- [x] Debug weird call bug
- [x] Change instruction array from `Vec<Op>` to slice.
- [x] Mark closure.ops
- [x] Skip test only if it's too early to implement and add todo comment there.
M12 Being able to load small compiled program as baselib
- [x] What I tried and failed.
  - Compiled the base.scm with `gosh vm.scm compile-file-with-macro baselib.scm` with vm-cpp set to false.
  - Converted the instructions into Rust code (= Vec of Op::).
  - Compiled it as a part of a test. => The Rust compiler died, probably because it is too big: about 60K lines of Rust code.
- [ ] Next steps
- [x] Being able to compile small program into Rust code.
- [x] Pick small scheme code and put in base.scm
- [x] Compile it to base.op
- [x] Convert it to Rust program.
- [x] Test it.
- [x] Make the process in the Makefile
- [x] Run the existing tests with optimized code.
- [x] Implement almost all the Ops in vm.
- [x] Implement FASL
- [x] Load the compiler as FASL.
- [x] It seems loading the compiler stops at some unknown point.
- [x] Steps to investigate.
- [x] The whole compiler instructions are loaded as expected.
- [x] Starting with the last define global, follow all instructions and see if we can catch the issue.
M13: Being able to run (compile 3)
- The compiler output
- (A) A list of instructions as symbol (default and used in vm.scm).
- (B) A list of instructions as `(*insn* num)`. We use `(*compiler-insn* num)` for `(PUSH 3)` in `CONSTANT (PUSH 3)` if vm-cpp is on.
- The conversion is done in `insn-sym->insn-num` using src/instruction.scm.
- How is the VM embedding the compiler?
  - VM cpp:
    - gen-compiler.scm: -> compiler-vm-cpp.scm
      - cond-expand controls what to include in the compiler.
    - cat all libraries -> baselib.scm
    - `gosh vm.scm compile-file-with-macro` -> baselib.scmc
    - `scmc2fasl.scm` -> `baselib.fasl`
      - Replace insn and compiler-insn with actual Objects and write the whole file as FASL.
    - `binary2cpp.scm` -> `baselib.h`
    - The VM cpp compiler returns actual insns because of the scmc2fasl phase.
  - VM rust:
    - The VM rust
      - should return unflatten insns. To do so the compiler itself should be able to do it.
        - But the output can't be a list of Op yet because the compiler can't produce Op directly yet.
      - should return insn instead of syms
- Next steps
- [x] Update OpTag in fasl.rs and use the insn numbers there.
- [ ] Update fasl_writer.scm and use the insn numbers there.
- [x] Use baselib.scmc instead of baselib.op as input of fasl_writer.scm
- [x] Copy fasl_writer as fasl_writer2
- [x] Update fasl_writer.scm to use baselib.scmc
- [x] Compare the result with the old output.
- [x] Commit the fasl_writer2 as fasl_writer.
- [x] At this point we can run some tests to make sure this works.
- [x] Now we expect to see the compiler return an insn tag in the test_compiler test instead of `CONSTANT 0`.
- [x] Support (or a b) in gen-compiler.
- [x] import modified insn-decl.scm.
- [x] double check all cond-expand
- [ ] Verify it in the CI
- [x] Change compiler to support rmosh
- [ ] Write compile-rust, which wraps compile and uses fasl_writer to return #vu8.
- [ ] To make sure it's working, run the compiler in vm.scm, compile the compiler, and run the diff.
- [x] Call the compiler from test_compiler.
- [x] Decide if we need to implement code builder in rust or use the scheme one.
- [x] Update the VM so that it can handle `*insn*`
- [x] Update the compiler so that it can produce unflatten `#vu8`.
- [x] Clean up Makefile dependencies.
- [x] Detect changes in compiler.scm then generate baselib-rust.scmc
- [x] fasl-write it as compiler.rs
- [x] This should be triggered in rmosh/boot/Makefile
M14: Flatten a list of instructions
Background
I think I made a wrong design choice on how we treat a list of instructions. In C++ Mosh, an instruction is an object and a list of instructions is a list of objects. But in Rust Mosh we treat an instruction as an enum with a value, such as Constant(3) or Call(2). So they are no longer Objects, and they can't be in a list of Objects. We found two major downsides.
- (A) Instruction and object are different Rust types and that complicates the code a lot.
- (B) We have to convert the list of instructions made by the compiler into the enum and adjust offsets.
Changes we'll make
- Compiler instruction is an Object. Specifically Object::Instruction(Op) where Op is Enum w/o operand.
- Remove all un-flatten code.
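The proposed change can be sketched as follows: opcodes become operand-free values wrapped in `Object::Instruction(Op)`, and operands are ordinary objects that follow their opcode in one flat code vector. `Object::Instruction(Op)` is named in the list above; the specific opcodes chosen, the `run` function, and the accumulator and stack layout are assumptions for illustration only.

```rust
/// Operand-free opcode, as proposed above.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Op {
    Constant,
    Push,
    NumberAdd,
    Halt,
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Object {
    Fixnum(isize),
    Instruction(Op),
}

/// Runs a flat code vector in which each operand directly follows its opcode.
/// Panics if the code is malformed or runs past the end without Halt.
pub fn run(code: &[Object]) -> Object {
    let mut pc = 0;
    let mut ac = Object::Fixnum(0);
    let mut stack = Vec::new();
    loop {
        let op = match code[pc] {
            Object::Instruction(op) => op,
            other => panic!("expected instruction at {} but got {:?}", pc, other),
        };
        pc += 1;
        match op {
            Op::Constant => {
                // The operand is simply the next object in the same vector.
                ac = code[pc];
                pc += 1;
            }
            Op::Push => stack.push(ac),
            Op::NumberAdd => {
                ac = match (stack.pop(), ac) {
                    (Some(Object::Fixnum(a)), Object::Fixnum(b)) => Object::Fixnum(a + b),
                    _ => panic!("+: number required"),
                };
            }
            Op::Halt => return ac,
        }
    }
}
```

Because instructions and operands share one type, the compiler's output list can be executed as is, with no conversion step and no offset adjustment, which is the point of (A) and (B) above.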
M15: Simple Reader
- [ ] Investigate parser & helpers.
- [ ] Found re2c.
- [ ] Read re2c manual
- [ ] Introduction
- Notes
- single quote and double quote mean something different.
- [x] Syntax
- [x] Program interface
- [x] Options
- [x] Warnings
- [x] Blocks and directives
- [x] API primitives
- [x] Configurations
- [x] Regular expressions
- [x] Handling the end of input
- [x] Sentinel
- [x] Sentinel with bounds checks
- [x] Bounds checks with padding
- [ ] Custom checks
- [ ] Buffer refilling
- [ ] YYFILL with sentinel
- [ ] YYFILL with padding
- [ ] Multiple blocks
- [ ] Start conditions
- [ ] Storable state
- [ ] Reusable blocks
- [ ] Submatch extraction
- [ ] Encoding support
- [ ] Include files
- [ ] Header files
- [ ] Skeleton programs
- [ ] Visualization and debug
- [ ] More examples
- [ ] Scanner
- [x] Have been playing with LALRPOP, but it turns out its lexer is a toy, so using LALRPOP's lexer is not an option for us. RustPython uses LALRPOP but not its lexer. https://github.com/RustPython/RustPython/blob/main/compiler/parser/python.lalrpop. It has its own hand-written lexer. https://github.com/RustPython/RustPython/blob/main/compiler/parser/src/lexer.rs
- [x] Set up re2c based scanner
- [x] scan number
- [x] Just scan #t and return something
- [ ]
- [x] Come up with Scanner API.
- [x] scan identifier
- [x] come up with next steps
- [x] Decide what to use.
- [x] Read number
- [x] Read symbol
- [x] Read string
- [x] Read dot pair
- [x] Read pair
- [x] Read vector
- [x] Read character
- [x] And more :)
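The scanner in this milestone is generated with re2c. The hand-written sketch below does not reproduce that; it only illustrates what a small Scanner API for `#t`, numbers, identifiers and parentheses could look like. The `Token` enum and the `scan` function are hypothetical names, not rmosh's.

```rust
#[derive(Debug, PartialEq)]
pub enum Token {
    LeftParen,
    RightParen,
    True,
    False,
    Number(i64),
    Identifier(String),
}

/// Splits the source into tokens; returns an error for unknown `#` syntax.
pub fn scan(src: &str) -> Result<Vec<Token>, String> {
    let chars: Vec<char> = src.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            i += 1;
        } else if c == '(' {
            tokens.push(Token::LeftParen);
            i += 1;
        } else if c == ')' {
            tokens.push(Token::RightParen);
            i += 1;
        } else {
            // Read up to the next delimiter, then classify the lexeme.
            let start = i;
            while i < chars.len()
                && !chars[i].is_whitespace()
                && chars[i] != '('
                && chars[i] != ')'
            {
                i += 1;
            }
            let lexeme: String = chars[start..i].iter().collect();
            let token = match lexeme.as_str() {
                "#t" => Token::True,
                "#f" => Token::False,
                s => match s.parse::<i64>() {
                    Ok(n) => Token::Number(n),
                    Err(_) if s.starts_with('#') => {
                        return Err(format!("unknown token: {}", s));
                    }
                    Err(_) => Token::Identifier(s.to_string()),
                },
            };
            tokens.push(token);
        }
    }
    Ok(tokens)
}
```

A reader for pairs and vectors would then consume this token stream. Strings, characters and dotted pairs from the checklist above are not handled here.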
GC todo
- Invoking GC in vm.alloc is not a good design. It can cause a memory error where the newly allocated object itself is freed because it's not rooted yet. We should think about where to trigger GC.
- Clean up should_gc logic
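One possible arrangement, sketched under assumptions, is to keep allocation free of collection and let the VM call the collector only at a point where every live object is reachable from known roots (for example between instructions). The toy index-based heap below is hypothetical and is not how rmosh or rust-gc is implemented; only the name `should_gc` comes from the notes above.

```rust
/// Toy heap: objects live in slots and refer to each other by slot index.
pub enum Obj {
    Fixnum(isize),
    Pair(usize, usize),
}

pub struct Heap {
    slots: Vec<Option<Obj>>,
    allocated_since_gc: usize,
}

impl Heap {
    pub fn new() -> Self {
        Heap { slots: Vec::new(), allocated_since_gc: 0 }
    }

    /// Allocation never collects, so a fresh, not yet rooted object
    /// cannot be freed as a side effect of creating it.
    pub fn alloc(&mut self, obj: Obj) -> usize {
        self.allocated_since_gc += 1;
        if let Some(i) = self.slots.iter().position(|s| s.is_none()) {
            self.slots[i] = Some(obj);
            i
        } else {
            self.slots.push(Some(obj));
            self.slots.len() - 1
        }
    }

    /// The VM polls this at a safe point instead of collecting inside alloc.
    pub fn should_gc(&self, threshold: usize) -> bool {
        self.allocated_since_gc >= threshold
    }

    /// Marks everything reachable from `roots`, sweeps the rest,
    /// and returns the number of freed objects.
    pub fn collect(&mut self, roots: &[usize]) -> usize {
        let mut marked = vec![false; self.slots.len()];
        let mut work: Vec<usize> = roots.to_vec();
        while let Some(i) = work.pop() {
            if marked[i] {
                continue;
            }
            marked[i] = true;
            if let Some(Obj::Pair(car, cdr)) = &self.slots[i] {
                work.push(*car);
                work.push(*cdr);
            }
        }
        let mut freed = 0;
        for (i, slot) in self.slots.iter_mut().enumerate() {
            if !marked[i] && slot.is_some() {
                *slot = None;
                freed += 1;
            }
        }
        self.allocated_since_gc = 0;
        freed
    }

    pub fn is_live(&self, i: usize) -> bool {
        self.slots[i].is_some()
    }
}
```

In a VM loop, the check would sit at the top of each instruction dispatch, where the stack, registers and globals together form the root set passed to `collect`.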
Now rmosh can read and run a program.
$ cat hoge.scm
(display ((lambda (a) (+ a 1)) 2))
$ ./target/debug/rmosh hoge.scm
3
Ideas for the next milestones
- [x] Find missing features by running all-tests.scm
- [x] Support Flonum
- [ ] Support Port
- [x] Support Regex
- [ ] Implement more VM instructions
- [ ] Improve GC performance.
- [ ] Error handling.
Now rmosh can run a simple R6RS program with some errors :)
(import (rnrs))
(display "Hello")
M16
- Load `(import (mosh))`.
- Load `(import (mosh file))`.
- Enable `fasl`
- Port design
- text port
- binary port
Weird bug on loading psyntax
free_var=source-info called
free_var=procedure? called
thread 'main' panicked at 'Not a Object::Closure but #<vox #<closure 0xaaaaf6956ca0>>', src/objects.rs:238:13
It is happening in parse-library
(define parse-library
(lambda (e)
(format (current-error-port) "parse-library0" ) <==
(syntax-match e ()
In refer free push, dc is supposed to be a closure, but per the panic message above it was a vox wrapping a closure.
Next steps
- [x] commit before the big change
- [x] Implement writer which can print cyclic reference object.
- [x] Print stack and vm instructions for the bug and check what's wrong.
- [x] detect stack overflow
- [x] support #!r6rs
Now rmosh can run the following program.
(import (scheme base))
(import (scheme write))
(import (only (srfi :1) list-ref))
(import (mosh control))
(display 3)
(newline)
Now rmosh can automatically serialize library as mosh does.
serialize-library /root/mosh.git/lib/mosh/control.ss
...serialize-library /root/mosh.git/lib/scheme/write.mosh.sls
...serialize-library /root/mosh.git/lib/r7b-impl/write.mosh.sls
...serialize-library /root/mosh.git/lib/scheme/base.mosh.sls
...serialize-library /root/mosh.git/lib/r7b-impl/base.sls
...serialize-library /root/mosh.git/lib/r7b-util/case.sls
...serialize-library /root/mosh.git/lib/r7b-util/syntax-rules.sls
...serialize-library /root/mosh.git/lib/r7b-util/char-ready.sls
...serialize-library /root/mosh.git/lib/r7b-impl/division.sls
...serialize-library /root/mosh.git/lib/srfi/%3a43.mosh.sls
...serialize-library /root/mosh.git/lib/srfi/%3a43/vectors.sls
...serialize-library /root/mosh.git/lib/srfi/%3a13.mosh.sls
...serialize-library /root/mosh.git/lib/srfi/%3a13/strings.sls
...serialize-library /root/mosh.git/lib/srfi/%3a14/char-sets.sls
...serialize-library /root/mosh.git/lib/srfi/private/include.sls
...serialize-library /root/mosh.git/lib/srfi/private/include/compat.mosh.sls
...serialize-library /root/mosh.git/lib/srfi/private/let-opt.sls
...serialize-library /root/mosh.git/lib/srfi/%3a8/receive.sls
...serialize-library /root/mosh.git/lib/srfi/%3a1.mosh.sls
...serialize-library /root/mosh.git/lib/srfi/%3a1/lists.sls
...serialize-library /root/mosh.git/lib/srfi/i39.mosh.sls
...serialize-library /root/mosh.git/lib/srfi/%3a39.mosh.sls
...serialize-library /root/mosh.git/lib/srfi/i9.mosh.sls
...serialize-library /root/mosh.git/lib/srfi/%3a9.mosh.sls
...serialize-library /root/mosh.git/lib/srfi/%3a9/records.sls
...serialize-library /root/mosh.git/lib/srfi/i23.mosh.sls
...serialize-library /root/mosh.git/lib/srfi/%3a23.mosh.sls
...serialize-library /root/mosh.git/lib/srfi/%3a23/error.sls
...serialize-library /root/mosh.git/lib/srfi/%3a39/parameters.mosh.sls
...serialize-library /root/mosh.git/lib/srfi/i6.mosh.sls
...serialize-library /root/mosh.git/lib/srfi/%3a6.mosh.sls
...serialize-library /root/mosh.git/lib/srfi/%3a6/basic-string-ports.sls
...serialize-library /root/mosh.git/lib/srfi/%3a6/basic-string-ports/compat.mosh.sls
...serialize-library /root/mosh.git/lib/srfi/i0.mosh.sls
...serialize-library /root/mosh.git/lib/srfi/%3a0.mosh.sls
...serialize-library /root/mosh.git/lib/srfi/%3a0/cond-expand.sls
...serialize-library /root/mosh.git/lib/srfi/private/registry.sls
...serialize-library /root/mosh.git/lib/srfi/private/platform-features.mosh.sls
...serialize-library /root/mosh.git/lib/srfi/private/OS-id-features.sls
M17
Start working on R7RS.
cargo run ../tests/r7rs/import-all.scm
call/cc worked.
(import (rnrs))
(display (call/cc (lambda (c) (c 3))))
(newline)
Anyone interested in writing R7RS procedures in Rust? I'm now trying to run all tests in ../tests/r7rs/r7rs-tests.scm.
We have to implement ~400 procedures in https://github.com/higepon/mosh/blob/bigint/rmosh/src/procs.rs. For example I recently implemented "+" as follows.
fn number_add(vm: &mut Vm, args: &mut [Object]) -> Object {
    let name: &str = "+";
    let argc = args.len();
    if argc == 0 {
        Object::Fixnum(0)
    } else if argc == 1 {
        if args[0].is_number() {
            args[0]
        } else {
            panic!("{}: number required but got {}", name, args[0])
        }
    } else {
        let mut ret = Object::Fixnum(0);
        for arg in args.iter() {
            ret = numbers::add(&mut vm.gc, ret, *arg);
        }
        ret
    }
}
I wish I had enough Rust-fu to help you...
(But anyway I'm really positive about switching the Mosh implementation to Rust; I need a few months to familiarise myself with it though)
Haha thanks. If you take a closer look, rmosh is just a copy of Mosh.
- VM instructions
- Compiler
- object system are the same.
Anyway ping me when you have some time. So far I implemented 200 procedures and need 400 more :)