Rust exploration

Open higepon opened this issue 2 years ago • 46 comments

Based on "Introduction - Writing Interpreters in Rust: a Guide", we explore whether it would be fun to rewrite Mosh in Rust.

Goals

  • We'll see if it's practically possible to rewrite Mosh in Rust with the current design (Compiler written in Scheme).
  • We'll see if all basic building blocks work.
    • Scheme object with tag bits.
    • GC
    • VM
    • UTF-8
    • File I/O

Non Goals

  • Fully rewrite Mosh.
  • Well designed Rust code.

Milestones

  • M1: Just build the example
  • M2: Define Fixnum and its predicate
  • M3: Create simple VM to just evaluate constant
  • M4: Update VM to run addition
  • M5: Come up with a rough idea of how we implement GC.
  • M6: Implement symbol and intern.
  • M7: identify very small Scheme program, compile it in Mosh and execute it in the Rust VM.
  • M8: Understand all the code we wrote except for GC and clean up.
  • M9: Understand how GC works and trigger GC properly. Figure out how to test it.
  • M10: Add tests for GC.
  • M11: Compiler output converter from Scheme to Rust code.
  • M12: Add many simple tests for call and gc.
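
For M6, symbol interning could look roughly like the sketch below. This is only an illustration, not the rmosh code: `Symbol`, `SymbolTable` and `intern` are hypothetical names, and `Rc<str>` stands in for whatever GC-managed storage the real implementation uses.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical interned symbol: symbols with the same name share one allocation.
#[derive(Clone, Debug)]
pub struct Symbol(Rc<str>);

impl PartialEq for Symbol {
    // Because symbols are interned, identity comparison is enough (like eq? in Scheme).
    fn eq(&self, other: &Self) -> bool {
        Rc::ptr_eq(&self.0, &other.0)
    }
}

impl Symbol {
    pub fn name(&self) -> &str {
        &self.0
    }
}

/// Hypothetical table mapping names to their unique Symbol.
#[derive(Default)]
pub struct SymbolTable {
    symbols: HashMap<String, Symbol>,
}

impl SymbolTable {
    pub fn new() -> Self {
        Self::default()
    }

    /// Returns the existing symbol for `name`, or creates and remembers a new one.
    pub fn intern(&mut self, name: &str) -> Symbol {
        if let Some(sym) = self.symbols.get(name) {
            return sym.clone();
        }
        let sym = Symbol(Rc::from(name));
        self.symbols.insert(name.to_string(), sym.clone());
        sym
    }
}
```

Interning the same name twice yields symbols that compare equal by pointer, while different names never do.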

higepon avatar Nov 24 '22 08:11 higepon

M2 TODOs

  • [x] define and print usize number.
  • [x] print bitwise and for the number
  • [x] define unsafe raw_pointer
  • [x] add method to extract number value from the pointer
  • [x] add predicate method for the raw_pointer
    • https://github.com/higepon/mosh/blob/3bf4b57c6a72282a1f980fed22a048afbfb635e6/rmosh/src/main.rs
  • [x] Revisit https://rust-hosted-langs.github.io/book/chapter-interp-tagged-ptrs.html
  • [x] Define next steps
  • [x] Define ScmObj which covers both Fixnum and object types located in heap.
    • [x] Read the doc.
    • [x] Define empty ScmObj class
    • [x] Make it compatible with the current fixnum.
    • [x] Understand Box
    • [x] Understand how to use heap
    • [x] Have one heap allocated class of ScmObj
    • [x] Have one more heap allocated class of ScmObj
      • https://github.com/higepon/mosh/blob/73ee27ce5be64ee564c8846c0406ce081e6bc674/rmosh/src/main.rs#L27
  • [x] Decide if Symbol should have String, str, &String or &str.
  • [x] Try to use non default allocator
  • [x] https://rust-hosted-langs.github.io/book/chapter-simple-bump.html
  • [x] Better understand GCs
    • [x] https://manishearth.github.io/blog/2015/09/01/designing-a-gc-in-rust/
    • [x] Use https://github.com/Manishearth/rust-gc
  • [ ] Read https://github.com/alilleybrinker/langs-in-rust
    • [ ] Make a list of GC implementations.
    • [ ] https://ceronman.com/2021/07/22/my-experience-crafting-an-interpreter-with-rust/
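
The M2 steps (a usize word, a bitwise-and predicate, extracting the number) can be sketched as follows. The tag layout here is an assumption made for illustration (low bit 1 means fixnum, low bit 0 means pointer), and `ScmWord` is a hypothetical name, not the type in the linked main.rs.

```rust
/// A machine word that is either a tagged fixnum or an (untagged) heap pointer.
/// Assumed layout: low bit 1 = fixnum, low bit 0 = pointer.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ScmWord(usize);

const FIXNUM_TAG: usize = 0b1;
const TAG_BITS: u32 = 1;

impl ScmWord {
    /// Shifts the value left to make room for the tag bit.
    /// The top bit of `n` is lost, so the usable range is one bit narrower than isize.
    pub fn from_fixnum(n: isize) -> Self {
        ScmWord(((n as usize) << TAG_BITS) | FIXNUM_TAG)
    }

    /// Predicate implemented with a bitwise and on the tag bit.
    pub fn is_fixnum(self) -> bool {
        (self.0 & FIXNUM_TAG) == FIXNUM_TAG
    }

    /// Extracts the number; the arithmetic right shift on isize restores the sign.
    pub fn to_fixnum(self) -> Option<isize> {
        if self.is_fixnum() {
            Some((self.0 as isize) >> TAG_BITS)
        } else {
            None
        }
    }

    /// Wraps a pointer. This relies on the pointee being at least 2-byte aligned,
    /// so that the low bit of the address is zero.
    pub fn from_ptr<T>(ptr: *const T) -> Self {
        debug_assert!(((ptr as usize) & FIXNUM_TAG) == 0);
        ScmWord(ptr as usize)
    }
}
```

The sketch never dereferences the pointer, so it needs no unsafe code; a real object system would need unsafe to get back from the word to a heap object.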

higepon avatar Nov 24 '22 23:11 higepon

M3 and M4

  • [x] https://github.com/higepon/mosh/blob/dc716dc31b3e406f49161ea3cdfc9a8b4021e44d/rmosh/src/main.rs#L20

higepon avatar Nov 27 '22 02:11 higepon

Need to read this. https://docs.rs/soroban-env-common/0.0.3/src/soroban_env_common/raw_val.rs.html#413

higepon avatar Nov 28 '22 12:11 higepon

M5

  • [x] Write a blog article about how rust_gc works. It should cover
    • [x] how we use it in normal use cases.
    • [x] how it works with supported objects. Both mark and sweep.
    • [x] how we use it with raw pointer use cases.
    • [x] how Gc<Foo> works as Foo when accessing their members.
    • [x] how to write custom trace implementation.
    • [x] how to write custom trace implementation for tag bits pointer
  • [x] Implement the trace implementation for tag bits pointer.
    • [x] Not doing this because it turned out no one is doing custom tag in Rust GC.

higepon avatar Nov 30 '22 10:11 higepon

This is a good read, but we should not assume this implementation is production quality. https://github.com/ceronman/loxido

higepon avatar Dec 02 '22 23:12 higepon

Looked into this https://higepon.hatenablog.com/entry/2022/12/03/160801.

higepon avatar Dec 03 '22 07:12 higepon

Now I understand how enum in Rust works. I think it may be okay to implement the object system w/o tagged pointers. Some of my tweets about it. https://twitter.com/HigeponJa/status/1600265255317573637
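
As a rough sketch of what an enum-based object system without tagged pointers could look like: the variants below are chosen for illustration only, and `Rc` is used as a stand-in for GC-managed pointers, which is an assumption and not how rmosh manages memory.

```rust
use std::rc::Rc;

/// Hypothetical Scheme object. The enum discriminant plays the role of the tag bits.
#[derive(Clone, Debug, PartialEq)]
pub enum Object {
    Fixnum(isize),
    True,
    False,
    Nil,
    Symbol(Rc<str>),
    Pair(Rc<(Object, Object)>),
}

impl Object {
    /// Type predicates become simple pattern matches.
    pub fn is_fixnum(&self) -> bool {
        matches!(self, Object::Fixnum(_))
    }

    pub fn cons(car: Object, cdr: Object) -> Object {
        Object::Pair(Rc::new((car, cdr)))
    }

    /// Length of a proper list, or None if this is not a proper list.
    pub fn list_length(&self) -> Option<usize> {
        let mut len = 0;
        let mut cur = self;
        loop {
            match cur {
                Object::Nil => return Some(len),
                Object::Pair(pair) => {
                    len += 1;
                    cur = &pair.1; // follow the cdr
                }
                _ => return None,
            }
        }
    }
}
```

The trade-off is size: an enum value carries its discriminant next to the payload, so it is typically wider than a single tagged machine word.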

higepon avatar Dec 07 '22 10:12 higepon

M7: identify very small Scheme program, compile it in Mosh and execute it in the Rust VM

  • [x] Just evaluate define and reference.
    • $ gosh vm.scm "compile" "(begin (define a 3) a)" #(CONSTANT 3 DEFINE_GLOBAL a REFER_GLOBAL a HALT)
  • [x] 1 let.
    • $ gosh vm.scm "compile" "(let ([a 3]) a)" #(LET_FRAME 1 CONSTANT 3 PUSH ENTER 1 REFER_LOCAL 0 LEAVE 1 HALT)
  • put closure with free variable
  • [x] gosh vm.scm "compile" "(let ([a 2]) (let ([b 1]) (+ a b)))" #(LET_FRAME 3 CONSTANT 2 PUSH ENTER 1 LET_FRAME 2 REFER_LOCAL 0 PUSH DISPLAY 1 CONSTANT 1 PUSH ENTER 1 REFER_FREE 0 PUSH REFER_LOCAL 0 NUMBER_ADD LEAVE 1 LEAVE 1 HALT)
  • [x] Use if $ gosh vm.scm "compile" "(if 1 2 3)" #(CONSTANT 1 TEST 5 CONSTANT 2 LOCAL_JMP 3 CONSTANT 3 HALT NOP NOP)
  • [x] call closure $ gosh vm.scm "compile" "((lambda (a) (+ a a)) 1)" #(FRAME 21 CONSTANT 1 PUSH CLOSURE 14 1 #f 0 6 (((input string port) 1) lambda a) REFER_LOCAL 0 PUSH REFER_LOCAL 0 NUMBER_ADD RETURN 1 CALL 1 HALT NOP NOP)
  • [x] call procedure written in Rust
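
To make the first item concrete, here is a minimal sketch of a VM that can run #(CONSTANT 3 DEFINE_GLOBAL a REFER_GLOBAL a HALT). It is an assumption-laden toy: values are plain i64, the `Op` enum and `run` function are hypothetical names, and NUMBER_ADD is assumed to add the popped stack top to the accumulator.

```rust
use std::collections::HashMap;

/// Hypothetical instruction set covering a handful of the opcodes listed above.
#[derive(Clone, Debug, PartialEq)]
pub enum Op {
    Constant(i64),
    DefineGlobal(String),
    ReferGlobal(String),
    Push,
    NumberAdd,
    Halt,
}

/// Runs straight-line code with an accumulator, a stack and a global table.
pub fn run(ops: &[Op]) -> Result<i64, String> {
    let mut acc: i64 = 0;
    let mut stack: Vec<i64> = Vec::new();
    let mut globals: HashMap<String, i64> = HashMap::new();
    for op in ops {
        match op {
            Op::Constant(n) => acc = *n,
            Op::DefineGlobal(name) => {
                globals.insert(name.clone(), acc);
            }
            Op::ReferGlobal(name) => {
                acc = *globals
                    .get(name)
                    .ok_or_else(|| format!("unbound variable: {}", name))?;
            }
            Op::Push => stack.push(acc),
            Op::NumberAdd => {
                // Assumed semantics: acc = (popped stack top) + acc.
                let lhs = stack.pop().ok_or("stack underflow")?;
                acc += lhs;
            }
            Op::Halt => return Ok(acc),
        }
    }
    Err("missing HALT".to_string())
}
```

Jumps, frames and closures (TEST, LOCAL_JMP, FRAME, CLOSURE, CALL) need a program counter and richer values, so they are left out of this sketch.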

higepon avatar Dec 08 '22 08:12 higepon

https://github.com/higepon/mosh/blob/b07ad0a4c8add619bab7c58dae303369f69fbfca/rmosh/src/main.rs

higepon avatar Dec 11 '22 03:12 higepon

M9: Understand how GC works and trigger GC properly. Figure out how to test it.

  • [x] Mimic loxido print debug of gc
  • [x] figure out how to turn on/off the debug print

higepon avatar Dec 13 '22 10:12 higepon

M10: Run many VM tests based off all-tests.scm

  • [x] Finish implementing all core instructions.
    • [x] Stop using hand written instructions
      • [x] Change vm.scm to compile-file and output it to stdout w/o optimization.
      • [x] Write a Scheme program to rewrite it in Op:: style.
      • [ ] Test it with the existing vm tests.
    • [x] Run compiled all-tests.scm
      • [x] Convert 1 test to Op::
      • [x] Pass the test with the following.
        • [x] Implement write for easier testing
        • [x] Make the free var stub with a dummy lambda + display.
        • [x] Enable args and argc for functions implemented in Rust
        • [x] Add instructions needed.
      • [x] Repeat this until we can run most of them.
      • [ ] Update these milestones as we go.
  • [x] Make the free vars same as Mosh
  • [x] Run compiled all-tests.scm
    • Not feasible at this moment because it requires many procs and less important primitives. For example regex procs.
  • [x] Run the compiler in the VM
    • [x] Not feasible at this moment.

higepon avatar Dec 16 '22 08:12 higepon

Some notes for next steps

  • I was very confident with what I was doing because I kept adding tests.
  • Running compiled tests was a very efficient way to implement VM instructions and procs.
  • The next big milestones are
    • Being able to run the compiler written in Scheme.
    • Parser (This should be done after the compiler).
  • To be able to run the compiler we need
    • Being able to run long programs.
    • Compiled instructions of the compiler.
    • Base library written in Scheme.
    • Base library written in Rust.
  • Running the compiler is still very far away. What can we do next? Ideas?
    • Run as many test-data.scm tests as possible.

higepon avatar Dec 21 '22 00:12 higepon

M11 Run as many tests in test-data.scm as possible

  • [x] Implements important primitives such as bytevector, vector.
  • [x] Add a way to load a small compiled library
    • [x] Debug weird call bug
    • [x] Change instruction array from Vec<Op> to slice.
    • [x] Mark closure.ops
  • [x] Skip test only if it's too early to implement and add todo comment there.

higepon avatar Dec 21 '22 01:12 higepon

M12 Being able to load small compiled program as baselib

  • [x] What I tried and failed.
    • Compiled base.scm with gosh vm.scm compile-file-with-macro baselib.scm, with vm-cpp set to false.
    • Converted the instructions into Rust code (= Vec of Op::).
    • Compiled it as part of a test. => The Rust compiler died, probably because the generated code is too big (about 60K lines of Rust).
  • [ ] Next steps
    • [x] Being able to compile small program into Rust code.
      • [x] Pick small scheme code and put in base.scm
      • [x] Compile it to base.op
      • [x] Convert it to Rust program.
      • [x] Test it.
      • [x] Make the process in the Makefile
    • [x] Run the existing tests with optimized code.
      • [x] Implement almost all the Ops in vm.
    • [x] Implement FASL
    • [x] Load the compiler as FASL.
      • [x] It seems loading the compiler stops at some unknown point.
      • [x] Steps to investigate.
        • [x] The whole compiler instructions are loaded as expected.
        • [x] Starting with the last define global, follow all instructions and see if we can catch the issue.

higepon avatar Dec 21 '22 01:12 higepon

M13: Being able to run (compile 3)

  • The compiler output
    • (A) A list of instructions as symbol (default and used in vm.scm).
    • (B) A list of instructions as (*insn* num). We use (*compiler-insn* num) for (PUSH 3) in CONSTANT (PUSH 3) if vm-cpp is on.
    • The conversion is done in insn-sym->insn-num using src/instruction.scm.
  • How is the VM embedding the compiler?
    • VM cpp:
      • gen-compiler.scm: -> compiler-vm-cpp.scm
        • cond-expand controls what to include in the compiler.
      • cat all libraries -> baselib.scm
      • gosh vm.scm compile-file-with-macro -> baselib.scmc
      • scmc2fasl.scm -> baselib.fasl
        • Replace insn and compiler-insn to actual Object and write all the file as Fasl.
      • binary2cpp.scm -> baselib.h
      • The VM cpp compiler returns actual insns because of scmc2fasl phase.
    • VM rust:
      • The VM rust
        • should return unflattened insns. To do so, the compiler itself should be able to produce them.
          • But the output can't be a list of Op yet because the compiler can't produce Op directly yet.
        • should return insn instead of syms
  • Next steps
    • [x] Update OpTag in fasl.rs and use the insn numbers there.
    • [ ] Update fasl_writer.scm and use the insn numbers there.
      • [x] Use baselib.scmc instead of baselib.op as input of fasl_writer.scm
      • [x] Copy fasl_writer as fasl_writer2
      • [x] Update fasl_writer.scm to use baselib.scmc
      • [x] Compare the result with the old output.
      • [x] Commit fasl_writer2 as fasl_writer.
    • [x] At this point we can run some tests to make sure this works.
    • [x] Now we expect to see the compiler return insn tags in the test_compiler test instead of CONSTANT 0.
    • [x] Support (or a b) in gen-compiler.
    • [x] import modified insn-decl.scm.
    • [x] double check all cond-expand
      • [ ] Verify it in the CI
    • [x] Change compiler to support rmosh
    • [ ] Write compile-rust, which wraps compile and uses fasl_writer to return #vu8.
    • [ ] To make sure it's working, run the compiler in vm.scm, compile the compiler, and diff the results.
    • [x] Call the compiler from test_compiler.
    • [x] Decide if we need to implement code builder in rust or use the scheme one.
    • [x] Update the VM so that it can handle *insn*
    • [x] Update the compiler so that it can produce unflattened #vu8.
    • [x] Clean up Makefile dependencies.
      • [x] Detect changes in compiler.scm then generate baselib-rust.scmc
      • [x] fasl-write it as compiler.rs
      • [x] This should be triggered in rmosh/boot/Makefile

higepon avatar Jan 03 '23 08:01 higepon

M14: Flatten a list of instructions

Background

I think I made a wrong design choice on how we treat a list of instructions. In C++ Mosh, an instruction is an object and a list of instructions is a list of objects. But in Rust Mosh we treat an instruction as an enum with a value, such as Constant(3) or Call(2). So instructions are no longer Objects, and they can't be in a list of Objects. We found two major downsides.

  • (A) Instruction and Object are different Rust types, which complicates the code a lot.
  • (B) We have to convert a list of instructions made by the compiler into enums and adjust offsets.

Changes we'll make

  • A compiler instruction is an Object, specifically Object::Instruction(Op), where Op is an enum w/o operands.
  • Remove all un-flatten code.
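
The flattened design can be sketched like this. It is only an illustration of the idea: the `Op` and `Object` variants are a tiny hypothetical subset, and `run` is not the rmosh VM loop.

```rust
/// Opcode without operands; operands follow it in the code as ordinary objects.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Op {
    Constant,
    Push,
    NumberAdd,
    Halt,
}

/// Instructions are just another kind of Object, so code is a flat list of Object.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Object {
    Fixnum(isize),
    Instruction(Op),
}

pub fn run(code: &[Object]) -> Result<Object, String> {
    let mut pc = 0;
    let mut acc = Object::Fixnum(0);
    let mut stack = Vec::new();
    loop {
        // Fetch: the object at pc must be an instruction.
        let op = match code.get(pc) {
            Some(Object::Instruction(op)) => *op,
            Some(other) => return Err(format!("expected instruction at {}, got {:?}", pc, other)),
            None => return Err("ran past the end of code".to_string()),
        };
        pc += 1;
        match op {
            Op::Constant => {
                // The operand is the next object in the same flat list.
                acc = *code.get(pc).ok_or("CONSTANT needs an operand")?;
                pc += 1;
            }
            Op::Push => stack.push(acc),
            Op::NumberAdd => match (stack.pop(), acc) {
                (Some(Object::Fixnum(a)), Object::Fixnum(b)) => acc = Object::Fixnum(a + b),
                _ => return Err("NUMBER_ADD needs two fixnums".to_string()),
            },
            Op::Halt => return Ok(acc),
        }
    }
}
```

Because the compiler output is already in this shape, jump offsets computed by the compiler stay valid and no conversion pass is needed.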

higepon avatar Jan 07 '23 12:01 higepon

M15: Simple Reader

higepon avatar Jan 09 '23 05:01 higepon

GC todo

  • Invoking GC in vm.alloc is not a good design. It can cause a memory error where the allocated object itself is freed because it's not rooted. We should think about where to trigger GC.
  • Clean up should_gc logic
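
One possible direction, sketched under heavy assumptions: allocation never collects, and the VM calls the collector only at safe points where it can list its roots. `Heap`, `alloc_pair`, `should_gc` and `collect` below are hypothetical names on a toy index-based heap, not the rmosh GC.

```rust
/// Toy heap: objects live in Vec slots and a handle is a slot index.
#[derive(Debug)]
enum Slot {
    Free,
    Pair { car: Option<usize>, cdr: Option<usize>, marked: bool },
}

pub struct Heap {
    slots: Vec<Slot>,
    allocated_since_gc: usize,
    threshold: usize,
}

impl Heap {
    pub fn new(threshold: usize) -> Self {
        Heap { slots: Vec::new(), allocated_since_gc: 0, threshold }
    }

    /// Allocation never triggers a collection, so a fresh object cannot be
    /// freed before the caller has stored it somewhere reachable.
    pub fn alloc_pair(&mut self, car: Option<usize>, cdr: Option<usize>) -> usize {
        self.allocated_since_gc += 1;
        let slot = Slot::Pair { car, cdr, marked: false };
        if let Some(i) = self.slots.iter().position(|s| matches!(s, Slot::Free)) {
            self.slots[i] = slot;
            i
        } else {
            self.slots.push(slot);
            self.slots.len() - 1
        }
    }

    /// The single place that decides whether a collection is due.
    pub fn should_gc(&self) -> bool {
        self.allocated_since_gc >= self.threshold
    }

    /// Mark and sweep, called by the VM at a safe point. Returns the number of freed slots.
    pub fn collect(&mut self, roots: &[usize]) -> usize {
        let mut worklist: Vec<usize> = roots.to_vec();
        while let Some(i) = worklist.pop() {
            if let Some(Slot::Pair { car, cdr, marked }) = self.slots.get_mut(i) {
                if !*marked {
                    *marked = true;
                    worklist.extend(car.iter().chain(cdr.iter()).copied());
                }
            }
        }
        let mut freed = 0;
        for slot in self.slots.iter_mut() {
            if let Slot::Pair { marked, .. } = slot {
                if *marked {
                    *marked = false; // survivor: reset the mark for the next cycle
                } else {
                    *slot = Slot::Free;
                    freed += 1;
                }
            }
        }
        self.allocated_since_gc = 0;
        freed
    }

    pub fn is_live(&self, handle: usize) -> bool {
        matches!(self.slots.get(handle), Some(Slot::Pair { .. }))
    }
}
```

A VM loop could then check should_gc() between instructions and pass its stack, accumulator and globals as roots.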

higepon avatar Jan 10 '23 23:01 higepon

Now rmosh can read and run a program.

$ cat hoge.scm 
(display ((lambda (a) (+ a 1)) 2))
$ ./target/debug/rmosh hoge.scm 
3

higepon avatar Jan 14 '23 07:01 higepon

Ideas for the next milestones

  • [x] Find missing features by running all-tests.scm
  • [x] Support Flonum
  • [ ] Support Port
  • [x] Support Regex
  • [ ] Implement more VM instructions
  • [ ] Improve GC performance.
  • [ ] Error handling.

higepon avatar Jan 14 '23 23:01 higepon

Now rmosh can run a simple R6RS program with some errors :)

(import (rnrs))

(display "Hello")

higepon avatar Jan 21 '23 01:01 higepon

M16

  • Load (import (mosh)).
  • Load (import (mosh file)).
  • Enable fasl
  • Port design
    • text port
    • binary port

higepon avatar Jan 22 '23 08:01 higepon

Weird bug on loading psyntax

free_var=source-info called
free_var=procedure? called
thread 'main' panicked at 'Not a Object::Closure but #<vox #<closure 0xaaaaf6956ca0>>', src/objects.rs:238:13

It is happening in parse-library

  (define parse-library
    (lambda (e)
      (format (current-error-port) "parse-library0" )  <==
      (syntax-match e ()

In refer free push, dc is supposed to be a closure but was a vox wrapping a closure.

Next steps

  • [x] commit before the big change
  • [x] Implement writer which can print cyclic reference object.
  • [x] Print stack and vm instructions for the bug and check what's wrong.
  • [x] detect stack overflow
  • [x] support #!r6rs

higepon avatar Jan 24 '23 22:01 higepon

Now rmosh can run the following program.

(import (scheme base))
(import (scheme write))
(import (only (srfi :1) list-ref))
(import (mosh control))
(display 3)
(newline)

higepon avatar Jan 29 '23 10:01 higepon

Now rmosh can automatically serialize libraries as Mosh does.

serialize-library /root/mosh.git/lib/mosh/control.ss
...serialize-library /root/mosh.git/lib/scheme/write.mosh.sls
...serialize-library /root/mosh.git/lib/r7b-impl/write.mosh.sls
...serialize-library /root/mosh.git/lib/scheme/base.mosh.sls
...serialize-library /root/mosh.git/lib/r7b-impl/base.sls
...serialize-library /root/mosh.git/lib/r7b-util/case.sls
...serialize-library /root/mosh.git/lib/r7b-util/syntax-rules.sls
...serialize-library /root/mosh.git/lib/r7b-util/char-ready.sls
...serialize-library /root/mosh.git/lib/r7b-impl/division.sls
...serialize-library /root/mosh.git/lib/srfi/%3a43.mosh.sls
...serialize-library /root/mosh.git/lib/srfi/%3a43/vectors.sls
...serialize-library /root/mosh.git/lib/srfi/%3a13.mosh.sls
...serialize-library /root/mosh.git/lib/srfi/%3a13/strings.sls
...serialize-library /root/mosh.git/lib/srfi/%3a14/char-sets.sls
...serialize-library /root/mosh.git/lib/srfi/private/include.sls
...serialize-library /root/mosh.git/lib/srfi/private/include/compat.mosh.sls
...serialize-library /root/mosh.git/lib/srfi/private/let-opt.sls
...serialize-library /root/mosh.git/lib/srfi/%3a8/receive.sls
...serialize-library /root/mosh.git/lib/srfi/%3a1.mosh.sls
...serialize-library /root/mosh.git/lib/srfi/%3a1/lists.sls
...serialize-library /root/mosh.git/lib/srfi/i39.mosh.sls
...serialize-library /root/mosh.git/lib/srfi/%3a39.mosh.sls
...serialize-library /root/mosh.git/lib/srfi/i9.mosh.sls
...serialize-library /root/mosh.git/lib/srfi/%3a9.mosh.sls
...serialize-library /root/mosh.git/lib/srfi/%3a9/records.sls
...serialize-library /root/mosh.git/lib/srfi/i23.mosh.sls
...serialize-library /root/mosh.git/lib/srfi/%3a23.mosh.sls
...serialize-library /root/mosh.git/lib/srfi/%3a23/error.sls
...serialize-library /root/mosh.git/lib/srfi/%3a39/parameters.mosh.sls
...serialize-library /root/mosh.git/lib/srfi/i6.mosh.sls
...serialize-library /root/mosh.git/lib/srfi/%3a6.mosh.sls
...serialize-library /root/mosh.git/lib/srfi/%3a6/basic-string-ports.sls
...serialize-library /root/mosh.git/lib/srfi/%3a6/basic-string-ports/compat.mosh.sls
...serialize-library /root/mosh.git/lib/srfi/i0.mosh.sls
...serialize-library /root/mosh.git/lib/srfi/%3a0.mosh.sls
...serialize-library /root/mosh.git/lib/srfi/%3a0/cond-expand.sls
...serialize-library /root/mosh.git/lib/srfi/private/registry.sls
...serialize-library /root/mosh.git/lib/srfi/private/platform-features.mosh.sls
...serialize-library /root/mosh.git/lib/srfi/private/OS-id-features.sls

higepon avatar Jan 30 '23 22:01 higepon

M17

Start working on R7RS. cargo run ../tests/r7rs/import-all.scm

higepon avatar Jan 30 '23 22:01 higepon

call/cc worked.

(import (rnrs))

(display (call/cc (lambda (c)  (c 3))))
(newline)

higepon avatar Feb 08 '23 23:02 higepon

Anyone interested in writing R7RS procedures in Rust? I'm now trying to run all tests in ../tests/r7rs/r7rs-tests.scm.

We have to implement ~400 procedures in https://github.com/higepon/mosh/blob/bigint/rmosh/src/procs.rs. For example I recently implemented "+" as follows.

fn number_add(vm: &mut Vm, args: &mut [Object]) -> Object {
    let name: &str = "+";
    let argc = args.len();
    if argc == 0 {
        Object::Fixnum(0)
    } else if argc == 1 {
        if args[0].is_number() {
            args[0]
        } else {
            panic!("{}: number required but got {}", name, args[0])
        }
    } else {
        let mut ret = Object::Fixnum(0);
        for arg in args.iter() {
            ret = numbers::add(&mut vm.gc, ret, *arg);
        }
        ret
    }
}

higepon avatar Feb 14 '23 12:02 higepon

I wish I had enough Rust-fu to help you...

(But anyway I'm really positive about switching the Mosh implementation to Rust; I need a few months to familiarise myself with it though)

okuoku avatar Feb 14 '23 18:02 okuoku

Haha thanks. If you take a closer look, rmosh is just a copy of Mosh. These are all the same:

  • VM instructions
  • Compiler
  • Object system

Anyway, ping me when you have some time. So far I have implemented 200 procedures and need 400 more :)

higepon avatar Feb 14 '23 22:02 higepon