rustc_codegen_cranelift In-place binary/jit updating

Loris Cro and Andrew Kelley recently published a blog post (https://kristoff.it/blog/zig-new-relationship-llvm/) about Ziglang's new self-hosted compiler. Part of the post is about how they can patch the binary in place to avoid linking when small amounts of code change (or maybe larger amounts too, I'm not sure). This seems extremely powerful. Linking is consistently a significant bottleneck when compiling Rust, and skipping it completely would probably speed up compilation considerably.

I'm proposing that this project pursue this either in a similar way to Zig, by patching individual functions in the output binary, or by keeping the jit open and recompiling individual functions as the source changes.

Sep 28 '20 13:09 lachlansneff

Thanks for pointing me to that post! Binary patching is probably hard to implement. Doing this for the JIT mode of cg_clif is much more feasible. cranelift-simplejit currently doesn't support hot code swapping. I already wanted to implement this for lazy compilation in JIT mode though. Rustc doesn't have a DepNode that I can use to implement incremental compilation of single mono items, but that shouldn't be too hard to add. https://github.com/rust-lang/rust/pull/76474 will make it possible to create a custom rustc driver that would listen for file changes and pass the necessary persistent state to the various runs of the codegen backend.

Sep 28 '20 14:09 bjorn3

@bjorn3 That's fantastic, I'm glad to hear that the prerequisite components for this to work are mostly already on the roadmap.

As for the binary reloading, I don't think it'd actually be too bad, since the way that Zig does this is by including a function lookup table in the binary that can be modified very easily to redirect functions to new versions. It would require logic for each output format (Mach-O, ELF, PE, wasm, etc.) though.

I am more immediately excited for a jit mode, since continuous in-memory compilation would be faster than touching disk anyhow.

Sep 28 '20 14:09 lachlansneff

I have been trying to revive an old branch for lazy compilation in jit mode. Currently the following program sometimes panics (about one in five tries, I think).

#![feature(
    no_core, start, lang_items, box_syntax, never_type, linkage,
    extern_types, thread_local
)]
#![no_core]
#![allow(dead_code, non_camel_case_types)]

extern crate mini_core;

use mini_core::*;
use mini_core::libc::*;

unsafe extern "C" fn my_puts(s: *const i8) {
    puts(s);
}

#[lang = "termination"]
trait Termination {
    fn report(self) -> i32;
}

impl Termination for () {
    fn report(self) -> i32 {
        unsafe {
            0
        }
    }
}

#[lang = "start"]
fn start<T: Termination + 'static>(
    main: fn() -> T,
    argc: isize,
    argv: *const *const u8,
) -> isize {
    main().report();
    0
}

macro_rules! assert_eq {
    ($l:expr, $r: expr) => {
        if $l != $r {
            panic(stringify!($l != $r));
        }
    }
}

struct Unique<T: ?Sized> {
    pointer: *const T,
    _marker: PhantomData<T>,
}

impl<T: ?Sized, U: ?Sized> CoerceUnsized<Unique<U>> for Unique<T> where T: Unsize<U> {}

fn take_unique(_u: Unique<()>) {}

fn main() {
    take_unique(Unique {
        pointer: 0 as *const (),
        _marker: PhantomData,
    });

    extern {
        #[linkage = "extern_weak"]
        static ABC: *const u8;
    }

    {
        extern {
            #[linkage = "extern_weak"]
            static ABC: *const u8;
        }
    }

    unsafe { assert_eq!(ABC as usize, 0); }
}

Oct 11 '20 19:10 bjorn3

Found the problem. I wasn't correctly saving definitions of weak linkage statics.

Oct 12 '20 09:10 bjorn3

Currently lazy jit compilation is significantly slower, probably because once a function is jitted, existing references to it still go through the compilation shim.

Benchmark #1: /home/bjorn/Documenten/cg_clif3/target/release/cg_clif  -L crate=target/out --out-dir target/out -Cdebuginfo=2 --jit example/std_example.rs --target x86_64-unknown-linux-gnu
  Time (mean ± σ):      1.476 s ±  0.059 s    [User: 1.412 s, System: 0.062 s]
  Range (min … max):    1.435 s …  1.632 s    10 runs
 
Benchmark #2: /home/bjorn/Documenten/cg_clif3/target/release/cg_clif  -L crate=target/out --out-dir target/out -Cdebuginfo=2 example/std_example.rs --crate-type bin --target x86_64-unknown-linux-gnu &&  ./target/out/std_example arg
  Time (mean ± σ):     650.3 ms ±  14.7 ms    [User: 553.7 ms, System: 96.4 ms]
  Range (min … max):   639.6 ms … 687.3 ms    10 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  '/home/bjorn/Documenten/cg_clif3/target/release/cg_clif  -L crate=target/out --out-dir target/out -Cdebuginfo=2 example/std_example.rs --crate-type bin --target x86_64-unknown-linux-gnu &&  ./target/out/std_example arg' ran
    2.27 ± 0.10 times faster than '/home/bjorn/Documenten/cg_clif3/target/release/cg_clif  -L crate=target/out --out-dir target/out -Cdebuginfo=2 --jit example/std_example.rs --target x86_64-unknown-linux-gnu'

This should be fixable using a GOT, which would also help with other kinds of function replacement, like the in-place jit updating proposed by this issue.
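To illustrate the idea, here is a minimal sketch of calling through a patchable slot. It is not the SimpleJIT implementation: `SLOT`, `shim`, `replace`, `call_through_slot` and the two `version_*` functions are hypothetical names, and a `Mutex` around a function pointer stands in for a real GOT entry in memory. The shim "compiles" the function on first use and patches the slot, so later calls no longer pay for the shim, and the same slot can be rewritten again to swap in a new version.

```rust
use std::sync::Mutex;

type Func = fn(u32) -> u32;

// Hypothetical one-slot table standing in for a GOT entry.
// It starts out pointing at the compilation shim.
static SLOT: Mutex<Func> = Mutex::new(shim as Func);

// Stand-ins for two compiled versions of the same function.
fn version_1(x: u32) -> u32 {
    x + 1
}

fn version_2(x: u32) -> u32 {
    x + 2
}

// Stand-in for the lazy compilation shim: "compile" the function, patch the
// slot so later calls bypass the shim, then forward the current call.
fn shim(x: u32) -> u32 {
    replace(version_1);
    version_1(x)
}

// Point the slot at a new implementation (used for both lazy compilation
// and hot swapping in this sketch).
fn replace(new: Func) {
    *SLOT.lock().unwrap() = new;
}

// Every caller loads the current target from the slot before calling it.
fn call_through_slot(x: u32) -> u32 {
    let target = *SLOT.lock().unwrap(); // lock is released after this statement
    target(x)
}

fn main() {
    assert_eq!(call_through_slot(1), 2); // first call goes through the shim
    assert_eq!(call_through_slot(1), 2); // later calls reach version_1 directly
    replace(version_2); // swap in a new version
    assert_eq!(call_through_slot(1), 3);
}
```

Because callers never hold the function's address directly, only the slot has to change when the function is compiled or replaced.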

Oct 12 '20 10:10 bjorn3

Implemented a GOT+PLT for SimpleJIT. It mostly works, but crashes for mutable static writes. (haven't tried reads)

#![feature(
    no_core, start, lang_items, box_syntax, never_type, linkage,
    extern_types, thread_local
)]
#![no_core]

extern crate mini_core;

#[lang = "start"]
fn start(main: fn(), argc: isize, argv: *const *const u8) -> isize {
    unsafe {
        NUM = 43;
    }
    0
}

static mut NUM: u8 = 6 * 7;

fn main() {}

(gdb) disassemble 0x7fffe807e040,+0x22
Dump of assembler code from 0x7fffe807e040 to 0x7fffe807e062:
   0x00007fffe807e040:  rex push %rbp
   0x00007fffe807e042:  mov    %rsp,%rbp
   0x00007fffe807e045:  mov    -0x202c(%rip),%rax        # 0x7fffe807c020
   0x00007fffe807e04c:  rex mov $0x2b,%ecx
   0x00007fffe807e052:  movzbl %cl,%ecx
=> 0x00007fffe807e056:  mov    %cl,(%rax)
   0x00007fffe807e059:  rex mov $0x0,%eax
   0x00007fffe807e05f:  rex pop %rbp
   0x00007fffe807e061:  retq   
End of assembler dump.
(gdb) info registers rax 
rax            0x6e696d243032752a  7956010218921555242
(gdb) p/x *(long*)0x7fffe807c020
$1 = 0x6e696d243032752a
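
One way to read the gdb output above (an interpretation, not something stated in the output itself): x86_64 is little-endian, so the lowest byte of rax is the first byte stored at 0x7fffe807c020. The hypothetical helper `decode` below splits the value into bytes. The first byte is 0x2a, which is 42, the initial value of NUM, and the remaining bytes are printable ASCII. That suggests the load fetched the bytes of the static and whatever follows it in memory, not a pointer to the static.

```rust
/// Split a 64-bit register value into its little-endian bytes and also
/// render those bytes as text. (Hypothetical helper for illustration.)
fn decode(rax: u64) -> ([u8; 8], String) {
    let bytes = rax.to_le_bytes();
    let text: String = bytes.iter().map(|&b| b as char).collect();
    (bytes, text)
}

fn main() {
    let (bytes, text) = decode(0x6e696d243032752a);

    // The first byte in memory equals the initial value of `NUM` (6 * 7).
    assert_eq!(bytes[0], 6 * 7);

    println!("bytes = {:02x?}", bytes);
    println!("text  = {:?}", text);
}
```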

Oct 13 '20 19:10 bjorn3

Looks like you're getting there!

Oct 13 '20 22:10 lachlansneff

I was accidentally reading the GOT entry in get_got_entry instead of returning the address of the GOT entry itself. mini_core_hello_world.rs now works with GOT+PLT.

Edit: std_example.rs also works. :tada:
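
The difference between the two can be sketched as follows. This is an illustration under assumptions, not the actual SimpleJIT code: `Got`, `slot_contents`, `slot_address` and `store_through_slot` are hypothetical names, and a `Vec` of raw pointers stands in for the real table. The emitted code loads a pointer from the slot and then stores through it, so it must be given the address of the slot. Handing it the slot's contents makes it treat the data itself as a pointer.

```rust
/// Hypothetical GOT: each slot holds the address of one data object.
struct Got {
    slots: Vec<*mut u8>,
}

impl Got {
    /// What the buggy helper effectively returned: the contents of the slot,
    /// which is the address of the data object itself.
    fn slot_contents(&self, index: usize) -> *mut u8 {
        self.slots[index]
    }

    /// What the emitted code needs: the address of the slot, so that a load
    /// from it yields the current address of the data object.
    fn slot_address(&self, index: usize) -> *const *mut u8 {
        &self.slots[index]
    }
}

/// Mimics the emitted code: load the data pointer from the slot, then store.
///
/// # Safety
/// `slot` must point to a live slot that holds a valid, writable `u8` address.
unsafe fn store_through_slot(slot: *const *mut u8, value: u8) {
    let data = unsafe { *slot }; // like: mov slot(%rip), %rax
    unsafe { *data = value }; // like: mov %cl, (%rax)
}

fn main() {
    // Heap allocation standing in for `static mut NUM: u8 = 6 * 7`.
    let num = Box::into_raw(Box::new(6u8 * 7));
    let got = Got { slots: vec![num] };

    // Correct: pass the address of the slot.
    unsafe { store_through_slot(got.slot_address(0), 43) };
    assert_eq!(unsafe { *num }, 43);

    // The slot's contents are the data address, a different thing from the
    // slot's own address. Loading a pointer from the former reads the data.
    assert_eq!(got.slot_contents(0), num);
    assert_ne!(got.slot_address(0) as usize, num as usize);

    drop(unsafe { Box::from_raw(num) });
}
```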

Oct 14 '20 09:10 bjorn3

Discussion on the Bevy discord about how to handle changing types in case of hot code swapping: https://discord.com/channels/691052431525675048/692572690833473578/828930167648813086

Apr 06 '21 10:04 bjorn3

For future reference: some more discussion at https://discord.com/channels/691052431525675048/730525730601041940/866626266173276170

Jul 19 '21 12:07 bjorn3

Hi @bjorn3, I would love to know the current state of in-place binary updating. From the issue I gather that some groundwork has been laid and that it works for basic examples. Rust code generation in JIT mode would be great for debugging scenarios.

Mar 01 '24 09:03 mav3ri3k

There is no support for in-place binary patching. cg_clif uses the default system linker. Supporting in-place binary patching requires a linker with specific support for it, which the system linker doesn't have. As for jitting, there is support for a jit mode (disabled in the rustup-distributed version of cg_clif), but it is generally slower than AOT compilation as it is entirely incompatible with incremental compilation. There is a branch for runtime patching in jit mode, but I haven't touched it in years and it leaks a lot of memory on every update, quickly leading to a crash.

Mar 01 '24 12:03 bjorn3

Is using Zig as a linker an option? They explicitly have binary patching style linking as a feature. There's a cargo plugin for Zig and some positive experience reports: https://users.rust-lang.org/t/costs-of-using-zig-linker/88525

Andrew Kelley had a live coding session on Vimeo where he hot-swapped parts of the program via the ptrace API on Linux, all done in milliseconds. Sadly, it seems the video has since been deleted and I can't find it anymore. There's an open GitHub issue about this functionality, so it's not finished.

It would be fantastic if Rust feedback cycles could be shortened thanks to Cranelift.

Mar 01 '24 21:03 adaszko

My understanding is that zig cc uses regular lld. It merely handles things relevant for cross-compiling, but is otherwise just a regular linker. As for hot code swapping in Zig, I can't find anything about it ever going beyond the prototype phase.

Mar 01 '24 23:03 bjorn3
