rustc_codegen_cranelift
In-place binary/jit updating
Loris Cro and Andrew Kelley recently published a blog post (https://kristoff.it/blog/zig-new-relationship-llvm/) about Ziglang's new self-hosted compiler. Part of the post is about how they can do in-place patching of the binary to avoid linking when only small amounts of code change (or maybe larger amounts too, I'm not sure). This seems extremely powerful. Linking is consistently a significant bottleneck in compiling Rust, and skipping it entirely would probably result in crazy speedups for compiling.
I'm proposing that this project pursue this either in a similar way to Zig, by patching individual functions in the output binary, or by keeping the jit open and recompiling individual functions as the source changes.
Thanks for pointing me to that post! Binary patching is probably hard to implement, but doing this for the JIT mode of cg_clif is much more feasible. cranelift-simplejit currently doesn't support hot code swapping, though I already wanted to implement that for lazy compilation in JIT mode. Rustc doesn't have a `DepNode` that I can use to implement incremental compilation of single mono items, but that shouldn't be too hard to add. https://github.com/rust-lang/rust/pull/76474 will make it possible to create a custom rustc driver that listens for file changes and passes the necessary persistent state to the various runs of the codegen backend.
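The following is a minimal sketch of the persistent state such a long-running driver might keep between rebuilds, so that only changed mono items are sent back to the codegen backend. It assumes each mono item has a stable name and uses a hash of its body text as a stand-in for rustc's real fingerprints. `PersistentState` and `items_to_recompile` are hypothetical names, not rustc or cg_clif APIs.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical state kept alive by a long-running driver between rebuilds.
#[derive(Default)]
struct PersistentState {
    /// Mono item name -> hash of the body it was last compiled from.
    fingerprints: HashMap<String, u64>,
}

/// Stand-in for a real fingerprint: hashes the item's body text.
fn fingerprint(body: &str) -> u64 {
    let mut h = DefaultHasher::new();
    body.hash(&mut h);
    h.finish()
}

impl PersistentState {
    /// Returns the items that are new or whose body changed since the last
    /// run (sorted by name), and records their new fingerprints.
    fn items_to_recompile(&mut self, items: &[(&str, &str)]) -> Vec<String> {
        let mut dirty = Vec::new();
        for &(name, body) in items {
            let fp = fingerprint(body);
            // `insert` returns the previous fingerprint, if any.
            if self.fingerprints.insert(name.to_string(), fp) != Some(fp) {
                dirty.push(name.to_string());
            }
        }
        dirty.sort();
        dirty
    }
}

fn main() {
    let mut state = PersistentState::default();
    // First run: everything is new, so everything is compiled.
    let first = state.items_to_recompile(&[("main", "fn main() { f() }"), ("f", "fn f() {}")]);
    // After an edit that only touches `f`, only `f` needs new machine code.
    let second = state.items_to_recompile(&[("main", "fn main() { f() }"), ("f", "fn f() { g() }")]);
    println!("{first:?} then {second:?}");
}
```

This sketch does not handle removed items. A real driver would also have to drop their code and any references to them.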
@bjorn3 That's fantastic, I'm glad to hear that the prerequisite components for this to work are mostly already on the roadmap.
As for binary reloading, I don't think it'd actually be too bad, since Zig does this by including a function lookup table in the binary that can easily be modified to redirect functions to new versions. It would require logic for each output format (Mach-O, ELF, PE, Wasm, etc.), though.
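Here is a toy model of the lookup-table idea. It assumes an image that starts with a table of 8-byte little-endian code offsets, and that all calls go through that table. Redirecting a function then means appending the new body and overwriting a single table entry, with no relinking. `Image`, `append_code` and `redirect` are made up for illustration. Real object formats would also need their headers and section sizes kept consistent, which is the per-format logic mentioned above.

```rust
/// Toy executable image: a fixed-size table of function offsets followed by code.
/// Callers are assumed to always call through the table, never directly.
struct Image {
    bytes: Vec<u8>,
    table_len: usize, // number of 8-byte table slots at the start of `bytes`
}

impl Image {
    fn new(table_len: usize) -> Self {
        Image { bytes: vec![0; table_len * 8], table_len }
    }

    /// Appends machine code at the end of the image and returns its offset.
    fn append_code(&mut self, code: &[u8]) -> u64 {
        let off = self.bytes.len() as u64;
        self.bytes.extend_from_slice(code);
        off
    }

    /// Points table slot `slot` at `offset` by overwriting 8 bytes in place.
    fn redirect(&mut self, slot: usize, offset: u64) {
        assert!(slot < self.table_len, "slot out of range");
        self.bytes[slot * 8..slot * 8 + 8].copy_from_slice(&offset.to_le_bytes());
    }

    /// Reads the offset currently stored in table slot `slot`.
    fn target(&self, slot: usize) -> u64 {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(&self.bytes[slot * 8..slot * 8 + 8]);
        u64::from_le_bytes(buf)
    }
}

fn main() {
    let mut img = Image::new(2);
    let v1 = img.append_code(&[0xb8, 0x01, 0, 0, 0, 0xc3]); // mov eax, 1; ret
    img.redirect(0, v1);
    // "Edit" the function: append the new body and flip one table entry.
    let v2 = img.append_code(&[0xb8, 0x02, 0, 0, 0, 0xc3]); // mov eax, 2; ret
    img.redirect(0, v2);
    println!("slot 0 -> offset {}", img.target(0));
}
```

The old body is left behind as dead bytes. Reclaiming that space is out of scope for this sketch.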
I am more immediately excited for a jit mode, since continuous in-memory compilation would be faster than touching disk anyhow.
I have been trying to revive an old branch for lazy compilation in JIT mode. Currently the following program sometimes panics (about one in five tries, I think):
```rust
#![feature(
    no_core, start, lang_items, box_syntax, never_type, linkage,
    extern_types, thread_local
)]
#![no_core]
#![allow(dead_code, non_camel_case_types)]

extern crate mini_core;

use mini_core::*;
use mini_core::libc::*;

unsafe extern "C" fn my_puts(s: *const i8) {
    puts(s);
}

#[lang = "termination"]
trait Termination {
    fn report(self) -> i32;
}

impl Termination for () {
    fn report(self) -> i32 {
        unsafe {
            0
        }
    }
}

#[lang = "start"]
fn start<T: Termination + 'static>(
    main: fn() -> T,
    argc: isize,
    argv: *const *const u8,
) -> isize {
    main().report();
    0
}

macro_rules! assert_eq {
    ($l:expr, $r: expr) => {
        if $l != $r {
            panic(stringify!($l != $r));
        }
    }
}

struct Unique<T: ?Sized> {
    pointer: *const T,
    _marker: PhantomData<T>,
}

impl<T: ?Sized, U: ?Sized> CoerceUnsized<Unique<U>> for Unique<T> where T: Unsize<U> {}

fn take_unique(_u: Unique<()>) {}

fn main() {
    take_unique(Unique {
        pointer: 0 as *const (),
        _marker: PhantomData,
    });

    extern {
        #[linkage = "extern_weak"]
        static ABC: *const u8;
    }

    {
        extern {
            #[linkage = "extern_weak"]
            static ABC: *const u8;
        }
    }

    unsafe { assert_eq!(ABC as usize, 0); }
}
```
Found the problem. I wasn't correctly saving definitions of weak linkage statics.
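Some background on what the test program checks: with `#[linkage = "extern_weak"]`, a reference to a symbol that nobody defines resolves to address 0 instead of failing. That is what `assert_eq!(ABC as usize, 0)` relies on. Below is a minimal sketch of that resolution rule for a JIT symbol table. It uses a hypothetical `resolve` function, not cg_clif's actual code.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Debug)]
enum Linkage {
    Strong,
    ExternWeak,
}

/// Hypothetical symbol lookup for a JIT: `defined` maps symbol names to addresses.
fn resolve(defined: &HashMap<&str, usize>, name: &str, linkage: Linkage) -> Result<usize, String> {
    match (defined.get(name), linkage) {
        // Defined symbols resolve to their address regardless of linkage.
        (Some(&addr), _) => Ok(addr),
        // An undefined weak reference is not an error; it resolves to null.
        (None, Linkage::ExternWeak) => Ok(0),
        // An undefined strong reference is a link error.
        (None, Linkage::Strong) => Err(format!("undefined symbol `{name}`")),
    }
}

fn main() {
    let mut defined = HashMap::new();
    defined.insert("puts", 0x1000);
    println!("{:?}", resolve(&defined, "ABC", Linkage::ExternWeak));
    println!("{:?}", resolve(&defined, "ABC", Linkage::Strong));
}
```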
Currently lazy JIT compilation is significantly slower, probably because once a function has been jitted, existing references to it still go through the compilation shim:
```
Benchmark #1: /home/bjorn/Documenten/cg_clif3/target/release/cg_clif -L crate=target/out --out-dir target/out -Cdebuginfo=2 --jit example/std_example.rs --target x86_64-unknown-linux-gnu
  Time (mean ± σ): 1.476 s ± 0.059 s [User: 1.412 s, System: 0.062 s]
  Range (min … max): 1.435 s … 1.632 s 10 runs

Benchmark #2: /home/bjorn/Documenten/cg_clif3/target/release/cg_clif -L crate=target/out --out-dir target/out -Cdebuginfo=2 example/std_example.rs --crate-type bin --target x86_64-unknown-linux-gnu && ./target/out/std_example arg
  Time (mean ± σ): 650.3 ms ± 14.7 ms [User: 553.7 ms, System: 96.4 ms]
  Range (min … max): 639.6 ms … 687.3 ms 10 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Summary
  '/home/bjorn/Documenten/cg_clif3/target/release/cg_clif -L crate=target/out --out-dir target/out -Cdebuginfo=2 example/std_example.rs --crate-type bin --target x86_64-unknown-linux-gnu && ./target/out/std_example arg' ran
    2.27 ± 0.10 times faster than '/home/bjorn/Documenten/cg_clif3/target/release/cg_clif -L crate=target/out --out-dir target/out -Cdebuginfo=2 --jit example/std_example.rs --target x86_64-unknown-linux-gnu'
```
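Here is a rough model of why this happens. It assumes a lazy-JIT design in which not-yet-compiled functions are represented by a shim. A call site emitted before its callee was compiled has the shim's address baked in, so it pays for the shim on every call even after the real code exists. `LazyFn` and `square` are hypothetical stand-ins, and "compilation" here just means picking an existing Rust function.

```rust
use std::cell::Cell;

fn square(x: i64) -> i64 {
    x * x
}

/// Hypothetical lazy-JIT bookkeeping for a single function.
struct LazyFn {
    compiled: Cell<Option<fn(i64) -> i64>>,
    compilations: Cell<usize>,
    shim_entries: Cell<usize>,
}

impl LazyFn {
    fn new() -> Self {
        LazyFn { compiled: Cell::new(None), compilations: Cell::new(0), shim_entries: Cell::new(0) }
    }

    /// The compilation shim: compiles on first entry, then only forwards.
    /// Every call that reaches it still pays for the extra indirection.
    fn shim(&self, x: i64) -> i64 {
        self.shim_entries.set(self.shim_entries.get() + 1);
        let f = match self.compiled.get() {
            Some(f) => f,
            None => {
                self.compilations.set(self.compilations.get() + 1);
                let f: fn(i64) -> i64 = square; // stand-in for invoking the JIT
                self.compiled.set(Some(f));
                f
            }
        };
        f(x)
    }
}

fn main() {
    let lazy = LazyFn::new();
    // A call site emitted before `square` was compiled: its target is fixed to the shim.
    let mut total = 0;
    for i in 0..1000 {
        total += lazy.shim(i);
    }
    println!(
        "total={total} compilations={} shim_entries={}",
        lazy.compilations.get(),
        lazy.shim_entries.get()
    );
}
```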
This should be fixable using a GOT, which would also help with other kinds of function replacement, like the in-place jit updating proposed by this issue.
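Here is a sketch of a GOT-based fix under the same toy assumptions. Call sites load their target from a slot on every call. The shim patches that slot the first time it runs, and hot code swapping is just another store to the slot. `GotFn`, `square` and `cube` are illustrative names. `Cell<Option<fn>>` stands in for a writable GOT entry whose initial value (`None` here) means "points at the shim".

```rust
use std::cell::Cell;

fn square(x: i64) -> i64 {
    x * x
}

fn cube(x: i64) -> i64 {
    x * x * x
}

/// Hypothetical one-slot GOT: call sites load the target from `slot` on every
/// call instead of hard-coding an address.
struct GotFn {
    slot: Cell<Option<fn(i64) -> i64>>, // None models "slot still points at the shim"
    shim_entries: Cell<usize>,
}

impl GotFn {
    fn new() -> Self {
        GotFn { slot: Cell::new(None), shim_entries: Cell::new(0) }
    }

    /// A call site: always goes through the GOT slot.
    fn call(&self, x: i64) -> i64 {
        match self.slot.get() {
            Some(f) => f(x),
            None => self.shim(x),
        }
    }

    /// Shim: "compiles" the function and patches the slot, so later calls skip it.
    fn shim(&self, x: i64) -> i64 {
        self.shim_entries.set(self.shim_entries.get() + 1);
        let f: fn(i64) -> i64 = square; // stand-in for invoking the JIT
        self.slot.set(Some(f));
        f(x)
    }

    /// Hot code swap: every call site sees the new code on its next call.
    fn replace(&self, f: fn(i64) -> i64) {
        self.slot.set(Some(f));
    }
}

fn main() {
    let g = GotFn::new();
    let mut total = 0;
    for i in 0..1000 {
        total += g.call(i);
    }
    g.replace(cube);
    println!("total={total} shim_entries={} after swap: {}", g.shim_entries.get(), g.call(3));
}
```

Compared with a stale direct reference to the shim, the per-call cost drops to one extra load. The same slot is also exactly what in-place updating would write to.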
Implemented a GOT+PLT for SimpleJIT. It mostly works, but crashes on writes to mutable statics (I haven't tried reads):
```rust
#![feature(
    no_core, start, lang_items, box_syntax, never_type, linkage,
    extern_types, thread_local
)]
#![no_core]

extern crate mini_core;

#[lang = "start"]
fn start(main: fn(), argc: isize, argv: *const *const u8) -> isize {
    unsafe {
        NUM = 43;
    }
    0
}

static mut NUM: u8 = 6 * 7;

fn main() {}
```
```
(gdb) disassemble 0x7fffe807e040,+0x22
Dump of assembler code from 0x7fffe807e040 to 0x7fffe807e062:
   0x00007fffe807e040: rex push %rbp
   0x00007fffe807e042: mov %rsp,%rbp
   0x00007fffe807e045: mov -0x202c(%rip),%rax # 0x7fffe807c020
   0x00007fffe807e04c: rex mov $0x2b,%ecx
   0x00007fffe807e052: movzbl %cl,%ecx
=> 0x00007fffe807e056: mov %cl,(%rax)
   0x00007fffe807e059: rex mov $0x0,%eax
   0x00007fffe807e05f: rex pop %rbp
   0x00007fffe807e061: retq
End of assembler dump.
(gdb) info registers rax
rax 0x6e696d243032752a 7956010218921555242
(gdb) p/x *(long*)0x7fffe807c020
$1 = 0x6e696d243032752a
```
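The value gdb shows in `rax` already hints at the problem. `0x6e696d243032752a` is not a canonical x86-64 address, and its in-memory (little-endian) bytes are all printable ASCII. That suggests the GOT slot held copied data rather than an address. The small sketch below checks both properties, assuming 48-bit virtual addresses. The helper names are hypothetical.

```rust
/// True if `addr` is canonical for 48-bit x86-64 virtual addresses,
/// i.e. bits 63..=47 are all equal.
fn is_canonical_48(addr: u64) -> bool {
    let top = addr >> 47;
    top == 0 || top == 0x1_ffff
}

/// Interprets a 64-bit value as the 8 bytes it occupies in memory
/// (little-endian) and returns them as text if they are all printable ASCII.
fn as_ascii(value: u64) -> Option<String> {
    let bytes = value.to_le_bytes();
    if bytes.iter().all(|b| b.is_ascii_graphic()) {
        Some(bytes.iter().map(|&b| b as char).collect())
    } else {
        None
    }
}

fn main() {
    // Value loaded from the GOT slot in the gdb session.
    let rax: u64 = 0x6e69_6d24_3032_752a;
    println!("canonical: {}, ascii: {:?}", is_canonical_48(rax), as_ascii(rax));
    // The address of the GOT slot itself, for comparison.
    println!("canonical: {}", is_canonical_48(0x7fff_e807_c020));
}
```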
Looks like you're getting there!
I was accidentally reading the GOT entry in `get_got_entry` instead of returning the address of the GOT entry itself. `mini_core_hello_world.rs` now works with GOT+PLT.
Edit: `std_example.rs` also works. :tada:
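The fix hinges on the difference between a GOT slot's address and its contents. Code that goes through the GOT embeds the slot's address as a RIP-relative displacement and loads the target from that slot at run time. The sketch below shows the two x86-64 encodings involved, using hypothetical helper names. `main` reuses numbers from the gdb dump above, where the 7-byte `mov -0x202c(%rip),%rax` at `0x7fffe807e045` reads the slot at `0x7fffe807c020`.

```rust
/// rel32 displacement for a RIP-relative operand: the target address minus
/// the address of the next instruction.
fn rip_rel32(target: u64, next_ip: u64) -> i32 {
    let disp = target.wrapping_sub(next_ip) as i64;
    i32::try_from(disp).expect("target out of rel32 range")
}

/// x86-64 PLT-style stub `jmp *rel32(%rip)` (bytes FF 25 + disp32) that jumps
/// through the GOT slot located at `got_slot_addr`. It must be given the slot's
/// address, not the value stored in the slot.
fn plt_stub(stub_addr: u64, got_slot_addr: u64) -> [u8; 6] {
    let disp = rip_rel32(got_slot_addr, stub_addr + 6); // the stub is 6 bytes long
    let d = disp.to_le_bytes();
    [0xff, 0x25, d[0], d[1], d[2], d[3]]
}

fn main() {
    // The load at 0x7fffe807e045 is 7 bytes long, so the next IP is 0x7fffe807e04c.
    let disp = rip_rel32(0x7fff_e807_c020, 0x7fff_e807_e045 + 7);
    println!("{:#x}", -disp); // 0x202c
    println!("{:02x?}", plt_stub(0x1000, 0x2000));
}
```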
Discussion on the Bevy Discord about how to handle changing types in case of hot code swapping: https://discord.com/channels/691052431525675048/692572690833473578/828930167648813086
For future reference: some more discussion at https://discord.com/channels/691052431525675048/730525730601041940/866626266173276170
Hi @bjorn3, I would love to know the current state of in-place binary updating. From this issue I gather that some groundwork has been laid and that it works for basic examples. Rust code generation in JIT mode would be great for debugging.
There is no support for in-place binary patching. cg_clif uses the default system linker. Supporting in-place binary patching requires a linker with specific support for it, which the system linker doesn't have. As for jitting, there is a JIT mode (disabled in the rustup-distributed version of cg_clif), but it is generally slower than AOT compilation because it is entirely incompatible with incremental compilation. There is a branch for runtime patching in JIT mode, but I haven't touched it in years, and it leaks a lot of memory on every update, quickly leading to a crash.
Is using Zig as a linker an option? They explicitly have binary patching style linking as a feature. There's a cargo plugin for Zig and some positive experience reports: https://users.rust-lang.org/t/costs-of-using-zig-linker/88525
Andrew Kelley had a live coding session on Vimeo where he hot-swapped parts of a program via the ptrace API on Linux, all done in milliseconds. Sadly, the video seems to have since been deleted and I can't find it anymore. There's an open GitHub issue about this functionality, so it's not finished.
It would be fantastic if Rust feedback cycles could be shortened thanks to Cranelift.
My understanding is that `zig cc` uses regular lld. It merely handles things relevant to cross-compiling, but is otherwise just a regular linker. As for hot code swapping in Zig, I can't find anything about it ever going beyond the prototype phase.