
Merge DUMPER and DEBUG_DUMPER

Open PoignardAzur opened this issue 1 year ago • 9 comments

From the thesis paper:

Nonetheless, there may still be programs that only result in a difference with the fast dump_var, but the bug disappears when it is tested again with the debug dump_var. In this case, we still have a reproduction and are still able to investigate the miscompilation, only more difficult

Have you considered merging the two dumper functions? Something like this:

#[inline(never)]
fn dump_var(
    val0: impl Hash + Debug,
    val1: impl Hash + Debug,
    val2: impl Hash + Debug,
    val3: impl Hash + Debug,
) {
    if some_global_variable == DEBUG_MODE {
        println!("fn{f}:_{var0} = {val0:?}\n_{var1} = {val1:?}\n_{var2} = {val2:?}\n_{var3} = {val3:?}");
    } else {
        unsafe {
            val0.hash(&mut H);
            val1.hash(&mut H);
            val2.hash(&mut H);
            val3.hash(&mut H);
        }
    }
}

The global variable would be set in main at runtime. Since the programs are guaranteed to be deterministic, you're guaranteed to get the same bugs for both branches. Since dump_var is already marked as #[inline(never)], the compiler would never optimize the checks away. The cost would be an additional always-predicted branch, which doesn't sound too bad.
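
A minimal, self-contained sketch of the merged dumper described above. The names `DEBUG_MODE`, `H`, and `hash_so_far` are placeholders chosen for illustration, not rustlantis identifiers. It assumes the global hasher can live behind a `Mutex` instead of a `static mut`, and it dumps only two values to stay short.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fmt::Debug;
use std::hash::{Hash, Hasher};
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;

// Hypothetical runtime switch, meant to be set once at the start of `main`.
static DEBUG_MODE: AtomicBool = AtomicBool::new(false);

// Hypothetical global hasher standing in for `H` in the snippet above.
// It is created lazily on the first hashed dump.
static H: Mutex<Option<DefaultHasher>> = Mutex::new(None);

#[inline(never)]
fn dump_var(
    f: usize,
    var0: usize,
    val0: impl Hash + Debug,
    var1: usize,
    val1: impl Hash + Debug,
) {
    if DEBUG_MODE.load(Ordering::Relaxed) {
        // Debug branch: print every value in readable form.
        println!("fn{f}:_{var0} = {val0:?}\n_{var1} = {val1:?}");
    } else {
        // Fast branch: fold every value into the running hash.
        let mut guard = H.lock().unwrap();
        let hasher = guard.get_or_insert_with(DefaultHasher::new);
        val0.hash(hasher);
        val1.hash(hasher);
    }
}

// Hash of everything dumped so far in the fast branch (0 if nothing was hashed).
fn hash_so_far() -> u64 {
    H.lock().unwrap().as_ref().map_or(0, |h| h.finish())
}
```

In `main`, the flag would be set before any generated code runs, for example from a command-line argument, and the fast path would print `hash_so_far()` at the end. Both branches live in the same function body, so whatever the optimiser does around the call site applies to both modes.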

PoignardAzur, Nov 28 '23 16:11

(Loved the paper, btw. Differential fuzzing of compilers is something I have a vested interest in, so this was super valuable to me.)

PoignardAzur, Nov 28 '23 16:11

Thanks for the suggestion. Considering I'm already marking both versions of dump_var with #[inline(never)], theoretically, bugs should always exist in both versions, already. But compilers work in mysterious ways so #[inline(never)] is not a bullet-proof optimisation boundary. Merging the two versions into one function probably won't solve it.

I have been thinking about something like this though, and it makes getting a pure LLVM IR reproduction easier by not depending on Rust's standard library for printing (but this doesn't work in Miri)

use std::ffi::{c_char, c_int};

extern "C" {
    fn printf(fmt: *const c_char, ...) -> c_int;
}

fn dump_var(...) {
    printf("...", var...);
}

cbeuw, Nov 28 '23 21:11

(but this doesn't work in Miri)

One option here would be something like

fn print_i32(x: i32) {
  extern "C" {
      fn printf(fmt: *const core::ffi::c_char, ...) -> core::ffi::c_int;
  }

  if cfg!(miri) {
    println!("{x}");
  } else {
    unsafe { printf(b"%d\n\0".as_ptr().cast(), x); }
  }
}

Playground

RalfJung, Nov 29 '23 13:11

Or probably this is better to avoid relying on dead code elimination:

#[cfg(not(miri))]
fn print_i32(x: i32) {
  extern "C" {
      fn printf(fmt: *const core::ffi::c_char, ...) -> core::ffi::c_int;
  }

  unsafe { printf(b"%d\n\0".as_ptr().cast(), x); }
}

#[cfg(miri)]
fn print_i32(x: i32) {
  println!("{x}");
}

RalfJung, Nov 29 '23 13:11

But compilers work in mysterious ways so #[inline(never)] is not a bullet-proof optimisation boundary. Merging the two versions into one function probably won't solve it.

You could also pass it a &dyn HashDebug (and create the matching trait, etc.) that would be initialized in main. At that point I really don't think ~~MIRI~~ LLVM can possibly inline anything.

Also, using a dyn trait would probably improve your build times, I think?
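
One way to read the `&dyn HashDebug` idea, as a hedged sketch: `Hash::hash` is generic over the hasher, so `Hash` itself cannot be used as a trait object, and a small wrapper trait is needed. The trait name `HashDebug` comes from the comment above; the method name `hash_dyn`, the `debug` parameter, and the slice-of-values signature are assumptions made for this example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fmt::Debug;
use std::hash::{Hash, Hasher};

// Object-safe wrapper: takes the hasher as `&mut dyn Hasher` so the method
// has no generic parameter and can be called through a vtable.
trait HashDebug: Debug {
    fn hash_dyn(&self, state: &mut dyn Hasher);
}

// Every `Hash + Debug` type gets the wrapper for free.
impl<T: Hash + Debug> HashDebug for T {
    fn hash_dyn(&self, mut state: &mut dyn Hasher) {
        // `&mut dyn Hasher` itself implements `Hasher`, so it can be
        // passed to the generic `Hash::hash`.
        self.hash(&mut state);
    }
}

// A single non-generic function: no per-type copies are instantiated.
#[inline(never)]
fn dump_var(debug: bool, hasher: &mut dyn Hasher, vals: &[&dyn HashDebug]) {
    for (i, val) in vals.iter().enumerate() {
        if debug {
            println!("_{i} = {val:?}");
        } else {
            val.hash_dyn(hasher);
        }
    }
}
```

A call site would look like `dump_var(false, &mut hasher, &[&a, &b, &c])`. Because `dump_var` is no longer generic, the compiler emits it once instead of once per combination of argument types, which is where a build-time improvement could come from.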

PoignardAzur, Nov 29 '23 17:11

Miri doesn't do optimizations, those are only relevant for the LLVM backend.

RalfJung, Nov 29 '23 17:11

Thanks for the suggestion. Considering I'm already marking both versions of dump_var with #[inline(never)], theoretically, bugs should always exist in both versions, already. But compilers work in mysterious ways so #[inline(never)] is not a bullet-proof optimisation boundary. Merging the two versions into one function probably won't solve it.

I have been thinking about something like this though, and it makes getting a pure LLVM IR reproduction easier by not depending on Rust's standard library for printing (but this doesn't work in Miri)

use std::ffi::{c_char, c_int};

extern "C" {
    fn printf(fmt: *const c_char, ...) -> c_int;
}

fn dump_var(...) {
    printf("...", var...);
}

If the option to not use the standard library is added, I could use rustlantis to fuzz my compiler backend. It targets .NET and is currently still very much WIP, and the standard Rust formatting does not work yet (due to codegen bugs). Currently, the biggest roadblock in development is detecting all the bugs, which rustlantis could help speed up significantly.

If you accept contributions, I could implement this printf-based formatting myself. The biggest question is what to do with ADTs. They could either never be displayed, or they could implement a printf-based formatting trait.
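
To make the second option concrete, here is a rough sketch of what a printf-based formatting trait for ADTs might look like. The trait `PrintfDump`, the helper `lit`, and the `Point` struct are all hypothetical; the sketch assumes a Unix-like target where libc's `printf` is linked in, Rust 1.82 or later for `unsafe extern`, and it ignores `printf` errors (negative return values).

```rust
use std::ffi::{c_char, c_int};

unsafe extern "C" {
    // C's variadic printf: returns the number of bytes written,
    // or a negative value on error.
    fn printf(fmt: *const c_char, ...) -> c_int;
}

/// Hypothetical trait: print `self` using only `printf`,
/// returning the number of bytes written.
trait PrintfDump {
    fn dump(&self) -> c_int;
}

/// Print a NUL-terminated byte-string literal verbatim.
fn lit(s: &'static [u8]) -> c_int {
    debug_assert_eq!(s.last(), Some(&0));
    unsafe { printf(b"%s\0".as_ptr().cast(), s.as_ptr() as *const c_char) }
}

impl PrintfDump for i32 {
    fn dump(&self) -> c_int {
        unsafe { printf(b"%d\0".as_ptr().cast(), *self as c_int) }
    }
}

// Tuples delegate to their fields, so nesting works without core::fmt.
impl<A: PrintfDump, B: PrintfDump> PrintfDump for (A, B) {
    fn dump(&self) -> c_int {
        lit(b"(\0") + self.0.dump() + lit(b", \0") + self.1.dump() + lit(b")\0")
    }
}

// Example ADT; a generator would emit an impl like this for each struct.
struct Point {
    x: i32,
    y: i32,
}

impl PrintfDump for Point {
    fn dump(&self) -> c_int {
        lit(b"Point { x: \0") + self.x.dump() + lit(b", y: \0") + self.y.dump() + lit(b" }\0")
    }
}
```

Calling `Point { x: 1, y: -2 }.dump()` writes `Point { x: 1, y: -2 }` through C stdio. Since each impl only calls `printf` and other `dump` methods, none of the standard formatting machinery is pulled in, at the cost of one impl per generated type.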

BTW, congrats on an amazing project and paper.

FractalFir, Jan 30 '24 22:01

I think you can print anything, including arbitrary ADTs, if you can produce an implementation of fmt::Write, which only requires that you be able to write bytes.

use core::fmt;

const STDOUT_FILENO: i32 = 1;

struct Stdout;

impl fmt::Write for Stdout {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        unsafe extern "C" {
            fn write(fd: i32, buf: *const u8, count: usize) -> isize;
        }

        let res = unsafe { write(STDOUT_FILENO, s.as_bytes().as_ptr(), s.len()) };
        match res {
            -1 => Err(fmt::Error),
            _ => Ok(()),
        }
    }
}

fn dump_var(v: impl fmt::Debug) {
    fmt::write(&mut Stdout, format_args!("{:?}\n", v)).unwrap();
}

On Windows you'd call NtWriteFile which takes a whole mess of arguments, but the gist is the same.
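
A slightly more defensive variant of the same idea, offered as a sketch rather than as what the fork does: POSIX `write` may accept fewer bytes than requested, so this version loops until the whole string has been written. The `Fd` wrapper and the `Pair` struct are hypothetical names; the code assumes a Unix-like target and does not retry on `EINTR`.

```rust
use std::fmt::{self, Write};

unsafe extern "C" {
    // POSIX write(2): returns the number of bytes written, or -1 on error.
    fn write(fd: i32, buf: *const u8, count: usize) -> isize;
}

/// Hypothetical wrapper around a raw file descriptor (1 is stdout).
struct Fd(i32);

impl fmt::Write for Fd {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        let mut rest = s.as_bytes();
        // Keep writing until every byte has been accepted.
        while !rest.is_empty() {
            let n = unsafe { write(self.0, rest.as_ptr(), rest.len()) };
            if n <= 0 {
                // Error (or no progress): give up instead of looping forever.
                return Err(fmt::Error);
            }
            rest = &rest[n as usize..];
        }
        Ok(())
    }
}

// Any derived Debug impl works, including nested ADTs.
#[derive(Debug)]
struct Pair {
    a: u8,
    b: (bool, i64),
}

fn dump_var(out: &mut Fd, v: impl fmt::Debug) -> fmt::Result {
    writeln!(out, "{:?}", v)
}
```

For example, `dump_var(&mut Fd(1), Pair { a: 1, b: (true, -5) })` prints the derived `Debug` output to stdout. Returning the `fmt::Result` instead of unwrapping leaves it to the caller to decide whether a failed write should abort the run.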

saethlin, May 19 '25 14:05

I implemented this (while, yes, completely ignoring Windows) in my fork: https://github.com/cbeuw/rustlantis/commit/5b7120578bce71e6aca3482f51522300f2cb7166

saethlin, May 19 '25 16:05