divan icon indicating copy to clipboard operation
divan copied to clipboard

Renaming long arguments

Open dfherr opened this issue 1 year ago • 6 comments

I'm sorry if this is quite obvious, but I couldn't figure it out from the docs.

I am benchmarking a function that takes Strings as input and want to run it on different strings. One of these strings is supposed to be quite long. That leads to divan printing the whole long string into the terminal effectively making the output unreadable.

How can I avoid this?

I guess I could wrap my input into the function with a custom ToString() method. But I find that quite ugly as it alters the input of my function affecting the benchmark results.

I was looking for ways to name the arguments or change the print output of divan instead of changing the benchmark function itself.

dfherr avatar Dec 01 '24 14:12 dfherr

When supplying arguments to a benchmark, you should think of them more like cases rather than like the kinds of arguments provided by property-based testing frameworks. See the args docs for examples of the kind of usage I'm describing.

My recommendation would be to use an enum or short predefined strings to define cases, although perhaps it's better to do something else in your situation. Could you give examples of the kinds of strings you're providing?

nvzqz avatar Dec 01 '24 20:12 nvzqz

i was benchmarking a really simple string reverse. one of the test cases returned from strings() was supposed to be a long string (I copied multiple paragraphs of lorem ipsum for that). The test itself runs fine, the problem is really that the long string gets added to the command line output

Here is my benchmark:

#![feature(iter_collect_into)]

fn main() {
    println!("running benchmarks");
    // Run registered benchmarks.
    divan::main();
}

fn strings() -> impl Iterator<Item = String> {
    [
        String::from("abc"),
        String::from("Two words"),
        String::from("Three words, really"),
        String::from("日本語のテキストを長いテキストにutf8"),
        String::from("我們也來添加一些中文吧"),
    ]
    .into_iter()
}

#[divan::bench(args = strings())]
fn reverse_string_naive(s: &str) -> String {
    s.chars().rev().collect::<String>()
}

#[divan::bench(args = strings())]
fn reverse_string_prefilled(s: &str) -> String {
    let mut reversed = String::with_capacity(s.len());
    s.chars().rev().collect_into(&mut reversed);
    reversed
}

#[divan::bench(args = strings())]
fn reverse_string_manual(input: &str) -> String {
    let bytes = input.len();

    let mut reversed: Vec<u8> = vec![0u8; bytes];

    let mut pos = bytes;

    for char in input.chars() {
        let char_len = char.len_utf8();
        let start = pos - char_len;
        char.encode_utf8(&mut reversed[start..pos]);
        pos = start;
    }

    unsafe { String::from_utf8_unchecked(reversed) }
}

even with the shorter strings you can see misalignment on the outputs (a long multiline string just makes it unreadable):

image

so what I would've liked to do is name the test cases. e.g. the last two one could read Japanese Chinese and then another Long Text.

dfherr avatar Dec 01 '24 20:12 dfherr

i've looked through the code and found that my request is currently not possible

https://github.com/nvzqz/divan/blob/414ace96ade21407a559d6cd4237207d4e7f855f/src/bench/args.rs#L58-L145

the runner implementation will always use String arguments as the arg name. If they aren't strings, I could use the Debug implementation to provide my own formatter (still kinda ugly since it also overrides printing my arg).

I was thinking it would be nice to be able to provide a new config parameter that allows the user to provide an args_name_formatter that takes the arg as input and returns a name as String. If this formatter is provided it gets presedence over the existing logic and how to convert args to strings is up to the user.

I tried hacking around a bit to maybe provide a pull request of the idea, but I just started learning Rust and wasn't able to get a grasp on all the meta programming to get another argument passed into the BenchArgs struct (I would've made it an Option function).

The result would work like this: #[divan::bench(args = strings(), args_name_formatter=formatter)]

with formatter: Fn(&I::Item) -> String

dfherr avatar Dec 01 '24 23:12 dfherr

Yeah Divan always formats args according to their ToString implementation. I'm open to suggestions for how to make this better. I'm not sure if an args_name_formatter option is the right choice.

Also, the misalignment is due to using str.chars().count(). Perhaps something else like unicode-width would be more appropriate.

For getting what you want today, you could create a Case type that stores its name and data:

struct Case {
    name: &'static str,
    text: &'static str,
}

impl std::fmt::Display for Case {
    fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
        f.write_str(self.name)
    }
}

const ALL_CASES: &[Case] = &[
    Case { name: "one word", text: "abc" },
    Case { name: "two words", text: "Two words" },
    Case { name: "three words", text: "Three words, really" },
    // ...
];

#[divan::bench(args = ALL_CASES)]
fn reverse_string_naive(case: &Case) -> String {
    case.text.chars().rev().collect::<String>()
}

#[divan::bench(args = ALL_CASES)]
fn reverse_string_prefilled(case: &Case) -> String {
    let s = case.text;
    let mut reversed = String::with_capacity(s.len());
    s.chars().rev().collect_into(&mut reversed);
    reversed
}

#[divan::bench(args = ALL_CASES)]
fn reverse_string_manual(case: &Case) -> String {
    let input = case.text;
    let bytes = input.len();

    let mut reversed: Vec<u8> = vec![0u8; bytes];

    let mut pos = bytes;

    for char in input.chars() {
        let char_len = char.len_utf8();
        let start = pos - char_len;
        char.encode_utf8(&mut reversed[start..pos]);
        pos = start;
    }

    unsafe { String::from_utf8_unchecked(reversed) }
}

nvzqz avatar Dec 01 '24 23:12 nvzqz

There were three options that intuitively made sense to me while keeping them entirely optional:

  • providing the names as an additional option. this should error if args.len() != args_names.len()
  • adding a special case to the args parameter that allows name+arg pairs (basically like your case struct but without being forced to handle it inside the function i'm benchmarking)
  • providing a name formatter that provides the string instead of the currently enforced transformation in the runner

the first one fits the "name" attribute for the whole test case, but I don't like the necessary assertion on the same length. The second one is kinda ugly, because it's a special case within the current control flow and makes the args parameter less pure.

i liked the formatter option. it's the least intrusive, doesn't have the length problem and it integrates seamlessly in the current api, while providing full control on how to style the table output to the user.

well, maybe the fourth option would be to not just provide access to a names formatter, but for the whole output. but i guess that's a) a lot harder to use and b) a lot more effort to implement.

dfherr avatar Dec 02 '24 00:12 dfherr

I have faced the same issue today with this code;

fn create_random_vector(size: usize) -> Vec<u32> {
    let mut rng = rand::thread_rng();
    (0..size).map(|_| rng.gen_range(0..100)).collect()
}


#[divan::bench(args = [ create_random_vector(100_000) ])]
fn sum_iter(v: &Vec<u32>) -> u64 {
    v.iter().map(|&x| x as u64).sum()
}

The entire vector is printed in the results, and this is of course a nightmare to read them XD

I used a the technique proposed above, with a wrapper struct:

struct BenchmarkData {
    data: Vec<u32>,
}

const BENCHMARK_DATA_SIZE: usize = 50_000_000;

impl BenchmarkData {
    pub fn new() -> Self {
        Self { data: create_random_vector(BENCHMARK_DATA_SIZE) }
    }
}

impl Display for BenchmarkData {
    fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
        write!(f, "[...] len={}", self.data.len())
    }
}

#[divan::bench(args = [ BenchmarkData::new() ])]
fn bench_sum_iter(data: &BenchmarkData) {
    sum_iter(&data.data);
}

I believe this technique is worth documenting :)

If I were to propose an evolution of divan, I would suggest 2 possibilities:

  1. #[divan::bench(args = strings(), args_name_limit=50)] => if the string if too long, it's truncated and an ellipsis is added.
  2. divan::main(args_name_limit=50) => I know this is a Python syntax, but you see the idea. Same as 1, but for all benchmarks.

Bktero avatar Dec 20 '24 17:12 Bktero