rust-ascii

Benchmark

Open omid opened this issue 3 years ago • 2 comments

Dear all,

Is there any evidence that this crate is faster than Rust's std String (for ASCII)? Has anybody run a benchmark or something similar?

Generally, I think the README should have a section explaining why somebody should use this crate for ASCII strings.

omid avatar Oct 05 '22 08:10 omid

Does anybody have any experience with this?

kolbma avatar Jan 31 '24 17:01 kolbma

I've tried to run some benchmarks.
Both types keep their data in a byte vector internally (one byte per ASCII character), so the ASCII range check adds very little overhead compared with UTF-8 validation.
As for the memory footprint: in theory, ascii could pack characters more tightly, since US-ASCII (which is all the crate supports so far) needs only 7 bits per character, so 8 characters would fit in 7 bytes. But it stores 1 char per byte. In practice, the footprint is probably only smaller if you use the AsciiChar enum (1 byte, versus 4 bytes for char) in many places.
Beyond that, it depends on what you want to do with the AsciiString/String.
Under the hood, the allocation and the optimization with e.g. AVX instructions look different.
So it really depends on what you are doing with the data.
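
To make the footprint and validation points concrete, here is a minimal std-only sketch. It does not use the ascii crate, and `packed_len` and `validate` are hypothetical helpers made up for illustration. It checks that ASCII text takes one byte per character in a String, that a standalone char takes four bytes, and how many bytes a hypothetical 7-bit packing would need.

```rust
use std::mem::size_of;

/// Hypothetical helper: bytes needed if `n` 7-bit ASCII characters were
/// packed back to back (this is NOT something the ascii crate does).
fn packed_len(n: usize) -> usize {
    (n * 7 + 7) / 8
}

/// Checks `bytes` both ways: the ASCII range check and UTF-8 validation.
fn validate(bytes: &[u8]) -> (bool, bool) {
    (bytes.is_ascii(), std::str::from_utf8(bytes).is_ok())
}

fn main() {
    let text = "Hello World";
    // ASCII text: one byte per character in a String.
    assert_eq!(text.len(), text.chars().count());
    // A standalone `char` is four bytes, a `u8` is one.
    assert_eq!(size_of::<char>(), 4);
    assert_eq!(size_of::<u8>(), 1);
    // 11 characters at 7 bits each would fit into 10 bytes.
    assert_eq!(packed_len(text.len()), 10);
    // Every ASCII input is also valid UTF-8 ...
    assert_eq!(validate(text.as_bytes()), (true, true));
    // ... but valid UTF-8 is not necessarily ASCII.
    assert_eq!(validate("é".as_bytes()), (false, true));
}
```

The saving from packing would be at most one byte in eight, and every character access would need bit shifting, which may be why storing one char per byte is the practical choice.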

In my benches it looks like this:

test ascii_string_bench ... bench:  23,229,803 ns/iter (+/- 312,943)
test std_string_bench   ... bench:  21,427,752 ns/iter (+/- 117,644)

In the flamegraph, the left side is ascii and the right side is std:

flamegraph

With more method calls per instance, rather than a new instantiation each time, ascii seems to become a little bit faster. I think this has to do with compiler optimizations based on the AsciiChar enum, but that is just a guess.

test ascii_string_bench ... bench:  81,402,392 ns/iter (+/- 1,438,370)
test std_string_bench   ... bench:  82,892,839 ns/iter (+/- 656,453)
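
For a sense of scale, the relative differences between the two runs can be worked out from the ns/iter values above. This is only a small arithmetic sketch; `rel_diff_percent` is a hypothetical helper and not part of the benchmark code.

```rust
/// Hypothetical helper: difference of `a` relative to the baseline `b`, in percent.
fn rel_diff_percent(a: f64, b: f64) -> f64 {
    (a - b) / b * 100.0
}

fn main() {
    // ns/iter values copied from the two benchmark runs (ascii first, std second).
    let first = rel_diff_percent(23_229_803.0, 21_427_752.0);
    let second = rel_diff_percent(81_402_392.0, 82_892_839.0);
    println!("run 1: ascii vs std {first:+.1} %");
    println!("run 2: ascii vs std {second:+.1} %");
}
```

This gives roughly +8.4 % for the first run (ascii slower) and about -1.8 % for the second (ascii faster). In the second run the gap of about 1.49 million ns is close to the reported spread of the ascii measurement (+/- 1,438,370), so that difference may not be significant.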

flamegraph

The lorem_ipsum.txt file contains the output of https://loremipsum.io/generator/?n=20&t=p and is used in this code:

// The built-in `test` bench harness is unstable, so this needs a nightly toolchain.
#![feature(test)]
#![allow(unused_crate_dependencies)]

use std::{convert::TryFrom, fs, io::BufRead};

use ascii::{AsAsciiStr, AsciiStr, AsciiString};

extern crate test;

/// Wrapper that lets the same benchmark code run on either string type.
struct HeaderFieldValue<T>(T);

impl TryFrom<&[u8]> for HeaderFieldValue<AsciiString> {
    type Error = ();

    fn try_from(value: &[u8]) -> Result<Self, Self::Error> {
        Ok(Self(AsciiString::from_ascii(value).unwrap()))
    }
}

impl TryFrom<&[u8]> for HeaderFieldValue<String> {
    type Error = ();

    fn try_from(value: &[u8]) -> Result<Self, Self::Error> {
        Ok(Self(String::from_utf8(value.to_vec()).unwrap()))
    }
}

/// Reads the test file and returns its non-empty lines (20 paragraphs) as raw bytes.
fn load_lorem_ipsum() -> Vec<Vec<u8>> {
    let mut data = Vec::new();
    let file = fs::read("./benches/lorem_ipsum.txt").unwrap();
    for l in file.lines() {
        let l = l.unwrap();
        if !l.is_empty() {
            data.push(l.as_bytes().to_vec());
        }
    }
    assert_eq!(data.len(), 20);
    assert!(data[0].len() > 10);

    data
}

#[allow(clippy::inline_always)]
#[inline(always)]
fn create_header_field_value_ascii(data: &[u8]) -> HeaderFieldValue<AsciiString> {
    HeaderFieldValue::try_from(data).unwrap()
}

#[allow(clippy::inline_always)]
#[inline(always)]
fn create_header_field_value_std(data: &[u8]) -> HeaderFieldValue<String> {
    HeaderFieldValue::try_from(data).unwrap()
}

#[allow(clippy::inline_always)]
#[inline(always)]
fn task_ascii(data: &[Vec<u8>], rounds: usize) -> Vec<HeaderFieldValue<AsciiString>> {
    let mut fields = Vec::new();
    for d in data {
        let hfv = create_header_field_value_ascii(d);
        fields.push(hfv);
    }

    for _ in 0..rounds {
        for field in &fields {
            let s = field.0.as_ascii_str().unwrap();
            let mut s_clone = s.to_ascii_string();
            let splits = s_clone.split(ascii::AsciiChar::Space);
            let mut count = 0;
            for split in splits {
                count += split.len();
            }
            let _ = std::hint::black_box(count);

            s_clone.push_str(AsciiStr::from_ascii(b"Hello World").unwrap());
            s_clone.insert_str(0, AsciiStr::from_ascii(b"Start the race").unwrap());
            s_clone.shrink_to_fit();
            let xvalue = s_clone.remove(3);
            let _ = std::hint::black_box(xvalue);

            let f = format!("s: {s_clone}");
            let _ = std::hint::black_box(f);
        }
    }

    fields
}

#[allow(clippy::inline_always)]
#[inline(always)]
fn task_std(data: &[Vec<u8>], rounds: usize) -> Vec<HeaderFieldValue<String>> {
    let mut fields = Vec::new();
    for d in data {
        let hfv = create_header_field_value_std(d);
        fields.push(hfv);
    }

    for _ in 0..rounds {
        for field in &fields {
            let s = field.0.as_str();
            let mut s_clone = s.to_string();
            let splits = s_clone.split(' ');
            let mut count = 0;
            for split in splits {
                count += split.len();
            }
            let _ = std::hint::black_box(count);

            s_clone.push_str(std::str::from_utf8(b"Hello World").unwrap());
            s_clone.insert_str(0, std::str::from_utf8(b"Start the race").unwrap());
            s_clone.shrink_to_fit();
            let xvalue = s_clone.remove(3);
            let _ = std::hint::black_box(xvalue);

            let f = format!("s: {s_clone}");
            let _ = std::hint::black_box(f);
        }
    }

    fields
}

const ROUNDS: usize = 10;
const INNER_ROUNDS: usize = 500;

#[bench]
fn ascii_string_bench(bencher: &mut test::Bencher) {
    let data = load_lorem_ipsum();

    bencher.iter(|| {
        for _ in 0..ROUNDS {
            let fields = task_ascii(&data, INNER_ROUNDS);
            let _fields = std::hint::black_box(fields);
        }
    });
}

#[bench]
fn std_string_bench(bencher: &mut test::Bencher) {
    let data = load_lorem_ipsum();

    bencher.iter(|| {
        for _ in 0..ROUNDS {
            let fields = task_std(&data, INNER_ROUNDS);
            let _fields = std::hint::black_box(fields);
        }
    });
}
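
Since `#![feature(test)]` only builds on nightly, a rough comparison on a stable toolchain could be done with `std::time::Instant` instead. The following is a minimal sketch under that assumption and not the harness used for the numbers above; `time_per_iter` is a hypothetical helper, and the closure is a stand-in workload where `task_std` or `task_ascii` would be called.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `f` `iters` times and returns the mean time per call.
fn time_per_iter<T>(iters: u32, mut f: impl FnMut() -> T) -> Duration {
    assert!(iters > 0, "need at least one iteration");
    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the optimizer from discarding the result.
        black_box(f());
    }
    start.elapsed() / iters
}

fn main() {
    // Stand-in workload; the real comparison would call task_std / task_ascii here.
    let input = b"Lorem ipsum dolor sit amet".to_vec();
    let per_iter = time_per_iter(1_000, || {
        let mut s = String::from_utf8(input.clone()).unwrap();
        s.push_str("Hello World");
        s.len()
    });
    println!("{per_iter:?} per iteration");
}
```

Unlike `test::Bencher`, this reports no variance and does no warm-up, so it is only good for coarse comparisons.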

kolbma avatar Feb 01 '24 16:02 kolbma