rust-ascii
Benchmark
Dear all,
Is there any evidence that this crate is faster than Rust's std String (for ASCII)? Has anybody run a benchmark or something similar?
In general, I think the README should have a section explaining why somebody should use this crate for ASCII strings.
Does anybody have experience with this?
I've tried to run a few benchmarks.
In both cases the internal storage is essentially a Vec<u8>, so checking the ASCII range adds very little overhead relative to UTF-8 validation.
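To make the validation point a bit more concrete, here is a minimal std-only sketch. It does not use the ascii crate, and `is_ascii_bytes` and `is_utf8_bytes` are hypothetical helper names chosen for illustration. It only shows that both checks look at the same bytes and that any ASCII input is also valid UTF-8; it says nothing about the relative speed of the two.

```rust
/// Hypothetical helper: true if every byte is in the 7-bit ASCII range.
fn is_ascii_bytes(bytes: &[u8]) -> bool {
    bytes.is_ascii()
}

/// Hypothetical helper: true if the bytes form valid UTF-8.
fn is_utf8_bytes(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}

fn main() {
    let ascii_input = b"Content-Type: text/plain";
    let utf8_input = "naïve".as_bytes(); // contains a two-byte UTF-8 sequence

    // ASCII input passes both checks.
    assert!(is_ascii_bytes(ascii_input));
    assert!(is_utf8_bytes(ascii_input));

    // Non-ASCII UTF-8 passes only the UTF-8 check.
    assert!(!is_ascii_bytes(utf8_input));
    assert!(is_utf8_bytes(utf8_input));

    // For ASCII text, bytes and characters are one-to-one.
    let s = std::str::from_utf8(ascii_input).unwrap();
    assert_eq!(s.len(), s.chars().count());
}
```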
Memory footprint: in theory, ascii could pack characters more tightly than one per byte, since US-ASCII (which is what the crate is limited to so far) needs only 7 bits per character. But it stores 1 char per byte. In practice, the footprint is probably only smaller if you use the AsciiChar enum in many places (instead of a 4-byte char).
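A rough illustration of the footprint argument, using only std. `TinyAscii` is a hypothetical stand-in for a one-byte character enum (it is not the crate's AsciiChar), and the bit-packing arithmetic assumes 7 bits per US-ASCII character, which is an assumption of this sketch rather than anything the crate does.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for a one-byte ASCII character enum.
/// Only three variants are listed; this is not the crate's AsciiChar.
#[allow(dead_code)]
#[repr(u8)]
#[derive(Clone, Copy)]
enum TinyAscii {
    Space = 32,
    A = 65,
    B = 66,
}

/// Bytes needed to store `n` characters at one byte each.
fn bytes_one_per_char(n: usize) -> usize {
    n
}

/// Bytes needed if each character were bit-packed into 7 bits (rounded up).
fn bytes_seven_bit_packed(n: usize) -> usize {
    (n * 7 + 7) / 8
}

fn main() {
    // A std `char` takes 4 bytes; a `u8` or a `repr(u8)` enum takes 1.
    assert_eq!(size_of::<char>(), 4);
    assert_eq!(size_of::<u8>(), 1);
    assert_eq!(size_of::<TinyAscii>(), 1);

    // For 1000 characters, 7-bit packing would save 125 bytes.
    assert_eq!(bytes_one_per_char(1000), 1000);
    assert_eq!(bytes_seven_bit_packed(1000), 875);
}
```

So the theoretical saving from bit-packing would be one byte in eight, at the cost of more work on every access, while the saving relative to `char` is a factor of four per stored character.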
Beyond that, it depends on what you want to do with the AsciiString/String.
Under the hood, allocation and optimizations (e.g. with AVX instructions) look different.
So it really depends on what you are doing with the data.
In my benches it looks like this:
test ascii_string_bench ... bench: 23,229,803 ns/iter (+/- 312,943)
test std_string_bench ... bench: 21,427,752 ns/iter (+/- 117,644)
The first line is ascii, the second is std.
If there are more method calls per instance, rather than a new instantiation each time, ascii seems to become a little bit faster. I think this has to do with compiler optimizations based on the AsciiChar enum.
But that is just a guess.
test ascii_string_bench ... bench: 81,402,392 ns/iter (+/- 1,438,370)
test std_string_bench ... bench: 82,892,839 ns/iter (+/- 656,453)
The lorem_ipsum.txt file contains the output of https://loremipsum.io/generator/?n=20&t=p
and is used in this code:
#![feature(test)]
#![allow(unused_crate_dependencies)]
use std::{convert::TryFrom, fs, io::BufRead};
use ascii::{AsAsciiStr, AsciiStr, AsciiString};
extern crate test;
// Thin wrapper so the same benchmark can run over AsciiString and String.
struct HeaderFieldValue<T>(T);

impl TryFrom<&[u8]> for HeaderFieldValue<AsciiString> {
    type Error = ();

    fn try_from(value: &[u8]) -> Result<Self, Self::Error> {
        Ok(Self(AsciiString::from_ascii(value).unwrap()))
    }
}

impl TryFrom<&[u8]> for HeaderFieldValue<String> {
    type Error = ();

    fn try_from(value: &[u8]) -> Result<Self, Self::Error> {
        Ok(Self(String::from_utf8(value.to_vec()).unwrap()))
    }
}
// Loads the non-empty lines of the lorem ipsum file as byte vectors.
fn load_lorem_ipsum() -> Vec<Vec<u8>> {
    let mut data = Vec::new();
    let file = fs::read("./benches/lorem_ipsum.txt").unwrap();
    for l in file.lines() {
        let l = l.unwrap();
        if !l.is_empty() {
            data.push(l.as_bytes().to_vec());
        }
    }
    // Sanity checks: 20 paragraphs, each with real content.
    assert_eq!(data.len(), 20);
    assert!(data[0].len() > 10);
    data
}
// Construction helpers, one per string type.
#[allow(clippy::inline_always)]
#[inline(always)]
fn create_header_field_value_ascii(data: &Vec<u8>) -> HeaderFieldValue<AsciiString> {
    HeaderFieldValue::try_from(data.as_slice()).unwrap()
}

#[allow(clippy::inline_always)]
#[inline(always)]
fn create_header_field_value_std(data: &Vec<u8>) -> HeaderFieldValue<String> {
    HeaderFieldValue::try_from(data.as_slice()).unwrap()
}
// AsciiString variant: build the fields once, then clone, split, and mutate them `rounds` times.
#[allow(clippy::inline_always)]
#[inline(always)]
fn task_ascii(data: &Vec<Vec<u8>>, rounds: usize) -> Vec<HeaderFieldValue<AsciiString>> {
    let mut fields = Vec::new();
    for d in data {
        let hfv = create_header_field_value_ascii(d);
        fields.push(hfv);
    }
    for _ in 0..rounds {
        for field in &fields {
            let s = field.0.as_ascii_str().unwrap();
            let mut s_clone = s.to_ascii_string();
            // Split on spaces and sum the lengths of the parts.
            let splits = s_clone.split(ascii::AsciiChar::Space);
            let mut count = 0;
            for split in splits {
                count += split.len();
            }
            let _ = std::hint::black_box(count);
            // Append, prepend, shrink, and remove one character.
            s_clone.push_str(AsciiStr::from_ascii(b"Hello World").unwrap());
            s_clone.insert_str(0, AsciiStr::from_ascii(b"Start the race").unwrap());
            s_clone.shrink_to_fit();
            let xvalue = s_clone.remove(3);
            let _ = std::hint::black_box(xvalue);
            let f = format!("s: {s_clone}");
            let _ = std::hint::black_box(f);
        }
    }
    fields
}
// std String variant: the same sequence of operations as task_ascii.
#[allow(clippy::inline_always)]
#[inline(always)]
fn task_std(data: &Vec<Vec<u8>>, rounds: usize) -> Vec<HeaderFieldValue<String>> {
    let mut fields = Vec::new();
    for d in data {
        let hfv = create_header_field_value_std(d);
        fields.push(hfv);
    }
    for _ in 0..rounds {
        for field in &fields {
            let s = field.0.as_str();
            let mut s_clone = s.to_string();
            // Split on spaces and sum the lengths of the parts.
            let splits = s_clone.split(' ');
            let mut count = 0;
            for split in splits {
                count += split.len();
            }
            let _ = std::hint::black_box(count);
            // Append, prepend, shrink, and remove one character.
            s_clone.push_str(std::str::from_utf8(b"Hello World").unwrap());
            s_clone.insert_str(0, std::str::from_utf8(b"Start the race").unwrap());
            s_clone.shrink_to_fit();
            let xvalue = s_clone.remove(3);
            let _ = std::hint::black_box(xvalue);
            let f = format!("s: {s_clone}");
            let _ = std::hint::black_box(f);
        }
    }
    fields
}
// Each bench iteration builds the fields ROUNDS times and runs INNER_ROUNDS passes over them.
const ROUNDS: usize = 10;
const INNER_ROUNDS: usize = 500;

#[bench]
fn ascii_string_bench(bencher: &mut test::Bencher) {
    let data = load_lorem_ipsum();
    bencher.iter(|| {
        for _ in 0..ROUNDS {
            let fields = task_ascii(&data, INNER_ROUNDS);
            let _fields = std::hint::black_box(fields);
        }
    });
}

#[bench]
fn std_string_bench(bencher: &mut test::Bencher) {
    let data = load_lorem_ipsum();
    bencher.iter(|| {
        for _ in 0..ROUNDS {
            let fields = task_std(&data, INNER_ROUNDS);
            let _fields = std::hint::black_box(fields);
        }
    });
}
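The code above needs a nightly toolchain because of `#![feature(test)]`. For anyone who wants to try something similar on stable, here is a minimal sketch using only `std::time::Instant` and `std::hint::black_box`. `time_per_iter` and `split_len_sum` are hypothetical helpers, the input line is a placeholder rather than the real lorem_ipsum.txt, and only the std String side is shown. It is far less rigorous than the bench harness (no warm-up, no variance estimate), so treat the numbers as rough.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `f` `iters` times and returns the mean time per call.
/// `iters` must be greater than zero.
fn time_per_iter<F: FnMut() -> usize>(iters: u32, mut f: F) -> Duration {
    assert!(iters > 0);
    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the optimizer from discarding the result.
        black_box(f());
    }
    start.elapsed() / iters
}

/// Sums the lengths of the space-separated parts of `line`,
/// mirroring the split-and-count step of the benchmark.
fn split_len_sum(line: &str) -> usize {
    line.split(' ').map(str::len).sum()
}

fn main() {
    // Placeholder input; the benchmark above reads lorem_ipsum.txt instead.
    let line = "lorem ipsum dolor sit amet";
    let per_iter = time_per_iter(10_000, || split_len_sum(black_box(line)));
    println!("split_len_sum: {per_iter:?} per iteration");
}
```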