zig
zig copied to clipboard
std.mem: adapt SIMD code from indexOfScalar for other scalar functions
Continued from #23889
All benchmarks compiled in ReleaseFast on -target x86_64-linux-gnu -mcpu=haswell.
echo 1 > data.bin
head -c 1G /dev/zero >> data.bin
echo 2 >> data.bin
head -c 1G /dev/urandom > data.bin
const std = @import("std");
pub fn main() !void {
const file = try std.fs.cwd().openFile("data.bin", .{ .mode = .read_only });
const buffer = try file.readToEndAlloc(std.heap.page_allocator, 1 << 48);
for (0..20) |_| {
std.mem.doNotOptimizeAway(std.mem.indexOfScalar(u8, buffer, '2'));
// std.mem.doNotOptimizeAway(std.mem.containsAtLeastScalar(u8, buffer, 1 << 32, 0));
// std.mem.doNotOptimizeAway(std.mem.count(u8, buffer, "\x00"));
// std.mem.doNotOptimizeAway(std.mem.countScalar(u8, buffer, 0));
// std.mem.doNotOptimizeAway(std.mem.lastIndexOfScalar(u8, buffer, '1'));
// std.mem.doNotOptimizeAway(std.mem.lastIndexOf(u8, buffer, "1"));
}
}
indexOfScalar:
$ hyperfine -w 2 -r 10 ./index-old ./index-new
Benchmark 1: ./index-old
Time (mean ± σ): 1.914 s ± 0.031 s [User: 1.551 s, System: 0.339 s]
Range (min … max): 1.863 s … 1.970 s 10 runs
Benchmark 2: ./index-new
Time (mean ± σ): 1.900 s ± 0.025 s [User: 1.538 s, System: 0.338 s]
Range (min … max): 1.876 s … 1.950 s 10 runs
Summary
./index-new ran
1.01 ± 0.02 times faster than ./index-old
containsAtLeastScalar:
$ hyperfine -w 2 -r 10 ./contains-old ./contains-new
Benchmark 1: ./contains-old
Time (mean ± σ): 17.224 s ± 0.050 s [User: 16.916 s, System: 0.231 s]
Range (min … max): 17.164 s … 17.321 s 10 runs
Benchmark 2: ./contains-new
Time (mean ± σ): 1.826 s ± 0.018 s [User: 1.584 s, System: 0.227 s]
Range (min … max): 1.808 s … 1.853 s 10 runs
Summary
./contains-new ran
9.43 ± 0.10 times faster than ./contains-old
count/countScalar:
$ hyperfine -w 2 -r 10 ./count-old ./count-new
Benchmark 1: ./count-old
⠸ Performing warmup runs ██████████████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ ETA 00:03:35
^C
$ hyperfine -w 2 -r 10 ./count-new
Benchmark 1: ./count-new
Time (mean ± σ): 1.775 s ± 0.016 s [User: 1.533 s, System: 0.225 s]
Range (min … max): 1.755 s … 1.806 s 10 runs
lastIndexOf/lastIndexOfScalar:
$ hyperfine -w 2 -r 10 ./last-index-scalar-old ./last-index-old ./last-index-new
Benchmark 1: ./last-index-scalar-old
Time (mean ± σ): 8.408 s ± 0.038 s [User: 8.119 s, System: 0.231 s]
Range (min … max): 8.354 s … 8.477 s 10 runs
Benchmark 2: ./last-index-old
Time (mean ± σ): 10.867 s ± 0.036 s [User: 10.583 s, System: 0.226 s]
Range (min … max): 10.817 s … 10.918 s 10 runs
Benchmark 3: ./last-index-new
Time (mean ± σ): 1.696 s ± 0.013 s [User: 1.452 s, System: 0.227 s]
Range (min … max): 1.680 s … 1.716 s 10 runs
Summary
./last-index-new ran
4.96 ± 0.04 times faster than ./last-index-scalar-old
6.41 ± 0.05 times faster than ./last-index-old