Fortran and Rust Language Support
This issue proposes adding Rust and Fortran support. I briefly discussed this with @dominichamon on IRC. I have an implementation, but I need organizational approval to open-source it. I'm posting here first to get buy-in before starting the open-source process.
Motivations
My organization is historically a Fortran shop. As we evaluated new options, we wanted to be able to write representative benchmarks in C++, Fortran, and Rust to compare code and performance. I chose to use Google Benchmark for all three, as the benchmark execution is reasonably configurable, and the output is standardized. This makes it easy to run comparisons between the different languages using Google Benchmark's built-in tools. While Rust has competitors to Google Benchmark (cargo-bench, Criterion, etc.), Fortran doesn't have any obviously compelling benchmarking libraries in the style of Google Benchmark.
Mechanism
The implementation uses a C ABI layered on top of the core Google Benchmark library. Currently everything has to go through the C boundary, including calls to KeepRunning, which introduces some small overhead. iso_c_binding is used to bind the ABI portably in Fortran, and Rust's FFI support is used to bind it portably in Rust.
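To make the mechanism concrete, here is a minimal sketch of the Rust side: a safe State wrapper that holds an opaque pointer and forwards keep_running across a C-ABI function. The names BmState and bm_state_keep_running are hypothetical, since the post does not show the actual ABI. So that the example runs on its own, the "C" side is simulated in Rust; real bindings would declare it in an extern "C" block and call it inside unsafe.

```rust
use std::os::raw::c_int;

// Stand-in for the C ABI shim (hypothetical names). In real bindings this side
// is C++ behind `extern "C"`; it is written in Rust here only so the sketch runs.
#[repr(C)]
pub struct BmState {
    remaining: u64,
}

pub extern "C" fn bm_state_keep_running(state: *mut BmState) -> c_int {
    // SAFETY: the caller passes a valid, exclusively owned pointer.
    let s = unsafe { &mut *state };
    if s.remaining > 0 {
        s.remaining -= 1;
        1
    } else {
        0
    }
}

// Safe wrapper that benchmark code sees.
pub struct State {
    raw: *mut BmState,
}

impl State {
    pub fn from_raw(raw: *mut BmState) -> Self {
        State { raw }
    }

    // Every call crosses the C boundary: this is the per-iteration
    // overhead measured in the Overhead section.
    pub fn keep_running(&mut self) -> bool {
        bm_state_keep_running(self.raw) != 0
    }
}
```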
Rust Bindings
The current implementation only supports imperative registration of benchmarks, rather than declarative registration à la BENCHMARK.
The user must currently implement fn main manually:
use benchmark::{benchmarks, benchmarks_generic};

fn main() {
    benchmark::initialize();
    // register benchmarks
    benchmark::run_benchmarks();
}
A benchmark is declared much like in C++:
mod my_mod {
    pub fn foo(mut state: benchmark::State) {
        while state.keep_running() {
            // code to benchmark goes here
        }
    }
}
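The post does not show how initialization, registration, and run_benchmarks fit together. A minimal sketch of imperative registration, assuming a hypothetical Registry value in place of whatever crate-level state the real bindings keep, and a toy State that simply counts down a fixed iteration budget:

```rust
// Toy State (hypothetical): counts down a fixed iteration budget.
pub struct State {
    remaining: u64,
}

impl State {
    pub fn keep_running(&mut self) -> bool {
        if self.remaining == 0 {
            return false;
        }
        self.remaining -= 1;
        true
    }
}

// A benchmark is a plain function taking `State` by value, as in the post.
pub type BenchFn = fn(State);

// Hypothetical registry standing in for the state behind
// `benchmark::initialize()` and `benchmark::run_benchmarks()`.
#[derive(Default)]
pub struct Registry {
    benches: Vec<(String, BenchFn)>,
}

impl Registry {
    pub fn register(&mut self, name: &str, f: BenchFn) {
        self.benches.push((name.to_string(), f));
    }

    // Runs each benchmark once, in registration order, with a fixed budget,
    // and returns the names it ran. A real runner would time each call and
    // grow the iteration count until the measurement is stable.
    pub fn run_benchmarks(&self, iterations: u64) -> Vec<String> {
        let mut ran = Vec::new();
        for (name, f) in &self.benches {
            f(State { remaining: iterations });
            ran.push(name.clone());
        }
        ran
    }
}

mod my_mod {
    use super::State;

    pub fn foo(mut state: State) {
        while state.keep_running() {
            // code to benchmark goes here
        }
    }

    pub fn bar(mut state: State) {
        let mut count = 0u64;
        while state.keep_running() {
            count += 1;
        }
        std::hint::black_box(count);
    }
}
```

With this shape, fn main builds the registry, registers each benchmark by name and function pointer, and then runs them; the runner here does no timing.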
Generic benchmarks are also supported:
mod vector_add {
    // Assumes `Float` is `num_traits::Float` from the num-traits crate.
    use num_traits::Float;

    pub fn index<T: Float>(mut state: benchmark::State) {
        let vec_size = state.range(0) as usize;
        let a = vec![T::zero(); vec_size];
        let b = vec![T::zero(); vec_size];
        let mut c = vec![T::zero(); vec_size];
        while state.keep_running() {
            for x in 0..vec_size {
                c[x] = a[x] + b[x];
            }
        }
        benchmark::do_not_optimize(&c);
    }
}
We provide two macros for registering benchmarks: benchmarks! and benchmarks_generic!. Each can register multiple benchmarks in a single invocation, and, like BENCHMARK, neither requires restating the benchmark name.
For non-generic benchmarks, registering is simple:
// in `fn main`
benchmarks!(my_mod::foo, my_mod::bar);
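One way such a macro can avoid restating names is stringify!, which turns the path tokens into the display name at compile time. A sketch under assumptions: this hypothetical benchmarks! returns a Vec of (name, fn) pairs instead of registering globally, accepts only module::function paths, and uses a toy State.

```rust
// Toy State for the sketch (hypothetical; the real one wraps the C ABI).
pub struct State {
    remaining: u64,
}

impl State {
    pub fn keep_running(&mut self) -> bool {
        if self.remaining == 0 {
            return false;
        }
        self.remaining -= 1;
        true
    }
}

// Hypothetical `benchmarks!`: each `module::function` becomes a
// (display name, function pointer) pair, so the name is written only once.
macro_rules! benchmarks {
    ($($m:ident :: $f:ident),+ $(,)?) => {
        vec![$(
            (
                concat!(stringify!($m), "::", stringify!($f)).to_string(),
                $m::$f as fn(State),
            )
        ),+]
    };
}

mod my_mod {
    use super::State;

    pub fn foo(mut state: State) {
        while state.keep_running() {}
    }

    pub fn bar(mut state: State) {
        let mut sum = 0u64;
        while state.keep_running() {
            sum = sum.wrapping_add(1);
        }
        std::hint::black_box(sum);
    }
}
```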
For generic benchmarks, you must specify all the types you want to specialize the benchmarks for:
// in `fn main`
benchmarks_generic!(
    f32, f64;
    vector_add::index,
    vector_add::index_slice,
    vector_add::index_unsafe,
    vector_add::zip,
    vector_add::zip_collect,
);
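Expanding a list of types against a list of functions is awkward in macro_rules, because two independent repetitions cannot be nested directly. A common workaround, sketched here for a hypothetical benchmarks_generic! that returns (name, fn) pairs, is to forward one list as a single token tree to an internal rule. The toy State, its range stub, and the std-only trait bounds (standing in for Float) are assumptions; the generated names follow the format of the rendered output.

```rust
// Toy State (hypothetical): counts down a budget and exposes one argument.
pub struct State {
    remaining: u64,
    arg: i64,
}

impl State {
    pub fn keep_running(&mut self) -> bool {
        if self.remaining == 0 {
            return false;
        }
        self.remaining -= 1;
        true
    }

    // Simplified: ignores the index and returns the single argument.
    pub fn range(&self, _index: usize) -> i64 {
        self.arg
    }
}

// Hypothetical `benchmarks_generic!`: every function x every type, named like
// `vector_add::index<f32>`. Internal `@` rules come first.
macro_rules! benchmarks_generic {
    // One function, every type.
    (@types $out:ident; $m:ident :: $f:ident; [$($t:ty),+]) => {
        $(
            $out.push((
                format!("{}::{}<{}>", stringify!($m), stringify!($f), stringify!($t)),
                $m::$f::<$t> as fn(State),
            ));
        )+
    };
    // Every function; the type list travels as one token tree ($ts).
    (@fns $out:ident; $ts:tt; $($m:ident :: $f:ident),+) => {
        $( benchmarks_generic!(@types $out; $m :: $f; $ts); )+
    };
    // Public entry point: `T1, T2; module::function, ...`.
    ($($t:ty),+ ; $($m:ident :: $f:ident),+ $(,)?) => {{
        let mut out: Vec<(String, fn(State))> = Vec::new();
        benchmarks_generic!(@fns out; [$($t),+]; $($m :: $f),+);
        out
    }};
}

mod vector_add {
    use super::State;
    use std::ops::Add;

    // Plain indexing, mirroring the post's `index` benchmark.
    pub fn index<T: Copy + Default + Add<Output = T>>(mut state: State) {
        let n = state.range(0) as usize;
        let a = vec![T::default(); n];
        let b = vec![T::default(); n];
        let mut c = vec![T::default(); n];
        while state.keep_running() {
            for i in 0..n {
                c[i] = a[i] + b[i];
            }
        }
        std::hint::black_box(&c);
    }

    // Iterator version, mirroring the post's `zip` benchmark.
    pub fn zip<T: Copy + Default + Add<Output = T>>(mut state: State) {
        let n = state.range(0) as usize;
        let a = vec![T::default(); n];
        let b = vec![T::default(); n];
        let mut c = vec![T::default(); n];
        while state.keep_running() {
            for ((ci, &ai), &bi) in c.iter_mut().zip(&a).zip(&b) {
                *ci = ai + bi;
            }
        }
        std::hint::black_box(&c);
    }
}
```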
This renders as:
----------------------------------------------------------------------------
Benchmark                             Time          CPU   Iterations
----------------------------------------------------------------------------
vector_add::index<f32>            70705 ns     70547 ns         9356
vector_add::index<f64>            75650 ns     75468 ns         9728
vector_add::index_slice<f32>      54973 ns     54827 ns        11872
vector_add::index_slice<f64>      58562 ns     58495 ns        11259
vector_add::index_unsafe<f32>     56875 ns     56718 ns        12077
vector_add::index_unsafe<f64>     69196 ns     68970 ns        10034
vector_add::zip<f32>              21533 ns     21479 ns        32005
vector_add::zip<f64>              45947 ns     45666 ns        16122
vector_add::zip_collect<f32>      22993 ns     22943 ns        29634
vector_add::zip_collect<f64>      44486 ns     44397 ns        15397
Options are set on benchmarks through a fluent interface, similar to C++:
benchmarks_generic!(
    f32, f64;
    csr::mat_vec_tridiag_rayon,
    csr::mat_vec_tridiag_rayon_chunked,
)
.range_multiplier(10)
.range(100_000, 100_000_000)
.use_real_time();
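The fluent style usually means each setter takes &mut self and returns &mut Self. A minimal sketch with a hypothetical Registration type, including one plausible way a range with a multiplier expands into argument values: lo, then powers of the multiplier above lo up to hi, then hi if not already present. Google Benchmark's exact expansion rules may differ in edge cases; for 100 to 100000000 with multiplier 10, this sketch yields the seven sizes seen in the Fortran output later in this issue. The default multiplier of 8 is an assumption.

```rust
// Hypothetical handle returned by a registration macro; field names invented.
#[derive(Debug)]
pub struct Registration {
    names: Vec<String>,
    multiplier: i64,
    range: Option<(i64, i64)>,
    real_time: bool,
}

impl Registration {
    pub fn new(names: &[&str]) -> Self {
        Registration {
            names: names.iter().map(|s| s.to_string()).collect(),
            multiplier: 8, // assumed default
            range: None,
            real_time: false,
        }
    }

    // Each setter mutates in place and returns `&mut Self`, so calls chain.
    pub fn range_multiplier(&mut self, m: i64) -> &mut Self {
        self.multiplier = m;
        self
    }

    pub fn range(&mut self, lo: i64, hi: i64) -> &mut Self {
        self.range = Some((lo, hi));
        self
    }

    pub fn use_real_time(&mut self) -> &mut Self {
        self.real_time = true;
        self
    }

    // `lo`, then each power of the multiplier above `lo` and up to `hi`,
    // then `hi` if it was not already added.
    pub fn args(&self) -> Vec<i64> {
        let (lo, hi) = match self.range {
            Some(r) => r,
            None => return Vec::new(),
        };
        assert!(self.multiplier > 1, "multiplier must exceed 1");
        let mut out = vec![lo];
        let mut p: i64 = 1;
        while p <= hi {
            if p > lo {
                out.push(p);
            }
            match p.checked_mul(self.multiplier) {
                Some(next) => p = next,
                None => break,
            }
        }
        if out.last() != Some(&hi) {
            out.push(hi);
        }
        out
    }

    // Instance names in the style of the Fortran output, e.g. `name/100/real_time`.
    pub fn instance_names(&self) -> Vec<String> {
        let suffix = if self.real_time { "/real_time" } else { "" };
        let mut out = Vec::new();
        for name in &self.names {
            for arg in self.args() {
                out.push(format!("{}/{}{}", name, arg, suffix));
            }
        }
        out
    }
}
```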
Fortran Bindings
I'm guessing there will be less interest in these, but they may be useful to some.
I was having issues constructing the command-line arguments to pass to Google Benchmark directly from Fortran, so for now the entry point is implemented in C++, and the Fortran code is expected to implement an entry point called RegisterBenchmarksMain:
module my_benchmarks
    use benchmark
    implicit none
contains
    subroutine RegisterBenchmarksMain() bind(C, name="RegisterBenchmarksMain")
        type(benchmark_t), pointer :: bench
        ! vector_add_idiomatic_omp is a benchmark subroutine defined elsewhere (not shown)
        bench => benchmark_register("vector_add_idiomatic_omp", vector_add_idiomatic_omp)
        call bench%range_multiplier(10)
        call bench%range(100, 100000000)
        call bench%use_real_time
    end subroutine RegisterBenchmarksMain
end module my_benchmarks
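For Fortran, iso_c_binding can only call benchmark_register if the shim exposes a plain C signature. The post's shim is C++ and its signature is not shown, so what follows is a hypothetical sketch (in Rust, like the other sketches here) of a C-ABI registration entry point plus the host-side runner that, in the post, lives in C++. The name arrives as pointer plus length because Fortran character values are not NUL-terminated; bm_register, drain, and run_all are invented names.

```rust
use std::os::raw::c_char;

// State passed across the boundary (hypothetical layout).
#[repr(C)]
pub struct State {
    pub remaining: u64,
}

// Host-side view of a Fortran `bind(C)` benchmark subroutine (hypothetical).
pub type BenchCallback = extern "C" fn(*mut State);

pub struct Registry {
    pub entries: Vec<(String, BenchCallback)>,
}

// Hypothetical C ABI registration entry point.
// Safety: `reg` must be valid and unaliased; `name` must point to `len`
// readable bytes.
pub unsafe extern "C" fn bm_register(
    reg: *mut Registry,
    name: *const c_char,
    len: usize,
    func: BenchCallback,
) -> usize {
    let bytes = unsafe { std::slice::from_raw_parts(name as *const u8, len) };
    let reg = unsafe { &mut *reg };
    reg.entries.push((String::from_utf8_lossy(bytes).into_owned(), func));
    // The index could serve as a handle for later setters such as range().
    reg.entries.len() - 1
}

// Stand-in for a Fortran benchmark body: consumes its whole budget.
pub extern "C" fn drain(state: *mut State) {
    // SAFETY: run_all always passes a pointer to a live State.
    let s = unsafe { &mut *state };
    while s.remaining > 0 {
        s.remaining -= 1;
    }
}

// Host-side runner: each callback gets a fresh state.
// Returns (name, iterations left over).
pub fn run_all(reg: &Registry, iterations: u64) -> Vec<(String, u64)> {
    let mut results = Vec::new();
    for (name, f) in &reg.entries {
        let mut state = State { remaining: iterations };
        (*f)(&mut state);
        results.push((name.clone(), state.remaining));
    }
    results
}
```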
We only provide a single benchmark_register routine; I haven't thought about how to make this more natural yet. I also don't love the use of pointer here. I was new to Fortran when I wrote this and would probably change it.
This renders as:
2019-03-05 14:16:59
Running ./ftn-bench/ftn-bench
Run on (8 X 3100 MHz CPU s)
CPU Caches:
L1 Data 32K (x4)
L1 Instruction 32K (x4)
L2 Unified 262K (x4)
L3 Unified 8388K (x1)
------------------------------------------------------------------------------------
Benchmark                                              Time          CPU   Iterations
------------------------------------------------------------------------------------
vector_add_idiomatic_omp/100/real_time             32977 ns     12489 ns        20413
vector_add_idiomatic_omp/1000/real_time            35360 ns     14234 ns        20329
vector_add_idiomatic_omp/10000/real_time           35635 ns     11754 ns        19610
vector_add_idiomatic_omp/100000/real_time         131073 ns     22574 ns         5454
vector_add_idiomatic_omp/1000000/real_time       1406534 ns     75481 ns          489
vector_add_idiomatic_omp/10000000/real_time     14943647 ns    704200 ns           45
vector_add_idiomatic_omp/100000000/real_time   837987651 ns     47000 ns            1
Overhead
As mentioned before, there is currently some small overhead in the core loop due to the call to KeepRunning. I think this is fixable, at least for Rust, which can inline across crates (for #[inline] and generic functions) even without LTO. Here is the overhead on my machine (2017 MacBook Pro):
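One possible shape for the Rust fix, sketched under assumptions: keep the loop check in Rust behind #[inline] and cross the C boundary only to fetch iterations in batches. The shim function raw_take_batch and the batch size of 1024 are hypothetical, and the C side is simulated in Rust so the example runs. Because the shim never hands out more iterations than remain, the total count is unchanged; only the number of boundary crossings drops.

```rust
// Stand-in for a C ABI shim (hypothetical names; simulated in Rust).
// `calls` counts boundary crossings, for illustration only.
#[repr(C)]
pub struct RawState {
    pub remaining: u64,
    pub calls: u64,
}

// Hypothetical shim call: hands out up to `max` iterations, 0 when done.
pub extern "C" fn raw_take_batch(state: *mut RawState, max: u64) -> u64 {
    // SAFETY: the caller passes a valid, exclusively owned pointer.
    let s = unsafe { &mut *state };
    s.calls += 1;
    let n = s.remaining.min(max);
    s.remaining -= n;
    n
}

const BATCH: u64 = 1024; // assumed batch size

pub struct State {
    raw: *mut RawState,
    local: u64, // iterations fetched but not yet consumed
}

impl State {
    pub fn new(raw: *mut RawState) -> Self {
        State { raw, local: 0 }
    }

    // Fast path: a local decrement that `#[inline]` lets the compiler
    // inline into the caller's loop, even from another crate.
    #[inline]
    pub fn keep_running(&mut self) -> bool {
        if self.local > 0 {
            self.local -= 1;
            return true;
        }
        self.refill()
    }

    // Slow path: cross the C boundary only when the local batch runs out.
    #[cold]
    #[inline(never)]
    fn refill(&mut self) -> bool {
        let n = raw_take_batch(self.raw, BATCH);
        if n == 0 {
            return false;
        }
        self.local = n - 1; // this call consumes one iteration
        true
    }
}
```

For 3000 iterations, this makes 4 boundary crossings instead of the 3001 that a per-iteration call would make.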
C++
Test
#include <benchmark/benchmark.h>

void baseline_keep_running(benchmark::State &state) {
    while (state.KeepRunning()) {
    }
}
BENCHMARK(baseline_keep_running);

void baseline_for(benchmark::State &state) {
    for (auto _ : state) {
    }
}
BENCHMARK(baseline_for);

BENCHMARK_MAIN();
Results
-------------------------------------------------------------
Benchmark                  Time          CPU   Iterations
-------------------------------------------------------------
baseline_keep_running      0 ns         0 ns   1000000000
baseline_for               0 ns         0 ns   1000000000
Rust
Test
use benchmark::State;

pub fn keep_running(mut state: State) {
    while state.keep_running() {}
}
Results
--------------------------------------------------------------
Benchmark                   Time          CPU   Iterations
--------------------------------------------------------------
baseline::keep_running      2 ns         2 ns    303938170
Fortran
Test
subroutine baseline_keep_running(state)
    type(benchmark_state_t), intent(inout) :: state
    do while (state%keep_running())
    end do
end subroutine baseline_keep_running
Results
-------------------------------------------------------------
Benchmark                  Time          CPU   Iterations
-------------------------------------------------------------
baseline_keep_running      3 ns         3 ns    229682906