Timing routines
For benchmarking purposes it would be useful to have a set of timing routines.
Some examples of prior art include:
- StopWatch by William F. Mitchell from NIST: https://math.nist.gov/StopWatch/ (mirror on GitHub: https://github.com/ivan-pi/StopWatch)
- Stopwatch by @juanmanzanero: https://github.com/juanmanzanero/Stopwatch
- Stopwatch class by @leonfoks (part of coretran): https://github.com/leonfoks/coretran/tree/master/src/time
In Julia they use two options:
- the @time macro, which measures the time taken to execute an expression
- the BenchmarkTools.jl package, including the @btime macro, which executes the expression multiple times and uses regression to reduce noise.
A week or two ago, I tried to build some timing macros using fypp:
```fortran
#! NTIC opens a block, reads the start tick, and begins the repetition loop.
#:def NTIC(n=1000)
#:global BENCHMARK_NREPS
#:set BENCHMARK_NREPS = n
block
  use, intrinsic :: iso_fortran_env, only: int64, dp => real64
  integer(int64) :: benchmark_tic, benchmark_toc, benchmark_count_rate
  integer(int64) :: benchmark_i
  real(dp) :: benchmark_elapsed
  call system_clock(benchmark_tic, benchmark_count_rate)
  do benchmark_i = 1, ${BENCHMARK_NREPS}$
#:enddef
#! NTOC closes the loop and the block, then prints or stores the average time.
#:def NTOC(*args)
#:global BENCHMARK_NREPS
  end do
  call system_clock(benchmark_toc)
  ! Convert with kind dp so the tick difference is not truncated to single precision
  benchmark_elapsed = real(benchmark_toc - benchmark_tic, dp)/real(benchmark_count_rate, dp)
  benchmark_elapsed = benchmark_elapsed/${BENCHMARK_NREPS}$
#:if len(args) > 0
  ${args[0]}$ = benchmark_elapsed
#:else
  write(*,*) "Average time is ", benchmark_elapsed, " seconds."
#:endif
end block
#:del BENCHMARK_NREPS
#:enddef
```
These can then be used as follows:
```fortran
real :: x(1000), y(1000), avg_time

call random_number(x)

@:NTIC(100)
y = sqrt(x)
@:NTOC() ! print average time

@:NTIC(100)
y = sqrt(x)
@:NTOC(avg_time) ! save average time to variable
```
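To make the macros easier to follow, here is a sketch of roughly what the first NTIC/NTOC pair should expand to after preprocessing. It is written by hand rather than produced by fypp, and it assumes the empty NTOC() call takes the printing branch. The program name and the final checksum line are additions for illustration, and the tick counts are converted with kind dp so the division is carried out in double precision.

```fortran
program ntic_ntoc_expanded
  ! Hand-written approximation of the preprocessed output; not generated by fypp.
  implicit none
  real :: x(1000), y(1000)

  call random_number(x)

  ! --- roughly what @:NTIC(100) emits ---
  block
    use, intrinsic :: iso_fortran_env, only: int64, dp => real64
    integer(int64) :: benchmark_tic, benchmark_toc, benchmark_count_rate
    integer(int64) :: benchmark_i
    real(dp) :: benchmark_elapsed

    call system_clock(benchmark_tic, benchmark_count_rate)
    do benchmark_i = 1, 100
      ! --- the user code placed between the two macros ---
      y = sqrt(x)
    ! --- roughly what @:NTOC() emits ---
    end do
    call system_clock(benchmark_toc)
    benchmark_elapsed = real(benchmark_toc - benchmark_tic, dp)/real(benchmark_count_rate, dp)
    benchmark_elapsed = benchmark_elapsed/100
    write(*,*) "Average time is ", benchmark_elapsed, " seconds."
  end block

  ! Not part of the expansion: using y afterwards makes it harder for an
  ! optimizing compiler to discard the timed loop entirely.
  write(*,*) "checksum: ", sum(y)
end program ntic_ntoc_expanded
```

Because every helper variable lives inside the block, two NTIC/NTOC pairs in the same scope do not clash with each other or with user variables.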
Perhaps a combination of a StopWatch class and some fypp macros could let us run benchmarks similar to the regression-based ones done in Julia.
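As a starting point for discussion, a minimal stopwatch type might look like the sketch below. It is not taken from any of the libraries listed above: the module name stopwatch_sketch, the type stopwatch_t, and the methods tic, toc, reset, and elapsed are all hypothetical. It relies only on the system_clock intrinsic, ignores counter wrap-around, and assumes a clock is available (a count rate of zero is not handled).

```fortran
module stopwatch_sketch
  use, intrinsic :: iso_fortran_env, only: int64, dp => real64
  implicit none
  private
  public :: stopwatch_t

  ! Hypothetical accumulating stopwatch built on system_clock.
  type :: stopwatch_t
    private
    integer(int64) :: t_start = 0_int64   ! tick count at the last tic
    integer(int64) :: ticks = 0_int64     ! ticks accumulated over all tic/toc pairs
    logical :: running = .false.
  contains
    procedure :: tic
    procedure :: toc
    procedure :: reset
    procedure :: elapsed
  end type stopwatch_t

contains

  ! Start (or restart) the current interval.
  subroutine tic(self)
    class(stopwatch_t), intent(inout) :: self
    self%running = .true.
    call system_clock(self%t_start)
  end subroutine tic

  ! Close the current interval and add it to the accumulated total.
  subroutine toc(self)
    class(stopwatch_t), intent(inout) :: self
    integer(int64) :: now
    call system_clock(now)               ! read the clock first to keep overhead out
    if (.not. self%running) return       ! toc without a matching tic is ignored
    self%ticks = self%ticks + (now - self%t_start)
    self%running = .false.
  end subroutine toc

  ! Discard all accumulated time.
  subroutine reset(self)
    class(stopwatch_t), intent(inout) :: self
    self%ticks = 0_int64
    self%running = .false.
  end subroutine reset

  ! Accumulated time in seconds over all completed tic/toc pairs.
  function elapsed(self) result(seconds)
    class(stopwatch_t), intent(in) :: self
    real(dp) :: seconds
    integer(int64) :: rate
    call system_clock(count_rate=rate)
    seconds = real(self%ticks, dp)/real(rate, dp)
  end function elapsed

end module stopwatch_sketch
```

With such a type, the loop from the macro example could be timed by calling sw%tic() before the loop and sw%toc() after it, then dividing sw%elapsed() by the number of repetitions. The fypp macros would then only need to generate the loop and those two calls.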
Hi @ivan-pi
You should use the highest-precision timing method available, which means either the RDTSCP instruction or the fixed performance event CPU_CLK_UNHALTED.THREAD. You may also try the RDTSC instruction, but you should first check whether the specific CPU supports the invariant TSC. This is reported through CPUID (leaf 80000007H, EDX bit 8). Using the aforementioned instructions requires a call to a C wrapper.

P.S. I would advise against complex ADT timers, because they emit additional machine code instructions before and after the measured code. Simplicity is the most important thing here.
Below is an example (not Fortran) of this timing methodology (see line 317): https://github.com/bgin/Guided-Missile-Modeling-Simulation/blob/master/ReleaseGuidedMissileSimPerfTests/GMS_perf_test_ComputeGrassParamEq_zmm16r4_call_scope_looped100x10000_non_instr_autocor.cpp
Regards
Bernard Gingold