
Timing routines

Open ivan-pi opened this issue 5 years ago • 1 comments

For benchmarking purposes it would be useful to have a set of timing routines.

Some examples of prior art include:

  • StopWatch by William F. Mitchell from NIST: https://math.nist.gov/StopWatch/ (mirror on GitHub: https://github.com/ivan-pi/StopWatch)
  • Stopwatch by @juanmanzanero: https://github.com/juanmanzanero/Stopwatch
  • Stopwatch class by @leonfoks (part of coretran): https://github.com/leonfoks/coretran/tree/master/src/time

Julia offers two options:

  • the @time macro, which measures the time taken to execute an expression
  • the BenchmarkTools.jl package, including the @btime macro, which executes the expression multiple times and uses regression to reduce noise.
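To make the "execute multiple times" idea concrete in Fortran, here is a minimal sketch that times a kernel over several samples and reports the minimum and mean per-call time. It is not what BenchmarkTools.jl does internally (that package is considerably more sophisticated); the module name `bench_samples_sketch`, the routine `sample_timings`, and the interface `kernel_iface` are all hypothetical names made up for this illustration.

```fortran
module bench_samples_sketch
  ! Hypothetical helper, not part of any existing library.
  use, intrinsic :: iso_fortran_env, only: int64, dp => real64
  implicit none
  private
  public :: sample_timings, kernel_iface

  abstract interface
    subroutine kernel_iface()
    end subroutine kernel_iface
  end interface

contains

  ! Time `kernel` in `nsamples` samples of `nreps` calls each and return
  ! the minimum and mean per-call time in seconds.
  ! Assumes nsamples >= 1 and nreps >= 1.
  subroutine sample_timings(kernel, nsamples, nreps, tmin, tmean)
    procedure(kernel_iface) :: kernel
    integer, intent(in) :: nsamples, nreps
    real(dp), intent(out) :: tmin, tmean
    integer(int64) :: tic, toc, rate
    real(dp) :: t
    integer :: i, j

    tmin = huge(tmin)
    tmean = 0.0_dp
    do i = 1, nsamples
      call system_clock(tic, rate)
      do j = 1, nreps
        call kernel()
      end do
      call system_clock(toc)
      ! per-call time for this sample, computed in double precision
      t = real(toc - tic, dp) / real(rate, dp) / real(nreps, dp)
      tmin = min(tmin, t)
      tmean = tmean + t
    end do
    tmean = tmean / real(nsamples, dp)
  end subroutine sample_timings

end module bench_samples_sketch
```

A caller would pass any argument-free subroutine, for example `call sample_timings(my_kernel, 20, 100, tmin, tmean)`. The minimum is usually the less noisy of the two numbers, since interference from the system can only make a sample slower.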

A week or two ago, I tried to build some timing macros using fypp:

#:def NTIC(n=1000)
  #:global BENCHMARK_NREPS
  #:set BENCHMARK_NREPS = n
  block
    use, intrinsic :: iso_fortran_env, only: int64, dp => real64
    integer(int64) :: benchmark_tic, benchmark_toc, benchmark_count_rate
    integer(int64) :: benchmark_i
    real(dp) :: benchmark_elapsed
    call system_clock(benchmark_tic,benchmark_count_rate)
    do benchmark_i = 1, ${BENCHMARK_NREPS}$
#:enddef

#:def NTOC(*args)
    #:global BENCHMARK_NREPS
    end do
    call system_clock(benchmark_toc)
    ! convert with kind dp; a default-kind real() would throw away int64 tick resolution
    benchmark_elapsed = real(benchmark_toc - benchmark_tic, dp)/real(benchmark_count_rate, dp)
    benchmark_elapsed = benchmark_elapsed/${BENCHMARK_NREPS}$
  #:if len(args) > 0
    ${args[0]}$ = benchmark_elapsed
  #:else
    write(*,*) "Average time is ",benchmark_elapsed," seconds."
  #:endif
  end block
  #:del BENCHMARK_NREPS
#:enddef

These can be used then as follows:

  real :: x(1000), y(1000), avg_time
  call random_number(x)

  @:NTIC(100)
  y = sqrt(x)
  @:NTOC() ! print average time

  @:NTIC(100)
  y = sqrt(x)
  @:NTOC(avg_time) ! save average time to variable

Perhaps a combination of a StopWatch class and some fypp macros could enable us to do regression tests similar to those done in Julia.
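As a rough idea of what the StopWatch half of that combination might look like, here is a minimal sketch of a derived type built on `system_clock`. It is not taken from any of the libraries listed above; the names `stopwatch_sketch`, `stopwatch_t`, `start`, `halt`, `reset` and `seconds` are placeholders chosen for this example.

```fortran
module stopwatch_sketch
  ! Hypothetical minimal stopwatch; names are illustrative only.
  use, intrinsic :: iso_fortran_env, only: int64, dp => real64
  implicit none
  private
  public :: stopwatch_t

  type :: stopwatch_t
    private
    integer(int64) :: t0 = 0_int64     ! clock count at the last start
    integer(int64) :: ticks = 0_int64  ! accumulated clock counts
    logical :: running = .false.
  contains
    procedure :: start
    procedure :: halt
    procedure :: reset
    procedure :: seconds
  end type stopwatch_t

contains

  subroutine start(self)
    class(stopwatch_t), intent(inout) :: self
    if (self%running) return
    self%running = .true.
    call system_clock(self%t0)
  end subroutine start

  subroutine halt(self)
    class(stopwatch_t), intent(inout) :: self
    integer(int64) :: now
    call system_clock(now)  ! read the clock first to keep overhead out
    if (.not. self%running) return
    self%ticks = self%ticks + (now - self%t0)
    self%running = .false.
  end subroutine halt

  subroutine reset(self)
    class(stopwatch_t), intent(inout) :: self
    self%ticks = 0_int64
    self%running = .false.
  end subroutine reset

  ! Accumulated time in seconds, including the current lap if running.
  function seconds(self) result(t)
    class(stopwatch_t), intent(in) :: self
    real(dp) :: t
    integer(int64) :: now, rate
    call system_clock(now, rate)
    if (self%running) then
      t = real(self%ticks + (now - self%t0), dp) / real(rate, dp)
    else
      t = real(self%ticks, dp) / real(rate, dp)
    end if
  end function seconds

end module stopwatch_sketch
```

With a type like this, the bodies of macros such as NTIC and NTOC could shrink to `call sw%start()` and `call sw%halt()` around the repetition loop, with `sw%seconds()` divided by the repetition count giving the average.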

ivan-pi, Jul 17 '20 10:07

Hi @ivan-pi

You should use the highest-precision timing method available, which means either the RDTSCP instruction or the fixed performance event CPU_CLK_UNHALTED.THREAD. You may try the RDTSC instruction instead, but you should first check that the specific CPU supports an invariant TSC. This requires a CPUID check (the invariant TSC flag is reported in CPUID.80000007H:EDX[8]). Using these instructions from Fortran requires a call to a C wrapper.

P.S. I would advise against complex ADT timers, because they emit additional machine-code instructions before and after the measured code. Simplicity is the most important thing here.

Below is an example (not Fortran) of timing methodology (loc: 317) https://github.com/bgin/Guided-Missile-Modeling-Simulation/blob/master/ReleaseGuidedMissileSimPerfTests/GMS_perf_test_ComputeGrassParamEq_zmm16r4_call_scope_looped100x10000_non_instr_autocor.cpp

Regards

Bernard Gingold

bgin, May 24 '21 11:05