
Added a utility class for Relative Error compute.

Open · sarathnandu opened this pull request 1 year ago

Description

Added a utility class for Relative Error compute. Added relative error compute for seismic example.
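
For illustration, a relative-error utility over timing samples might look roughly like the sketch below. The class name, members, and the exact error definition are assumptions for this sketch, not the code added in this PR.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical sketch; names and the error definition are illustrative only.
class relative_error_calc {
    std::vector<double> samples_;   // per-trial execution times, in seconds
public:
    void add_sample(double seconds) { samples_.push_back(seconds); }

    // One common definition: spread of the measured times relative to the
    // fastest one, (max - min) / min. A stddev/mean based definition would
    // work just as well.
    double relative_error() const {
        if (samples_.size() < 2) return 0.0;
        auto [mn, mx] = std::minmax_element(samples_.begin(), samples_.end());
        return (*mx - *mn) / *mn;
    }
};
```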

Fixes # - issue number(s) if exists

Type of change

Choose one or multiple, leave empty if none of the other choices apply

Add a respective label(s) to PR if you have permissions

  • [ ] bug fix - change that fixes an issue
  • [x] new feature - change that adds functionality
  • [ ] tests - change in tests
  • [ ] infrastructure - change in infrastructure and CI
  • [ ] documentation - documentation update

Tests

  • [ ] added - required for new features and some bug fixes
  • [x] not needed

Documentation

  • [ ] updated in # - add PR number
  • [ ] needs to be updated
  • [x] not needed

Breaks backward compatibility

  • [ ] Yes
  • [x] No
  • [ ] Unknown

Notify the following users

List users with @ to send notifications

Other information

sarathnandu · Jul 12 '24 16:07

I believe we need to continue measurements until one of the following conditions is met:

  1. Relative error is within the acceptable limit.
  2. Overall limit on running time is reached.

Yes, that's the intent. The relative error may also vary from platform to platform for the same iteration count. In my experiments with certain benchmarks, I have seen the relative error be more stable on client systems, whereas on high core count NUMA systems it sometimes increases with the iteration count.

The Performance CI scripts will use the "Relative_Err" string to adjust the iteration count and minimize the relative error on the respective systems.
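
For example, the benchmark could emit a greppable line containing that string. The exact format below is an assumption; only the "Relative_Err" label comes from this discussion.

```cpp
#include <cstdio>

// Prints a line the perf runner scripts can parse, e.g.
// "Time: 1.234 s  Relative_Err: 1.73 %". Format is illustrative only.
void report(double mean_seconds, double rel_err) {
    std::printf("Time: %.3f s  Relative_Err: %.2f %%\n",
                mean_seconds, rel_err * 100.0);
}
```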

sarathnandu · Oct 14 '24 14:10

Do I understand correctly that the CI scripts will keep restarting the measurements, each time specifying an increased number of iterations to run? In other words, they would work roughly in the following steps:

  1. Set num_iterations to the previously saved value (see step 3).
  2. Run the benchmark and get the computed relative error.
  3. If it is within the per-benchmark pre-defined limit, save that number of iterations. The system will start from that value in the next measurement session (see step 1).
  4. If it is not within the per-benchmark pre-defined limit, then:
    • If the overall per-benchmark measurement session time is still acceptable, increase num_iterations (for example, double it) and go to step 2.
    • If the overall time is not acceptable, stop and report an error saying that the measurement session is unstable, the results are not reliable, and that further analysis or reconsideration of either the system or the benchmark is required.

We could also add a step 3.1 that, before saving num_iterations, tries decreasing it a little and checks whether the relative error is still acceptable. This way, the system would automatically and dynamically converge to the num_iterations value closest to the one that gives a stable relative error within the defined range.
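
For concreteness, the loop described in these steps could look roughly like the C++ sketch below. The `adjust_iterations` name and the callable interface are hypothetical; persisting the iteration count between sessions (steps 1 and 3) is left to the caller, and the step 3.1 refinement is omitted.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <functional>
#include <optional>

// run_benchmark is assumed to run the benchmark with the given iteration
// count and return the relative error it reported. Returns the iteration
// count to save for the next session, or nothing if the session is unstable.
std::optional<std::size_t> adjust_iterations(
        const std::function<double(std::size_t)>& run_benchmark,
        std::size_t num_iterations,          // step 1: previously saved value
        double rel_err_limit,
        double session_budget_sec) {
    auto session_start = std::chrono::steady_clock::now();
    while (true) {
        double rel_err = run_benchmark(num_iterations);          // step 2
        if (rel_err <= rel_err_limit)
            return num_iterations;                               // step 3: save this value
        std::chrono::duration<double> spent =
            std::chrono::steady_clock::now() - session_start;
        if (spent.count() > session_budget_sec) {                // step 4, second bullet
            std::fprintf(stderr,
                "Measurement session is unstable; results are not reliable.\n");
            return std::nullopt;
        }
        num_iterations *= 2;                                     // step 4, first bullet
    }
}
```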

aleksei-fedotov · Oct 14 '24 20:10

Yes, this workflow needs to be implemented in the CI perf runner script so that it adjusts the number of iterations until the relative error is within acceptable limits for the example benchmark. The intent is to improve the stability of measurements from run to run so that the perf report geomean results are more reliable.

sarathnandu · Oct 28 '24 02:10

this workflow needs to be implemented in the CI perf runner script so that it adjusts the number of iterations until the relative error is within acceptable limits for the example benchmark.

Don't you think this would be better implemented here instead? At least it would not require restarting the whole benchmark for each new trial of measurements, which might be a relatively complex process for some of them.
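
For illustration, an in-benchmark variant could keep extending a single measurement session instead of restarting the whole benchmark from a script. A minimal sketch, assuming hypothetical helper names and the same (max - min) / min error definition as in the utility sketch above:

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <vector>

// Keeps measuring inside the already running benchmark until the relative
// error is acceptable or the time budget is exhausted. Returns the mean time
// per iteration. Names and error definition are assumptions for this sketch.
double measure_in_process(const std::function<void()>& one_iteration,
                          double rel_err_limit, double budget_sec) {
    std::vector<double> times;
    auto start = std::chrono::steady_clock::now();
    auto elapsed = [&] {
        return std::chrono::duration<double>(
            std::chrono::steady_clock::now() - start).count();
    };
    do {
        auto t0 = std::chrono::steady_clock::now();
        one_iteration();
        times.push_back(std::chrono::duration<double>(
            std::chrono::steady_clock::now() - t0).count());
        auto [mn, mx] = std::minmax_element(times.begin(), times.end());
        if (times.size() > 1 && (*mx - *mn) / *mn <= rel_err_limit)
            break;   // relative error settled within the limit
    } while (elapsed() < budget_sec);
    double sum = 0.0;
    for (double t : times) sum += t;
    return sum / times.size();
}
```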

aleksei-fedotov · Oct 28 '24 13:10

Don't you think this would be better implemented here instead? At least it would not require restarting the whole benchmark for each new trial of measurements, which might be a relatively complex process for some of them.

We would like this feature to be available as part of the performance CI and to extend it to all benchmarks that are part of the geomean calculation. Integrating the feature into the performance CI keeps benchmark modifications minimal (a benchmark only needs to accept an iteration count and output the relative error directly). For benchmarks where source code access is unavailable, the script can use the application's run time to compute the relative error.

sarathnandu · Oct 28 '24 20:10