Added a utility class for relative error computation.
Description
Added a utility class for relative error computation, and added relative error reporting to the seismic example.
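For illustration, a minimal sketch of what such a utility might look like is below. The class and member names are hypothetical, and the definition of relative error (sample standard deviation of per-iteration times divided by their mean) is an assumption, not necessarily the formula used by the actual utility.

```cpp
#include <cmath>
#include <numeric>
#include <vector>

// Hypothetical sketch only: names are illustrative and the relative-error
// definition (stddev / mean of per-iteration times) is an assumption.
class RelativeErrorCalculator {
public:
    void add_sample(double seconds) { samples_.push_back(seconds); }

    double relative_error() const {
        if (samples_.size() < 2)
            return 0.0;
        const double mean =
            std::accumulate(samples_.begin(), samples_.end(), 0.0) / samples_.size();
        double sq_sum = 0.0;
        for (double s : samples_)
            sq_sum += (s - mean) * (s - mean);
        const double stddev = std::sqrt(sq_sum / (samples_.size() - 1));
        return mean > 0.0 ? stddev / mean : 0.0;
    }

private:
    std::vector<double> samples_;  // per-iteration execution times, in seconds
};
```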
Fixes # - issue number(s) if exists
Type of change
Choose one or multiple, leave empty if none of the other choices apply
Add a respective label(s) to PR if you have permissions
- [ ] bug fix - change that fixes an issue
- [x] new feature - change that adds functionality
- [ ] tests - change in tests
- [ ] infrastructure - change in infrastructure and CI
- [ ] documentation - documentation update
Tests
- [ ] added - required for new features and some bug fixes
- [x] not needed
Documentation
- [ ] updated in # - add PR number
- [ ] needs to be updated
- [x] not needed
Breaks backward compatibility
- [ ] Yes
- [x] No
- [ ] Unknown
Notify the following users
List users with @ to send notifications
Other information
I believe we need to continue measurements until one of the following conditions is met:
- Relative error is within the acceptable limit.
- Overall limit on running time is reached.
Yes, that's the intent. The relative error may also vary from platform to platform for the same iteration count. In my experiments, for certain benchmarks, I have seen the relative error stay more stable on client systems, whereas on high-core-count NUMA systems it sometimes increases with the iteration count.
The performance CI scripts will use the "Relative_Err" string to adjust the iteration count and minimize the relative error on the respective systems.
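For example, the benchmark side might emit the value with a fixed, greppable marker that the CI scripts search for. The exact label and format below are assumptions for illustration only:

```cpp
#include <cstdio>

// Illustrative only: the actual output format may differ. A stable
// "Relative_Err" marker lets CI scripts extract the value from benchmark output.
void report_relative_error(double relative_error) {
    std::printf("Relative_Err : %.2f %%\n", relative_error * 100.0);
}
```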
Do I understand correctly that the CI scripts will keep re-starting the measurements, each time specifying an increased number of iterations to run? So that they work as if in the following steps:
1. Set `num_iterations` to the previously saved value (see step 3).
2. Run the benchmark and get the relative error computed.
3. If it is within the per-benchmark pre-defined limit, save that number of iterations. The system will start from that value in the next measurement session (see step 1).
4. If it is not within the per-benchmark pre-defined limit, then:
   - If the overall per-benchmark measurement session time is acceptable, increase `num_iterations` (for example, double it) and go to step 2.
   - If the overall time is not acceptable, stop and report an error saying that the measurement session is unstable, the results are not reliable, and that further analysis or re-consideration of either the system or the benchmark is required.

We could also add a step 3.1 that, before saving `num_iterations`, tries decreasing it a little and checks whether it still produces an acceptable relative error. This way, the system would automatically and dynamically adjust `num_iterations` to the value closest to the one that gives a stable relative error within the defined range.
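A minimal sketch of the loop described in the steps above, under stated assumptions: the doubling policy, the thresholds, and the way a single measurement session is invoked are all illustrative, and in the proposed setup this logic would live in the CI perf runner script (or, as discussed later, inside the benchmark harness itself).

```cpp
#include <chrono>
#include <functional>
#include <optional>

// Sketch of the proposed adjustment loop; not the agreed-upon implementation.
std::optional<int> find_stable_iteration_count(
    int saved_num_iterations,                        // step 1: previously saved value
    double max_relative_error,                       // per-benchmark limit
    std::chrono::seconds time_budget,                // overall session time limit
    const std::function<double(int)>& run_and_get_relative_error) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    int num_iterations = saved_num_iterations;
    while (clock::now() - start < time_budget) {
        const double rel_err = run_and_get_relative_error(num_iterations);  // step 2
        if (rel_err <= max_relative_error)
            return num_iterations;   // step 3: stable enough, save for the next session
        num_iterations *= 2;         // step 4: time budget remains, so double and retry
    }
    return std::nullopt;  // unstable session: caller reports an error for further analysis
}
```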
Yes, this workflow needs to be implemented in the CI perf runner script to adjust the number of iterations until the relative error falls within acceptable limits for the example benchmark. The intent is to improve the stability of measurements from run to run so that the perf report geomean results are more reliable.
Don't you think this would be better implemented here instead? At least it wouldn't require restarting the whole benchmark for each new trial of measurements, which might be a relatively complex process for some of them.
We would like this feature to be available as part of the performance CI and to extend it to all benchmarks that are part of the geomean calculation. Integrating it into the performance CI keeps benchmark modifications minimal (benchmarks only need to accept an iteration count and output the relative error directly). For benchmarks where source code access is unavailable, the script can use the overall application runtime to compute the relative error.
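A sketch of that no-source-access fallback, assuming the script times whole application runs from the outside and applies a relative-error formula (standard deviation over mean, which is an assumption) to the wall-clock times; the command string and run count are placeholders supplied by the CI script:

```cpp
#include <chrono>
#include <cmath>
#include <cstdlib>
#include <numeric>
#include <string>
#include <vector>

// Illustrative fallback: compute relative error from repeated whole-application runs.
double relative_error_of_runs(const std::string& command, int runs) {
    std::vector<double> times;
    for (int i = 0; i < runs; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        std::system(command.c_str());  // one full application run
        const auto t1 = std::chrono::steady_clock::now();
        times.push_back(std::chrono::duration<double>(t1 - t0).count());
    }
    if (times.size() < 2)
        return 0.0;
    const double mean = std::accumulate(times.begin(), times.end(), 0.0) / times.size();
    double sq_sum = 0.0;
    for (double t : times)
        sq_sum += (t - mean) * (t - mean);
    const double stddev = std::sqrt(sq_sum / (times.size() - 1));
    return mean > 0.0 ? stddev / mean : 0.0;
}
```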