MIOpen Find Mode TrustVerify

Adds MIOPEN_FIND_MODE=6 (TrustVerify). This mode extends DynamicHybrid.

Running with TrustVerify first attempts to load tuning results from system resources.
- If no solution is returned, tuning is triggered.
- If a solution is returned, the user find db is checked for it:
  - If the solution is from the user find db, it is used.
  - If the solution is from the system (db or model), it is evaluated and the measured time is compared with the time reported by the solution. If the ratio of evaluated to reported time is less than the tolerance threshold, the system result is added to the user db and the solution is returned. If the ratio exceeds the tolerance, tuning is triggered.

This find mode will ensure that all configurations are tuned for the deployment system. Solutions from system resources are verified once and tuned if markedly different from expectation. Results from user dbs are considered reliable and used without further verification.
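The decision flow above can be sketched in Python as follows. This is a minimal, hypothetical model, not MIOpen's implementation: `Solution`, `trust_verify`, and the `lookup`/`evaluate`/`tune` callbacks are illustrative names. It assumes that `lookup` consults both the user db and system resources and tags each result with its source, and that tuning results are recorded in the user db.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Solution:
    solver: str
    time_ms: float  # time reported with the solution by the db or model
    source: str     # "user_db", "system_db", or "model" (hypothetical tags)


def trust_verify(
    problem: str,
    lookup: Callable[[str], Optional[Solution]],  # user db + system db/model lookup
    user_db: Dict[str, Solution],                 # stand-in for the user find db
    evaluate: Callable[[Solution], float],        # one-time benchmark, returns ms
    tune: Callable[[str], Solution],              # full search, as in SearchDbUpdate
    tolerance: float,                             # threshold on evaluated/reported time
) -> Solution:
    def record(sol: Solution) -> Solution:
        # Persist the accepted result so later runs are a plain db recall.
        stored = replace(sol, source="user_db")
        user_db[problem] = stored
        return stored

    candidate = lookup(problem)
    if candidate is None:
        # No system db or model solution: fall back to tuning.
        return record(tune(problem))
    if candidate.source == "user_db":
        # User db results are trusted without further verification.
        return candidate
    measured = evaluate(candidate)
    if measured / candidate.time_ms < tolerance:
        # System result performs as advertised: promote it to the user db.
        return record(candidate)
    # Markedly slower than reported: tune instead.
    return record(tune(problem))
```

With this structure each problem pays for at most one evaluation or tuning pass; on later calls `lookup` returns the user db entry and nothing is benchmarked, which matches the first-run versus later-run behavior discussed in this thread.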

Mar 11 '25 21:03 cderb

The performance of find mode TrustVerify lies between that of find mode DynamicHybrid and find enforce SearchDbUpdate. TrustVerify will have the performance of SearchDbUpdate when:

- There is no system db or model solution.
- The solution returned from the system is markedly slower (beyond the tolerance) than the time reported by the solution itself, based on a one-time evaluation.

TrustVerify will have the performance of DynamicHybrid when:

- The solution returned from system resources performs as expected, based on a one-time evaluation.
- The solution is present in the user db.

The first run with TrustVerify is generally slower than with DynamicHybrid. After the first run, the user find db will be fully populated and the user perf db populated with regenerated entries, so all following runs should exhibit ideal runtimes.

Mar 12 '25 19:03 cderb

Can we get a list of the current issues this should fix or improve, as well as any situations where it may cause regressions compared to existing results? In particular, I'm thinking of distributed runs that may start with a blank user db on every node.

Apr 14 '25 13:04 BradPepersAMD

This find mode option will function like MIOPEN_FIND_ENFORCE=3 in the worst case. The worst case is when there is either no entry or the system entry is much slower than advertised. In the best case, the entry is either in the user db or the system entry is acceptable (at which point the system entry becomes a user entry), and the lookup is a simple db recall. This strategy is best for long-running workloads, as it guarantees that the configurations used are optimal. Even assuming the system entries are perfect, this mode will be slower on the first run than the current default DYNAMIC_HYBRID because of the overhead of benchmarking the kernels to verify their runtimes.

I had wanted to split the option for tuning individual solvers out to MIOPEN_FIND_ENFORCE, but that env variable overrides MIOPEN_FIND_MODE. So at present TRUST_VERIFY also effectively enforces MIOPEN_FIND_ENFORCE=SEARCH_DB_UPDATE, which can take quite some time. Another option that forgoes individual solver tuning might give better results for shorter-running applications and would cap the runtime closer to that of find mode DYNAMIC_HYBRID.

Apr 14 '25 17:04 cderb

Feedback:

I think we need a check so that this mode is not used if the userDB isn't writable or accessible.
I think we might need some more data, or just pick a less restrictive range for determining to retune.
Allow the configurable knobs to be set through API calls, not just env variables, so users can set them programmatically.
If / when we move forward with this we need to coordinate with pytorch, and determine how the benchmark flags should behave.
- My proposal is:
  - Have this on by default, with a fairly forgiving range for deciding that a systemDB entry is good, and a not-too-high patience / max tuning time setting.
  - When the benchmark flag is set in Pytorch, we should reduce the allowed range for accepting a systemDB entry, and then also increase the patience / max tuning time setting.
    - It might even be worth discussing this, as, come to think of it, it might make sense to have the benchmark flag actually invalidate userDB entries as well.
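A sketch of how the proposed policy could fit together, combining the writable-userDB check with benchmark-dependent knobs. All names here (`TrustVerifyConfig`, `user_db_usable`, `choose_find_mode`) are hypothetical, and the numeric values are placeholders only; the feedback above does not propose concrete thresholds.

```python
import os
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TrustVerifyConfig:
    tolerance: float          # max accepted evaluated/reported time ratio
    max_tuning_time_s: float  # hypothetical "patience" / tuning time budget


# Placeholder numbers for illustration only; not values from the proposal.
DEFAULT_CONFIG = TrustVerifyConfig(tolerance=2.0, max_tuning_time_s=30.0)
BENCHMARK_CONFIG = TrustVerifyConfig(tolerance=1.2, max_tuning_time_s=300.0)


def user_db_usable(user_db_dir: str) -> bool:
    # TrustVerify relies on writing to the user db, so require a writable directory.
    return os.path.isdir(user_db_dir) and os.access(user_db_dir, os.W_OK)


def choose_find_mode(
    user_db_dir: str, benchmark_flag: bool
) -> Tuple[str, Optional[TrustVerifyConfig]]:
    if not user_db_usable(user_db_dir):
        # Fall back to the existing default when results cannot be persisted.
        return "DYNAMIC_HYBRID", None
    # Benchmark flag: tighter acceptance range and a larger tuning budget.
    config = BENCHMARK_CONFIG if benchmark_flag else DEFAULT_CONFIG
    return "TRUST_VERIFY", config
```

Keeping the knobs in one config object would also make it straightforward to expose them through API calls as well as env variables, as suggested above.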

May 27 '25 21:05 BrianHarrisonAMD