MIOpen [conv] miopenFindConvolution: Officially allow smaller workspace than GWSS returns.

This essentially re-implements "Consider workspace constraints when loading solutions from DB (#2888)" but in a bit stricter manner and with (hopefully) almost all necessary bells and whistles.

Functional differences from #2888:

🟢 [Correctness] Documentation updated.
🟠 [Performance] Resolved: Providing smaller workspace one time may result in permanent performance regression.
- Why: Sub-optimal data may be cached in the user's find-db. As a result, in Hybrid Find modes (DYNAMIC_HYBRID is the default), performance will remain sub-optiman even if sufficient workspace is allocated. To get rid of this problem, the user will/ need to either remove the user's find-db, or repeat miopenFindConvolution*() with affected convolution configs in Normal Find Mode (the latter will overwrite sub-optimal user's find-db records).
  - [Notice] That's why we didn't want to implement this feature until it is explicitly requested by users.
- Fix: Update find-db only if workspace size is sufficient for all applicable solvers.
  - This behavior can be overridden by MIOPEN_FIND_CONV_INSUFFICIENT_WORKSPACE_ALLOW_FINDDB_UPDATE.
🟡 [Performance] Resolved: The host-side and GPU performance may be affected if requestAlgoCount is small.
- What happens: The results of loading from find-db depends on the requestAlgoCount value passed to Find. If the first requestAlgoCount entries do not meet the workspace size limits, then the library executes the Dynamic Find process, which spends extra time on the host and usually ends up with slow Naive kernels.
- Fix: This implementation checks all entries in the find-db record.
- The similar problem with results of fallback AI/WTI path is also resolved.
🟡 [Robustness] Added nullptr checks.
⚪ [Debugging] Added more detailed logging at warning level.

By-products

⚪ [Debugging] Improve custom logging: print logging level.
⚪ [Debugging][Driver] MIOPEN_DRIVER_CONV_WORKSPACE_SIZE_ADJUST: support adjustment of workspace size for convolutions, for testing purposes.

:cyclone: Routine performance/correctness testing

Preconditions: Ubuntu 20.04.5 LTS, BUILD_DEV=On, Release build, Navi21/36, ROCm 6.0.2 docker
Versions of MIOpen under test: develop f10b809b7 (just before #2888) vs. this PR a610dd16c.
Tested configs:
- 93 known FP32 convolution configs
- 93 known FP16 convolution configs
Modes:
- (1) Default Find mode (DYNAMIC_HYBRID) with default system find-db
- (2) Immediate mode (with user find-db possibly updated at step (1))
🟢 No performance or correctness regressions.

:cyclone: Feature testing

Correctness/performance

Preconditions: Ubuntu 20.04.5 LTS, BUILD_DEV=On, Release build, Navi21/36, ROCm 6.0.2 docker
Versions of MIOpen under test: baseline develop 2232184b4 vs. this PR a610dd16c.
Tested configs:
- 93 known FP32 convolution configs
- 93 known FP16 convolution configs
Modes:
- Default Find Mode (DYNAMIC_HYBRID), 1/2 workspace, default find-db
- Default Find Mode (DYNAMIC_HYBRID), Zero workspace, default find-db
🟢 No correctness regressions. Average performance gain:

mode	Fwd GPU time	Bwd GPU time	WrW GPU time	Overall wall time	Overall AUX wall time
FP16 ws=1/2	1	1	1.38	1.09	1.25
FP16 ws=0	1	1	1.37	1.05	1.91
FP32 ws=1/2	1	1	1	1	1
FP32 ws=0	1	1	1	1	1

MIOPEN_FIND_CONV_INSUFFICIENT_WORKSPACE_ALLOW_FINDDB_UPDATE

Preconditions: Ubuntu 20.04.5 LTS, BUILD_DEV=On, Release build, Navi21/36, ROCm 6.0.2 docker
Versions of MIOpen under test: this PR a610dd16c.
Normal Find mode
93 well-known FP32 convolution configs
User-find-db removed before each run
Tested configs
- (1) Normal workspace
- (2) Zero workspace, MIOPEN_FIND_CONV_INSUFFICIENT_WORKSPACE_ALLOW_FINDDB_UPDATE is not set
- (3) Zero workspace, MIOPEN_FIND_CONV_INSUFFICIENT_WORKSPACE_ALLOW_FINDDB_UPDATE=1
🟢 Manually verified that generated user find-db contents are as expected.

[Attribution] @junliume @JehandadKhan

https://github.com/ROCm/MIOpen/labels/enhancement
https://github.com/ROCm/MIOpen/labels/urgency_normal
https://github.com/ROCm/MIOpen/labels/IMPACTS_API
Proposed reviewers:
- @tbennun
- @averinevg
- @JehandadKhan
- @CAHEK7
- @DrizztDoUrden

May 02 '24 22:05 atamazov

@tbennun @averinevg @JehandadKhan @CAHEK7 @DrizztDoUrden @junliume This PR is now ready for reviewing & CI testing (someone please run CI job).

May 07 '24 22:05 atamazov

Looks good to me!

May 07 '24 23:05 tbennun

@DrizztDoUrden

In EvaluateInvokers is_result_optimal = false; should probably be also added to the catch as this indicates a failure of unknown reason and unexpected behaviour in general, which may lead to performance degradation.

Good observation! But let's elaborate a bit:

When a solver is broken (and throws), it probably shouldn't be used at all. This is unlikely a situation where something is temporarily broken due to lack of resources (workspace for example), but later we can use the same solver in a different way and everything will work.

A typical quick W/A for this situation is to disable such a solver for certain configurations (at least until the problem is properly resolved). A side effect is that the solver is skipped, which does not affect the quality of the benchmarking. And that's exactly what the try-catch block currently does.

May 10 '24 17:05 atamazov

@msaudulhassan @JehandadKhan @CAHEK7 @junliume https://github.com/ROCm/MIOpen/labels/TESTING_CI_PASSED

May 10 '24 18:05 atamazov