[conv] miopenFindConvolution: Officially allow smaller workspace than GWSS returns.
This essentially re-implements "Consider workspace constraints when loading solutions from DB (#2888)" but in a bit stricter manner and with (hopefully) almost all necessary bells and whistles.
Functional differences from #2888:
- 🟢 [Correctness] Documentation updated.
- 🟠 [Performance] Resolved: Providing smaller workspace one time may result in permanent performance regression.
- Why: Sub-optimal data may be cached in the user's find-db. As a result, in Hybrid Find modes (DYNAMIC_HYBRID is the default), performance will remain sub-optiman even if sufficient workspace is allocated. To get rid of this problem, the user will/ need to either remove the user's find-db, or repeat miopenFindConvolution*() with affected convolution configs in Normal Find Mode (the latter will overwrite sub-optimal user's find-db records).
- [Notice] That's why we didn't want to implement this feature until it is explicitly requested by users.
- Fix: Update find-db only if workspace size is sufficient for all applicable solvers.
- This behavior can be overridden by
MIOPEN_FIND_CONV_INSUFFICIENT_WORKSPACE_ALLOW_FINDDB_UPDATE.
- This behavior can be overridden by
- Why: Sub-optimal data may be cached in the user's find-db. As a result, in Hybrid Find modes (DYNAMIC_HYBRID is the default), performance will remain sub-optiman even if sufficient workspace is allocated. To get rid of this problem, the user will/ need to either remove the user's find-db, or repeat miopenFindConvolution*() with affected convolution configs in Normal Find Mode (the latter will overwrite sub-optimal user's find-db records).
- 🟡 [Performance] Resolved: The host-side and GPU performance may be affected if
requestAlgoCountis small.- What happens: The results of loading from find-db depends on the
requestAlgoCountvalue passed to Find. If the firstrequestAlgoCountentries do not meet the workspace size limits, then the library executes the Dynamic Find process, which spends extra time on the host and usually ends up with slow Naive kernels. - Fix: This implementation checks all entries in the find-db record.
- The similar problem with results of fallback AI/WTI path is also resolved.
- What happens: The results of loading from find-db depends on the
- 🟡 [Robustness] Added nullptr checks.
- ⚪ [Debugging] Added more detailed logging at warning level.
By-products
- ⚪ [Debugging] Improve custom logging: print logging level.
- ⚪ [Debugging][Driver]
MIOPEN_DRIVER_CONV_WORKSPACE_SIZE_ADJUST: support adjustment of workspace size for convolutions, for testing purposes.
:cyclone: Routine performance/correctness testing
- Preconditions: Ubuntu 20.04.5 LTS, BUILD_DEV=On, Release build, Navi21/36, ROCm 6.0.2 docker
- Versions of MIOpen under test:
developf10b809b7 (just before #2888) vs. this PR a610dd16c. - Tested configs:
- 93 known FP32 convolution configs
- 93 known FP16 convolution configs
- Modes:
- (1) Default Find mode (DYNAMIC_HYBRID) with default system find-db
- (2) Immediate mode (with user find-db possibly updated at step (1))
- 🟢 No performance or correctness regressions.
:cyclone: Feature testing
Correctness/performance
- Preconditions: Ubuntu 20.04.5 LTS, BUILD_DEV=On, Release build, Navi21/36, ROCm 6.0.2 docker
- Versions of MIOpen under test: baseline
develop2232184b4 vs. this PR a610dd16c. - Tested configs:
- 93 known FP32 convolution configs
- 93 known FP16 convolution configs
- Modes:
- Default Find Mode (DYNAMIC_HYBRID), 1/2 workspace, default find-db
- Default Find Mode (DYNAMIC_HYBRID), Zero workspace, default find-db
- 🟢 No correctness regressions. Average performance gain:
| mode | Fwd GPU time | Bwd GPU time | WrW GPU time | Overall wall time | Overall AUX wall time |
|---|---|---|---|---|---|
| FP16 ws=1/2 | 1 | 1 | 1.38 | 1.09 | 1.25 |
| FP16 ws=0 | 1 | 1 | 1.37 | 1.05 | 1.91 |
| FP32 ws=1/2 | 1 | 1 | 1 | 1 | 1 |
| FP32 ws=0 | 1 | 1 | 1 | 1 | 1 |
MIOPEN_FIND_CONV_INSUFFICIENT_WORKSPACE_ALLOW_FINDDB_UPDATE
- Preconditions: Ubuntu 20.04.5 LTS, BUILD_DEV=On, Release build, Navi21/36, ROCm 6.0.2 docker
- Versions of MIOpen under test: this PR a610dd16c.
- Normal Find mode
- 93 well-known FP32 convolution configs
- User-find-db removed before each run
- Tested configs
- (1) Normal workspace
- (2) Zero workspace, MIOPEN_FIND_CONV_INSUFFICIENT_WORKSPACE_ALLOW_FINDDB_UPDATE is not set
- (3) Zero workspace, MIOPEN_FIND_CONV_INSUFFICIENT_WORKSPACE_ALLOW_FINDDB_UPDATE=1
- 🟢 Manually verified that generated user find-db contents are as expected.
[Attribution] @junliume @JehandadKhan
- https://github.com/ROCm/MIOpen/labels/enhancement
- https://github.com/ROCm/MIOpen/labels/urgency_normal
- https://github.com/ROCm/MIOpen/labels/IMPACTS_API
- Proposed reviewers:
- @tbennun
- @averinevg
- @JehandadKhan
- @CAHEK7
- @DrizztDoUrden
@tbennun @averinevg @JehandadKhan @CAHEK7 @DrizztDoUrden @junliume This PR is now ready for reviewing & CI testing (someone please run CI job).
Looks good to me!
@DrizztDoUrden
In EvaluateInvokers
is_result_optimal = false;should probably be also added to thecatchas this indicates a failure of unknown reason and unexpected behaviour in general, which may lead to performance degradation.
Good observation! But let's elaborate a bit:
When a solver is broken (and throws), it probably shouldn't be used at all. This is unlikely a situation where something is temporarily broken due to lack of resources (workspace for example), but later we can use the same solver in a different way and everything will work.
A typical quick W/A for this situation is to disable such a solver for certain configurations (at least until the problem is properly resolved). A side effect is that the solver is skipped, which does not affect the quality of the benchmarking. And that's exactly what the try-catch block currently does.
@msaudulhassan @JehandadKhan @CAHEK7 @junliume https://github.com/ROCm/MIOpen/labels/TESTING_CI_PASSED