MIOpen

BN FWD INFER OCL to HIP

Open • sgundabo opened this issue 1 year ago • 7 comments

This PR converts the Batch Norm Forward Inference kernels (Spatial and Per-Activation modes) from OpenCL to HIP. The conversion is part of the broader effort to port all OpenCL kernels in MIOpen to HIP, since the OpenCL backend has been deprecated.
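For context, both kernels compute the standard inference-time normalization y = scale * (x - mean) / sqrt(var + eps) + bias using precomputed (running) mean and variance. In spatial mode (MIOpen's `miopenBNSpatial`) there is one parameter set per channel, shared across N, H and W; in per-activation mode (`miopenBNPerActivation`) there is one set per (C, H, W) element, shared only across the batch. The following is a minimal pure-Python CPU reference, assuming an NCHW layout stored as nested lists. The function names are hypothetical and not part of MIOpen's API, and real GPU kernels may order the arithmetic differently and so differ by rounding.

```python
import math
from typing import List

# Hypothetical CPU reference, NCHW layout stored as nested lists.
Tensor3D = List[List[List[float]]]   # indexed [c][h][w]
Tensor4D = List[Tensor3D]            # indexed [n][c][h][w]


def bn_fwd_infer_spatial(x: Tensor4D, scale: List[float], bias: List[float],
                         mean: List[float], var: List[float],
                         eps: float = 1e-5) -> Tensor4D:
    """Spatial mode: one (scale, bias, mean, var) set per channel c,
    shared across the batch and all H x W positions."""
    out: Tensor4D = []
    for sample in x:
        out_sample: Tensor3D = []
        for c, plane in enumerate(sample):
            inv_std = 1.0 / math.sqrt(var[c] + eps)
            out_sample.append([[scale[c] * (v - mean[c]) * inv_std + bias[c]
                                for v in row] for row in plane])
        out.append(out_sample)
    return out


def bn_fwd_infer_per_activation(x: Tensor4D, scale: Tensor3D, bias: Tensor3D,
                                mean: Tensor3D, var: Tensor3D,
                                eps: float = 1e-5) -> Tensor4D:
    """Per-activation mode: a separate parameter set for every (c, h, w)
    element, shared only across the batch dimension n."""
    return [[[[scale[c][h][w] * (v - mean[c][h][w]) / math.sqrt(var[c][h][w] + eps)
               + bias[c][h][w]
               for w, v in enumerate(row)]
              for h, row in enumerate(plane)]
             for c, plane in enumerate(sample)]
            for sample in x]
```

When every (h, w) entry of the per-activation parameters holds the same value as the per-channel spatial parameter, the two modes produce identical results, which makes a convenient sanity check.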

Ensuring correctness: The PR includes a GTest that compares the output of the OpenCL kernel with the output of the HIP implementation. The test cases are derived from the existing batch norm forward inference GTest.
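The PR describes the OpenCL-vs-HIP comparison only at a high level. As an illustration of what such a check typically does, here is a sketch of an element-wise closeness test using the same rule as `numpy.isclose` (|candidate - reference| <= atol + rtol * |reference|) that reports the first offending element. The function name and the default tolerances are assumptions, not values taken from the PR; FP16 outputs would normally need looser tolerances than FP32.

```python
from typing import Optional, Sequence, Tuple


def first_mismatch(reference: Sequence[float], candidate: Sequence[float],
                   rtol: float = 1e-5, atol: float = 1e-6
                   ) -> Optional[Tuple[int, float, float]]:
    """Return (index, reference_value, candidate_value) for the first element
    outside tolerance, or None if all elements match. Hypothetical helper."""
    if len(reference) != len(candidate):
        raise ValueError("output sizes differ")
    for i, (r, c) in enumerate(zip(reference, candidate)):
        # 'not (... <= ...)' also flags NaN, since any comparison with NaN is False.
        if not abs(c - r) <= atol + rtol * abs(r):
            return i, r, c
    return None
```

Returning the first failing index (rather than a bare boolean) makes it easier to tell a systematic indexing bug from ordinary rounding noise.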

Ensuring GPU performance parity: The GTest also records the minimum, maximum, mean, median, and standard deviation of the kernel execution time over five runs and writes them to a CSV file. This data is used to plot the average speedup of the HIP implementation over OpenCL; an average speedup greater than 1 is considered favorable. TODO: Collect perf metrics for a wider variety of test cases on gfx90a, and ensure parity.
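The statistics listed above are easy to reproduce from raw per-run kernel times. The sketch below summarizes the timings of each implementation with Python's `statistics` module, builds one CSV row per test case, and computes the speedup as mean OpenCL time divided by mean HIP time, so a value above 1 means HIP is faster. The column names, the use of the sample standard deviation, and the choice of the mean (rather than the median) for the ratio are assumptions; the PR does not specify them.

```python
import csv
import io
import statistics
from typing import Dict, List

_STATS = ("min", "max", "mean", "median", "stdev")
# Hypothetical column layout: case name, OpenCL stats, HIP stats, speedup.
HEADER = ["case"] + [f"{impl}_{k}" for impl in ("ocl", "hip") for k in _STATS] + ["speedup"]


def summarize(times_ms: List[float]) -> Dict[str, float]:
    """Min/max/mean/median/sample stdev of per-run kernel times in ms."""
    return {
        "min": min(times_ms),
        "max": max(times_ms),
        "mean": statistics.mean(times_ms),
        "median": statistics.median(times_ms),
        "stdev": statistics.stdev(times_ms),  # raises if fewer than 2 runs
    }


def csv_row(case: str, ocl_ms: List[float], hip_ms: List[float]) -> List[str]:
    """One CSV row: OpenCL stats, HIP stats, then mean(OCL) / mean(HIP)."""
    ocl, hip = summarize(ocl_ms), summarize(hip_ms)
    speedup = ocl["mean"] / hip["mean"]  # > 1.0 means HIP is faster
    fields = [case]
    for stats in (ocl, hip):
        fields += [f"{stats[k]:.4f}" for k in _STATS]
    return fields + [f"{speedup:.3f}"]


def to_csv(rows: List[List[str]]) -> str:
    """Serialize HEADER plus rows; write the string to a .csv file as needed."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buf.getvalue()
```

When the per-case speedups are later averaged across many test cases, a geometric mean is often preferred over an arithmetic mean because speedups are ratios; which one the PR's graphs use is not stated.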

Ensuring host-side performance parity: Since OpenCL backend support is deprecated in MIOpen, the assumption is that this decision was made with full awareness of the higher compilation overhead of HIP kernels compared with OpenCL.

sgundabo, Jul 10 '24

@sgundabo The PR description is missing some important info.

Please provide:

  • explanation of what is done and why
  • how this has been tested for correctness
  • the same Q about GPU time/performance
    • is it guaranteed that the new kernels have the same or better perf than the old ones? If yes, how is this guaranteed?
  • the same Q about host-side overhead.
    • we know that HIP compilation time is typically 10x longer than OCL. We must ensure that OCL->HIP transition does not lead to degradation of "initial iteration time".
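The "initial iteration time" concern can be made measurable by timing the first call separately from steady-state calls, because the first call of a freshly built kernel also pays one-time costs such as compilation or a compiled-binary cache lookup. A minimal sketch, assuming `run_once` is a hypothetical callable that launches the operation and blocks until it completes (in a real harness it would wrap the MIOpen call plus device synchronization); `make_fake_kernel` is likewise a hypothetical stand-in used only to exercise the timer.

```python
import time
from typing import Callable, Tuple


def initial_vs_steady(run_once: Callable[[], None],
                      warm_runs: int = 5) -> Tuple[float, float]:
    """Return (first_call_s, best_warm_call_s) using a monotonic clock."""
    t0 = time.perf_counter()
    run_once()                      # includes one-time costs (e.g. kernel compilation)
    first = time.perf_counter() - t0

    warm = []
    for _ in range(warm_runs):
        t0 = time.perf_counter()
        run_once()                  # one-time costs should already be paid
        warm.append(time.perf_counter() - t0)
    return first, min(warm)


def make_fake_kernel(compile_s: float = 0.05) -> Callable[[], None]:
    """Hypothetical stand-in: sleeps on the first call only, mimicking a JIT compile."""
    state = {"compiled": False}

    def run_once() -> None:
        if not state["compiled"]:
            time.sleep(compile_s)
            state["compiled"] = True

    return run_once
```

Comparing the first-call time of the OpenCL and HIP paths (with the kernel cache cleared for both) isolates the compilation overhead the reviewer is asking about, while the warm number tracks GPU-side parity.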

/cc @CAHEK7 @junliume

atamazov, Jul 17 '24

> the same Q about host-side overhead.
>
>   • we know that HIP compilation time is typically 10x longer than OCL. We must ensure that OCL->HIP transition does not lead to degradation of "initial iteration time".

We can't do anything about it, except deprecate HIP and go back to OCL.

CAHEK7, Jul 22 '24

@sgundabo could you move this PR to ready for review?

junliume, Jul 24 '24

@junliume we are waiting for the perf metrics

CAHEK7, Jul 24 '24

Raw perf data: BNFwdInferRawPerfData.zip

HW info: gfx90a

FP32 perf: BNFwdInferRawPerfData_FP32

FP16 perf: BNFwdInferRawPerfData_FP16

sgundabo, Jul 31 '24

It can be safely merged since it does not affect production code. I just don't want to lose this PR.

CAHEK7, Sep 12 '24

Restarted CI for sanity checks

junliume, Sep 27 '24

MIOpen is moving to the monorepo, and old PRs need to be closed for this to happen. If needed, this PR can be reopened in the new repo.

BradPepersAMD, Jul 14 '25