bibek

Results 10 issues of bibek

In CK, the (conv + bias + activation) kernel currently supports only ReLU as the activation. It would be great to have other activations. We can start with the ones that take...

enhancement
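
To make the request above concrete, here is a minimal pure-Python sketch of a convolution whose bias add and activation are applied in one pass, with the activation chosen by name. It is not CK code: the function `conv2d_bias_activation`, the `ACTIVATIONS` table, and the single-channel "valid" cross-correlation layout are all assumptions made for illustration.

```python
import math

# Hypothetical activation table; names are illustrative, not CK identifiers.
ACTIVATIONS = {
    "relu": lambda x: max(0.0, x),
    "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
    "tanh": math.tanh,
}

def conv2d_bias_activation(image, kernel, bias, activation="relu"):
    """Single-channel 'valid' cross-correlation, then bias, then activation."""
    act = ACTIVATIONS[activation]
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            # Bias and activation are applied to the accumulator directly,
            # without writing an intermediate tensor.
            row.append(act(acc + bias))
        out.append(row)
    return out

image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
kernel = [[1.0, 0.0], [0.0, -1.0]]
relu_out = conv2d_bias_activation(image, kernel, 0.5, "relu")
tanh_out = conv2d_bias_activation(image, kernel, 0.5, "tanh")
```

Swapping the activation only changes the function applied to each accumulator, which is why supporting more element-wise activations is a natural extension of a ReLU-only fused kernel.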

The ones we need for Transformer Engine are the following:

1) CUBLASLT_EPILOGUE_GELU_AUX
   - step 1: matrix multiplication
   - step 2: apply GELU
   - step 3: store the result to separate...

enhancement

Currently, the [conv + bias + activ kernel is only for type f16](https://github.com/ROCmSoftwarePlatform/composable_kernel/blob/develop/library/src/tensor_operation_instance/gpu/conv2d_fwd_bias_relu/device_conv2d_fwd_xdl_c_shuffle_bias_relu_nhwc_kyxc_nhwk_f16_instance.cpp). It would be great if we had a similar kernel for F32 and other data types.

enhancement

Integrate CK's 3D bilinear solvers. This PR is a combination of https://github.com/ROCm/MIOpen/pull/2817 and https://github.com/ROCm/MIOpen/pull/2912

value_high
urgency_high
TESTING_CI_PASSED

* [use struct to return data](https://github.com/ROCmSoftwarePlatform/MIOpen/pull/2154/files#r1211864942)
* [see if passing of unique_ptr is really necessary](https://github.com/ROCmSoftwarePlatform/MIOpen/pull/2154/files#r1199790458)
* [remove item added to target_link_library that was part of workaround](https://github.com/ROCmSoftwarePlatform/MIOpen/pull/2177)

This issue is...

* This PR measures the performance of hipMemcpy, hipMemset and SetTensor. The performance results for tensor {4, 4, 1115, 1115}, in microseconds, were: hipMemcpy = 29717, hipMemset = 1...

enhancement
performance
debugging

CK's fp8 col major solver integration into MIOpen

TESTING_CI_PASSED

Can we do tuning using the API? Let's say I want to tune a fused solver or a regular solver. I can do: MIOPEN_FIND_ENFORCE=3 MIOPEN_FIND_MODE=1 ./bin/MIOpenDriver CBAInfer -F 4 -n 128 -c...

documentation

* This PR is to generate the mha static lib from generate.py

- Enable NCHW/NHWC layout from driver command for batch norm.
- Moved `GpumemTensor` to `driver/driver.hpp`
- Stopped using old and slow `miopenBNFwdTrainPerActivationRunHost`, `miopenBNFwdTrainSpatialRunHost`, `miopenBNFwdInferPerActivationRunHost`, `miopenBNFwdInferSpatialRunHost`, `miopenBNBwdPerActivationRunHost` and `miopenBNBwdSpatialRunHost` since they...