MIOpen
AMD's Machine Intelligence Library
Purpose of review:
- Integrating CK FA V2 FWD inference solution into MIOpen.

Note:
- Adding tests as a follow-up PR.
- This is being merged into an integration...
WIP for #2860
Part 1. Base branch for this PR is GTests_Refactoring_Integration_Branch
- Added [SoftMarginLoss](https://pytorch.org/docs/stable/generated/torch.nn.SoftMarginLoss.html) operation for both forward and backward. Compared to ROCm, it is better for all cases.
- New API is guarded by the MIOPEN_BETA_API macro. Added 2 kernels: SoftMarginLossForward5d,...
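
For reference, the math behind SoftMarginLoss can be sketched on the host in a few lines. This is only an illustrative reference under the PyTorch definition, `loss = log(1 + exp(-y * x))` with targets `y` in {-1, +1}; it is not the MIOpen kernel code. The function names, the flat-list layout, and the scalar upstream gradient are assumptions made for the example.

```python
import math


def soft_margin_loss_forward(x, y, reduction="mean"):
    """Reference forward pass: log(1 + exp(-y_i * x_i)) per element.

    Hypothetical helper for illustration; not numerically hardened
    against overflow for very large |x|.
    """
    losses = [math.log1p(math.exp(-t * v)) for v, t in zip(x, y)]
    if reduction == "none":
        return losses
    total = sum(losses)
    if reduction == "sum":
        return total
    if reduction == "mean":
        return total / len(losses)
    raise ValueError(f"unknown reduction: {reduction}")


def soft_margin_loss_backward(x, y, grad_out=1.0, reduction="mean"):
    """Reference backward pass for a scalar upstream gradient.

    d/dx log(1 + exp(-y * x)) = -y / (1 + exp(y * x))
    """
    if reduction == "mean":
        scale = grad_out / len(x)
    elif reduction == "sum":
        scale = grad_out
    else:
        raise ValueError(f"unsupported reduction for this sketch: {reduction}")
    return [-t / (1.0 + math.exp(t * v)) * scale for v, t in zip(x, y)]
```

A reference like this is mainly useful for checking kernel output on small inputs, for example comparing `soft_margin_loss_forward([0.0, 0.0], [1.0, -1.0])` against `math.log(2)`.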
Replaced two similar kernels with a single kernel. Merged the two dropout functions that launch the kernels into one, with an `is_backward` parameter that determines which kernel is executed.
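
The idea of folding the forward and backward entry points into one function can be sketched as follows. This is a host-side illustration, not the MIOpen code: the function name, the list-based mask, and the inverted-dropout scaling by `1 / (1 - drop_prob)` are assumptions for the example. Only the `is_backward` flag comes from the description above.

```python
import random


def dropout(data, mask, drop_prob, is_backward, rng=None):
    """Single entry point for both directions (hypothetical sketch).

    Forward  (is_backward=False): draws a fresh keep/drop mask into `mask`.
    Backward (is_backward=True):  reuses the mask saved by the forward pass.
    In both directions the output is data * mask * scale.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    scale = 1.0 / (1.0 - drop_prob)
    if not is_backward:
        if rng is None:
            rng = random.Random()
        # Overwrite the caller's mask buffer in place so backward can reuse it.
        mask[:] = [0 if rng.random() < drop_prob else 1 for _ in data]
    elif len(mask) != len(data):
        raise ValueError("backward pass needs the mask saved by forward")
    return [v * m * scale for v, m in zip(data, mask)]
```

Because the two directions differ only in where the mask comes from, a single function with a flag avoids duplicating the scaling and launch logic.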
Update Batchnorm forward inference to support NCHW format.

CK reference code:
- https://github.com/ROCm/composable_kernel/blob/a9b170b54195ab667ca814f80dd5dfbf4ad772f5/test/batchnorm/batchnorm_infer_rank_4.cpp#L91
- https://github.com/ROCm/composable_kernel/blob/a9b170b54195ab667ca814f80dd5dfbf4ad772f5/profiler/include/profiler/profile_batchnorm_infer_impl.hpp#L29
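
As a rough illustration of what NCHW support means for inference, the sketch below applies the standard spatial batchnorm inference formula, `y = scale * (x - mean) / sqrt(var + eps) + bias`, per channel on a flat NCHW buffer. It is a host-side reference only, not the CK or MIOpen implementation. The function name, argument names, and default `eps` are assumptions for the example.

```python
import math


def batchnorm_infer_nchw(x, shape, scale, bias, est_mean, est_var, eps=1e-5):
    """Per-channel batchnorm inference on a flat buffer in NCHW order.

    In NCHW, element (n, c, h, w) lives at ((n * C + c) * H + h) * W + w,
    so each (n, c) pair owns one contiguous run of H * W values.
    """
    n, c, h, w = shape
    if len(x) != n * c * h * w:
        raise ValueError("buffer length does not match shape")
    hw = h * w
    out = [0.0] * len(x)
    for ni in range(n):
        for ci in range(c):
            # Channel statistics are fixed at inference time.
            inv_std = 1.0 / math.sqrt(est_var[ci] + eps)
            base = (ni * c + ci) * hw
            for i in range(hw):
                out[base + i] = (
                    scale[ci] * (x[base + i] - est_mean[ci]) * inv_std + bias[ci]
                )
    return out
```

The contiguous per-channel runs are what distinguish NCHW from NHWC, where consecutive elements belong to different channels.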
This PR prints values.
- [x] Added Kthvalue operation with forward kernels.
- [x] Added driver test and gtest.
- [x] Compared with ROCm.

### Compare to ROCm

The kernel is only 20% faster...
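
For readers unfamiliar with the operation, a minimal host-side reference for Kthvalue might look like the following. It assumes the semantics of PyTorch's `torch.kthvalue` (the k-th smallest value and its index, with `k` counted from 1) applied to a single 1-D slice. The function name and the sort-based approach are illustrative assumptions, not the kernel's algorithm.

```python
def kthvalue(values, k):
    """Return (value, index) of the k-th smallest element, with k starting at 1.

    Sort-based reference for checking results on small inputs; a GPU kernel
    would typically use a selection strategy instead of a full sort.
    """
    if not 1 <= k <= len(values):
        raise ValueError("k must be in [1, len(values)]")
    # Sort indices by value so the original position is preserved.
    order = sorted(range(len(values)), key=lambda i: values[i])
    idx = order[k - 1]
    return values[idx], idx
```

For example, `kthvalue([3.0, 1.0, 2.0, 5.0], 2)` selects the value `2.0` at index `2`.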