MIOpen
AMD's Machine Intelligence Library
Initial attempt at translating the Dropout OpenCL Kernel to HIP with a GTEST, with hardcoded PRNG matrices replaced with rocrand function calls.
* Added PReLU backward operation and kernels.
* Added driver test and gtest for PReLU backward operation.
* New API is guarded by MIOPEN_BETA_API macro.
* Compared to ROCm pytorch:...
- Added [Fold](https://pytorch.org/docs/stable/generated/torch.nn.Fold.html) and [Unfold](https://pytorch.org/docs/stable/generated/torch.nn.Unfold.html) ops.
- Full benchmark result compared to ROCm Here
- Average performance:

| Op | Dtype | Direction | Time |
|--------|-----------|-----------|--------------|
| Unfold |...
* Added [NLLLoss](https://pytorch.org/docs/stable/generated/torch.nn.NLLLoss.html) forward and backward operation and kernel.
* Added driver test and gtest for NLLLoss.
* New API is guarded by MIOPEN_BETA_API macro.

Nllloss float16 op_name |...
- Add [MultiMarginLoss](https://pytorch.org/docs/stable/generated/torch.nn.MultiMarginLoss.html) forward operation and kernel. Backward is not better compared to ROCm in general.
- Given input tensor is (N,C), MIOpen is better if C is small...
This PR ports the `MSELoss` family of loss functions to MIOpen:
- `MSELoss`
- `MSELossUnreduced`

Performance measurements seem to suggest that in general we're performing better than ROCm on forward,...
Provides a C++ Graph API test for backward MHA. Does not execute the graph yet because the graph engine is still in development. A follow-up PR will enable graph...
This PR focuses on converting the Batch Norm Fused Inference kernel from OpenCL to HIP. This conversion is a part of the broader initiative to translate all OpenCL kernels within...
1. Rename RunCKSolution to InitInvokerFactoryBnCKFwdInferenceNHWC to differentiate it from the upcoming new API InitInvokerFactoryBnCKFwdInferenceNCHW
2. Move common code to implicitgemm_ck_util.hpp
When trying to apply an average reduction on a tensor filled with `float16` elements, we encounter overflow issues. We configure the operation to use `float32` as the compute datatype, ensuring...
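As a rough illustration of why a wider compute datatype helps here, the pure-Python sketch below emulates `float16` rounding with the standard `struct` module's `e` format and compares two ways of averaging. This is not MIOpen's kernel code: the helper names `to_half`, `mean_half_accumulate`, and `mean_wide_accumulate` are hypothetical, the input values are made up, and Python's double-precision float stands in for the `float32` accumulator.

```python
import struct


def to_half(x: float) -> float:
    """Round x to the nearest float16 value; overflow becomes +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses finite values beyond the float16 range (max 65504),
        # so map them to infinity as float16 hardware arithmetic would.
        return float("inf") if x > 0 else float("-inf")


def mean_half_accumulate(values):
    """Average with the running sum rounded to float16 after every add."""
    acc = 0.0
    for v in values:
        acc = to_half(acc + to_half(v))
    return to_half(acc / len(values))


def mean_wide_accumulate(values):
    """Average float16 inputs using a wider accumulator (assumption:
    a Python double here, standing in for a float32 compute type)."""
    acc = 0.0
    for v in values:
        acc += to_half(v)
    # Only the final result is rounded back to float16.
    return to_half(acc / len(values))


# Hypothetical data: each element fits in float16, but the sum (100000)
# exceeds the float16 maximum of 65504.
values = [1000.0] * 100

half_result = mean_half_accumulate(values)
wide_result = mean_wide_accumulate(values)
print(half_result, wide_result)
```

With these inputs the `float16` accumulator saturates to infinity partway through the sum, while the wider accumulator recovers the expected mean, even though both results are stored as `float16` at the end.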