MIOpen

AMD's Machine Intelligence Library

Results: 298 MIOpen issues

Initial attempt at translating the Dropout OpenCL Kernel to HIP with a GTEST, with hardcoded PRNG matrices replaced with rocrand function calls.

* Added PReLU backward operation and kernels.
* Added driver test and gtest for PReLU backward operation.
* New API is guarded by MIOPEN_BETA_API macro.
* Compared to ROCm pytorch:...

enhancement
external_collaborator
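For readers unfamiliar with the operation in the PReLU entry above, here is a minimal reference sketch of a PReLU backward pass in plain Python. It is not MIOpen's kernel or API: the function name `prelu_backward`, the flat-list layout, and the single shared weight are assumptions made for illustration only.

```python
def prelu_backward(x, grad_out, weight):
    """Illustrative PReLU backward for a flat list and one shared weight.

    Forward (for reference): y = x if x > 0 else weight * x.
    Returns (grad_in, grad_weight). Hypothetical helper, not an MIOpen call.
    """
    grad_in = []
    grad_weight = 0.0
    for xi, gi in zip(x, grad_out):
        if xi > 0:
            # Positive inputs pass the gradient through unchanged.
            grad_in.append(gi)
        else:
            # Non-positive inputs scale the gradient by the weight and
            # contribute x * grad_out to the weight gradient.
            grad_in.append(weight * gi)
            grad_weight += xi * gi
    return grad_in, grad_weight


grad_in, grad_weight = prelu_backward(
    x=[1.0, -2.0, 3.0, -4.0],
    grad_out=[1.0, 1.0, 1.0, 1.0],
    weight=0.25,
)
```

A per-channel weight would need one accumulator per channel; the shared-weight case is shown only because it is the shortest correct form.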

- Added [Fold](https://pytorch.org/docs/stable/generated/torch.nn.Fold.html) and [Unfold](https://pytorch.org/docs/stable/generated/torch.nn.Unfold.html) op.
- Full benchmark result compared to ROCm Here
- Average performance:

| Op | Dtype | Direction | Time |
|--------|-----------|-----------|--------------|
| Unfold |...

enhancement
TESTING_CI_PASSED
external_collaborator

* Added [NLLLoss](https://pytorch.org/docs/stable/generated/torch.nn.NLLLoss.html) forward and backward operation and kernel.
* Added driver test and gtest for NLLLoss.
* New API is guarded by MIOPEN_BETA_API macro.

Nllloss float16 op_name |...

enhancement
external_collaborator

- Add [MultiMarginLoss](https://pytorch.org/docs/stable/generated/torch.nn.MultiMarginLoss.html) forward operation and kernel. Backward is generally not better than ROCm.
- Given an input tensor of shape (N,C), MIOpen is better if C is small...

enhancement
external_collaborator

This PR ports the `MSELoss` family of loss functions to MIOpen:
- `MSELoss`
- `MSELossUnreduced`

Performance measurements seem to suggest that in general we're performing better than ROCm on forward,...

enhancement
external_collaborator
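To make the distinction between the two variants in the `MSELoss` entry above concrete, the following is a rough pure-Python sketch. It assumes the usual PyTorch-style semantics (the unreduced form returns one squared error per element, the reduced form sums or averages them); the function names and the `reduction` parameter are illustrative and are not taken from the MIOpen API.

```python
def mse_loss_unreduced(pred, target):
    """Per-element squared error; no reduction is applied."""
    return [(p - t) ** 2 for p, t in zip(pred, target)]


def mse_loss(pred, target, reduction="mean"):
    """Squared error reduced to a scalar by 'sum' or 'mean'."""
    per_elem = mse_loss_unreduced(pred, target)
    total = sum(per_elem)
    if reduction == "sum":
        return total
    if reduction == "mean":
        return total / len(per_elem)
    raise ValueError(f"unsupported reduction: {reduction!r}")


pred = [1.0, 2.0, 3.0]
target = [1.0, 0.0, 5.0]
per_elem = mse_loss_unreduced(pred, target)   # one value per element
summed = mse_loss(pred, target, "sum")        # single scalar
averaged = mse_loss(pred, target, "mean")     # single scalar
```

The unreduced variant keeps the output the same shape as the input, which is why it tends to be exposed as a separate operation from the scalar-producing one.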

Provides a C++ Graph API test for backward MHA. Does not execute the graph yet because the graph engine is still in development. A follow-up PR will enable graph...

This PR focuses on converting the Batch Norm Fused Inference kernel from OpenCL to HIP. This conversion is a part of the broader initiative to translate all OpenCL kernels within...

1. Rename RunCKSolution to InitInvokerFactoryBnCKFwdInferenceNHWC to differentiate it from the upcoming new API InitInvokerFactoryBnCKFwdInferenceNCHW
2. Move common code to implicitgemm_ck_util.hpp

When trying to apply an average reduction on a tensor filled with `float16` elements, we encounter overflow issues. We configure the operation to use `float32` as the compute datatype, ensuring...
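The general effect described in the `float16` reduction entry above can be reproduced outside MIOpen. The sketch below is only an illustration under stated assumptions: it emulates `float16` rounding with Python's `struct` module (format code `e`), uses Python's native double as a stand-in for a `float32` compute type, and the helper names are hypothetical. It does not reproduce the specific issue being reported.

```python
import math
import struct


def to_half(x):
    """Round a Python float to the nearest IEEE 754 binary16 value.

    struct raises OverflowError for finite values beyond the float16
    range, so those are mapped to infinity as hardware would do.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.inf if x > 0 else -math.inf


def mean_narrow_accumulator(values):
    """Average with the running sum rounded to float16 at every step."""
    acc = 0.0
    for v in values:
        acc = to_half(acc + to_half(v))
    return to_half(acc / len(values))


def mean_wide_accumulator(values):
    """Average with a wide running sum; only the result is float16."""
    acc = 0.0
    for v in values:
        acc += to_half(v)
    return to_half(acc / len(values))


values = [1000.0] * 100  # true sum is 100000, above the float16 maximum of 65504
narrow = mean_narrow_accumulator(values)  # running sum overflows to infinity
wide = mean_wide_accumulator(values)      # mean itself fits in float16
```

The mean (1000) is representable in `float16`, but the intermediate sum is not, which is why a wider compute type is normally expected to avoid the overflow.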