MIOpen
AMD's Machine Intelligence Library
* Added [TripletMarginLoss](https://pytorch.org/docs/stable/generated/torch.nn.TripletMarginLoss.html) forward and backward operations and kernels.
* Added driver test and gtest for TripletMarginLoss forward and backward operations.
* New API is guarded by the MIOPEN_BETA_API macro.
*...
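The semantics the new kernels implement follow the linked PyTorch op. A minimal pure-Python reference for a single triplet, assuming the standard definition max(d(a,p) − d(a,n) + margin, 0) with a p-norm distance (the function name and signature here are illustrative, not MIOpen's API):

```python
def triplet_margin_loss(anchor, positive, negative, margin=1.0, p=2.0):
    """Reference semantics of TripletMarginLoss for one triplet of
    equal-length vectors: max(d(a, p) - d(a, n) + margin, 0)."""
    def dist(x, y):
        # p-norm distance between two vectors
        return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)
    return max(dist(anchor, positive) - dist(anchor, negative) + margin, 0.0)

# Positive closer to the anchor than the negative -> loss clamps to 0.
loss = triplet_margin_loss([0.0, 0.0], [1.0, 0.0], [3.0, 0.0])
```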
`CPU_kernel_inliner_NONE.InlinerTest` is a test for the `addkernels` utility, which is invoked during the `MIOpen` build process. The utility is not delivered to end users and is not present in the `miopen-hip-clients` package....
- Add `Median` operation with forward and backward kernels.
- This op basically reuses the `kthvalue` operation.
- Add driver and gtest for the kernel.
- Performance condition: MIOpen performs better if...
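The reuse of `kthvalue` follows from the fact that the (lower) median of n elements is simply the k-th smallest element with k = (n + 1) // 2. A pure-Python sketch of that reduction (illustrative helpers, not MIOpen's API):

```python
def kthvalue(values, k):
    """k-th smallest element, 1-based, as in torch.kthvalue."""
    return sorted(values)[k - 1]

def median(values):
    """Lower median via kthvalue: k = (n + 1) // 2. Like torch.median,
    this returns the lower of the two middle elements for even n."""
    return kthvalue(values, (len(values) + 1) // 2)
```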
- Add `PadConstant` operation [[ref]](https://pytorch.org/docs/stable/generated/torch.nn.functional.pad.html) with forward and backward kernels.
- Add driver and gtest for kernels.
- Performance condition:
  - MIOpen is faster if inputs and outputs are all...
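For reference, the 1-D case of the linked PyTorch op in `'constant'` mode simply surrounds the input with a fill value. A minimal sketch (illustrative function, not MIOpen's API):

```python
def pad_constant_1d(x, pad_left, pad_right, value=0.0):
    """1-D constant padding, mirroring the 'constant' mode of
    torch.nn.functional.pad: prepend/append `value` the given counts."""
    return [value] * pad_left + list(x) + [value] * pad_right
```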
- Add `AnyForward` and `AllForward` ops, following the previously implemented `ReduceCalculation` ops (e.g. `Sum`, `Prod`), with some alterations, since `Sum` and `Prod` are numerical reductions while `Any` and `All` are logical reductions....
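The parallel between the numerical and logical reductions can be seen by writing all four as folds that differ only in the combining operator and identity element (a conceptual sketch, not MIOpen code):

```python
from functools import reduce

# Numerical reductions: identity 0 for sum, 1 for product.
def reduce_sum(xs):  return reduce(lambda a, b: a + b, xs, 0)
def reduce_prod(xs): return reduce(lambda a, b: a * b, xs, 1)

# Logical reductions follow the same fold pattern, but with boolean
# operators and boolean identities (False for any, True for all).
def reduce_any(xs):  return reduce(lambda a, b: a or b, (bool(x) for x in xs), False)
def reduce_all(xs):  return reduce(lambda a, b: a and b, (bool(x) for x in xs), True)
```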
- Add `GenerateRandomBitMask` operation.
- Add driver and gtest for the kernel.

### Average improvement over ROCm

- This operation hasn't been implemented in ROCm PyTorch, so there is nothing to benchmark against.
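The entry doesn't spell out the mask semantics; assuming a dropout-style mask where each bit is independently set with some keep probability and the bits are packed into an integer, a pure-Python sketch might look like this (the name, signature, and packing are all assumptions for illustration):

```python
import random

def generate_random_bitmask(n_bits, keep_prob, seed=0):
    """Sketch of a dropout-style random bit mask: bit i is set with
    probability keep_prob; bits are packed into one integer."""
    rng = random.Random(seed)  # seeded for reproducibility
    mask = 0
    for i in range(n_bits):
        if rng.random() < keep_prob:
            mask |= 1 << i
    return mask
```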
- Add `Pdist` operation with backward kernels.
- Add driver and gtest for kernels.

### Average improvement over ROCm

| type  | bwd  |
|-------|------|
| float | 1.65 |

...
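For reference, `torch.pdist` computes the p-norm distance between every pair of rows i < j of a 2-D input, emitting the pairs in row-major order. A pure-Python sketch of those semantics (illustrative, not MIOpen's API):

```python
def pdist(rows, p=2.0):
    """p-norm distance between each pair of rows i < j, in the
    row-major pair order used by torch.pdist."""
    out = []
    n = len(rows)
    for i in range(n):
        for j in range(i + 1, n):
            d = sum(abs(a - b) ** p for a, b in zip(rows[i], rows[j])) ** (1.0 / p)
            out.append(d)
    return out
```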
- Add RoIAlign operation [[ref]](https://pytorch.org/vision/main/generated/torchvision.ops.roi_align.html#torchvision.ops.roi_align) for both forward and backward.
- Add driver and gtest.
- Performance condition:
  - For `RoIAlignForward`, MIOpen performs better if input and output tensors are...
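The sampling primitive at the heart of RoIAlign is bilinear interpolation at fractional coordinates, which the forward kernel evaluates at each sub-bin sampling point before averaging. A sketch of just that helper (illustrative, not the kernel's actual code; assumes in-bounds coordinates):

```python
import math

def bilinear_sample(grid, y, x):
    """Bilinear interpolation of a 2-D grid at fractional (y, x) --
    the per-sample-point primitive that RoIAlign averages over."""
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1 = min(y0 + 1, len(grid) - 1)
    x1 = min(x0 + 1, len(grid[0]) - 1)
    ly, lx = y - y0, x - x0  # fractional offsets within the cell
    return (grid[y0][x0] * (1 - ly) * (1 - lx)
            + grid[y0][x1] * (1 - ly) * lx
            + grid[y1][x0] * ly * (1 - lx)
            + grid[y1][x1] * ly * lx)
```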
- Added LogCumSumExp [[ref]](https://pytorch.org/docs/stable/generated/torch.logcumsumexp.html) forward and backward operations and kernels.
- This implementation works when the **size of the cumulated dimension is less than the maximum number of threads inside...
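For reference, `torch.logcumsumexp` computes out[i] = log(Σ_{j≤i} exp(x[j])); a direct exp/sum/log would overflow, so implementations keep a running max for stability. A 1-D pure-Python sketch of those semantics (illustrative, not MIOpen's API):

```python
import math

def logcumsumexp(xs):
    """out[i] = log(sum_{j<=i} exp(x[j])), accumulated in log space
    with the max-shift trick for numerical stability."""
    out = []
    running = None  # log of the partial sum so far
    for x in xs:
        if running is None:
            running = x
        else:
            m = max(running, x)  # shift by the max before exponentiating
            running = m + math.log(math.exp(running - m) + math.exp(x - m))
        out.append(running)
    return out
```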
- Add Pad Reflection 1D operation [[ref](https://pytorch.org/docs/stable/generated/torch.nn.ReflectionPad1d.html)] for forward and backward.
- Add driver and gtest.
- Performance condition:
  - Not type `bfloat16`
  - MIOpen performs better if it's padding1D...
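For reference, `torch.nn.ReflectionPad1d` mirrors the input around its edge elements without repeating the edges themselves, and requires the pad size to be smaller than the input length. A pure-Python sketch of those semantics (illustrative, not MIOpen's API):

```python
def reflection_pad_1d(x, pad_left, pad_right):
    """Reflection padding as in torch.nn.ReflectionPad1d: mirror around
    the first/last elements without repeating them (pad < len(x))."""
    left = [x[i] for i in range(pad_left, 0, -1)]   # x[pad], ..., x[1]
    right = [x[-2 - i] for i in range(pad_right)]   # x[-2], x[-3], ...
    return left + list(x) + right
```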