MIOpen
AMD's Machine Intelligence Library
* Added [MatrixDiag](https://www.tensorflow.org/api_docs/python/tf/raw_ops/MatrixDiagV3), [MatrixSetDiag](https://www.tensorflow.org/api_docs/python/tf/raw_ops/MatrixSetDiagV3), [MatrixDiagPart](https://www.tensorflow.org/api_docs/python/tf/raw_ops/MatrixDiagPartV3) forward and backward.
* Added driver test and gtest for both directions.
* New APIs are guarded by the MIOPEN_BETA_API macro.
* Compared to ROCm PyTorch:...

- Added basic [LogSumExp](https://pytorch.org/docs/stable/generated/torch.logsumexp.html) operation and kernel.
- Added driver test and gtest for LogSumExp.
- New API is guarded by MIOPEN_BETA_API macro.

When comparing the newly developed miopen LogSumExp...

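For readers unfamiliar with the operation, here is a minimal reference sketch of what LogSumExp computes, following the definition in the linked `torch.logsumexp` documentation. It is plain Python over a flat list, not the MIOpen kernel or its API; the function name `logsumexp` is a hypothetical helper chosen for illustration.

```python
import math


def logsumexp(values):
    """Return log(sum(exp(v) for v in values)) without overflowing.

    Hypothetical reference helper, not part of MIOpen.
    """
    if not values:
        # The sum over an empty set is 0, and log(0) is -inf.
        return float("-inf")
    m = max(values)
    if math.isinf(m):
        # All inputs are -inf (result -inf) or some input is +inf (result +inf).
        return m
    # Subtracting the maximum keeps every exponent <= 0, so exp() cannot overflow.
    return m + math.log(sum(math.exp(v - m) for v in values))
```

The max-subtraction step is the usual way to keep the computation stable: `logsumexp([1000.0, 1000.0])` stays finite even though `math.exp(1000.0)` on its own would overflow.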
- Add EmbeddingBag operation with forward kernels.
- Add driver and gtest for kernels.
- MIOpen performs better if:
  - Mode: Max
  - Mode: Mean or Sum, when the tensor...

1. When I run the t5layernorm command **_MIOpenDriver t5layernorm --input 128x256x512x11 -F 1 -m 0 -t 1 -i 1_**, I encounter the following error. Details:
   > MIOpenDriver t5layernorm --input...

On RX 6850M XT [gfx1031] with HSA_OVERRIDE_GFX_VERSION=10.3.0, Gentoo, HIP version 6.3.42134, MIOpen version 3.3.0.

The error occurs when running: https://github.com/HomebrewML/HeavyBall/blob/e8e44c2594230a59508d64830ed9af1732411f8f/examples/soap.py

Minimal reproduction:
```
MIOPEN_FIND_ENFORCE=3 HSA_OVERRIDE_GFX_VERSION=10.3.0 HIP_VISIBLE_DEVICES=0 MIOpenDriver conv -n...
```

- Add MaskedFill operation with forward and backward kernels.
- Add driver and gtest for kernels.
- MIOpen performs better if:
  - Forward: tensors are not all contiguous
  - Backward:...

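As a rough illustration of the forward and backward directions named above, the following sketch assumes MaskedFill follows the semantics of PyTorch's `Tensor.masked_fill`; the excerpt does not say so explicitly. It works on flat Python lists, and both function names are hypothetical; neither is an MIOpen API.

```python
def masked_fill_forward(inp, mask, value):
    """Replace inp[i] with value wherever mask[i] is true (assumed semantics)."""
    return [value if m else x for x, m in zip(inp, mask)]


def masked_fill_backward(grad_out, mask):
    """Gradient w.r.t. the input: masked positions were overwritten by a
    constant, so they receive zero gradient; the rest pass through."""
    return [0.0 if m else g for g, m in zip(grad_out, mask)]
```

Under this assumption the backward pass is itself a masked fill with zero, which is why a single kernel family can plausibly serve both directions.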
- Add Embedding operation with backward kernels.
- Add driver and gtest for kernels.
- MIOpen performs better if:
  - Split dimension is 0
  - Number of elements in input...

- Add Trace operation with forward and backward kernels.
- Add driver and gtest for kernels.

### Average improvement over ROCm

| type | fwd | bwd |
|----------|------|------|
|...

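For context on what the forward and backward Trace kernels compute, here is a small reference sketch. It assumes the operation matches the usual matrix trace on a 2-D input (as in `torch.trace`), which the excerpt does not state. The function names are hypothetical and this is not the MIOpen implementation.

```python
def trace_forward(matrix):
    """Sum the main-diagonal elements of a 2-D list of lists."""
    if not matrix:
        return 0.0
    n = min(len(matrix), len(matrix[0]))
    return sum(matrix[i][i] for i in range(n))


def trace_backward(grad_out, rows, cols):
    """Gradient w.r.t. the input: the scalar upstream gradient lands on the
    main diagonal and every off-diagonal entry gets zero."""
    return [[grad_out if i == j else 0.0 for j in range(cols)]
            for i in range(rows)]
```

The backward result depends only on the input shape and the upstream scalar, not on the input values.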
- Add Gather operation with forward kernel.
- Add driver and gtest for kernel.

### Average improvement over ROCm

| type | bwd |
|----------|------|
| float16 | 1.39 |...

- Detail of [operation](https://www.tensorflow.org/api_docs/python/tf/gather_nd) (TensorFlow)
- Add GatherND operation with backward kernel.
- Add driver and gtest for kernels.

### Average improvement over ROCm

| type | bwd |
|----------|------|...