MIOpen
AMD's Machine Intelligence Library
Part 2. Base branch for this PR is GTests_Refactoring_Integration_Branch
* Added a cumulative reduction forward operation and kernel with solver, supporting the binary operators max, min, sum, and prod. This operation is equivalent to cummax, cummin, cumsum, and cumprod in PyTorch. * Added driver...
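To illustrate what a cumulative reduction computes, here is a minimal sketch in plain Python over a 1-D list. It is not MIOpen's kernel or API: the names `OPS` and `cumulative_reduce` are hypothetical and exist only for this example. It returns values only, whereas PyTorch's cummax and cummin also return indices.

```python
from itertools import accumulate
import operator

# Hypothetical mapping for illustration; these names are not MIOpen identifiers.
OPS = {
    "max": max,
    "min": min,
    "sum": operator.add,
    "prod": operator.mul,
}


def cumulative_reduce(values, op):
    """Return the running reduction of `values` under the binary operator `op`.

    Element i of the result combines values[0..i] with the chosen operator.
    """
    return list(accumulate(values, OPS[op]))


data = [3, 1, 4, 1, 5]
cum_sum = cumulative_reduce(data, "sum")    # [3, 4, 8, 9, 14]
cum_prod = cumulative_reduce(data, "prod")  # [3, 3, 12, 12, 60]
cum_max = cumulative_reduce(data, "max")    # [3, 3, 4, 4, 5]
cum_min = cumulative_reduce(data, "min")    # [3, 1, 1, 1, 1]
```

Each output element depends on every element before it along the reduced axis. That dependency is what distinguishes a cumulative reduction from a plain reduction, which produces a single value.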
- Enable NCHW/NHWC layout from driver command for batch norm. - Moved `GpumemTensor` to `driver/driver.hpp` - Stopped using old and slow `miopenBNFwdTrainPerActivationRunHost`, `miopenBNFwdTrainSpatialRunHost`, `miopenBNFwdInferPerActivationRunHost`, `miopenBNFwdInferSpatialRunHost`, `miopenBNBwdPerActivationRunHost` and `miopenBNBwdSpatialRunHost` since they...
Update reduceDims for switching NCHW and NHWC layouts with CK
This keeps find and similar invocations from constructing any db instance when no tunable solver is applicable, or in similar cases. Basically it removes redundant prefetches. After #3078 this...
Use reduceDims for switching NCHW and NHWC layouts with CK
The source tree contains an almost complete, two-year-old copy of the Composable Kernel (CK) repository. Even though CK is already one of our dependencies, these sources are...
Hello, I'm trying to build MIOpen 6.2.0 with the parameters below: ``` export CC=/opt/rocm/llvm/bin/clang export CXX=/opt/rocm/llvm/bin/clang++ CXX=$ROCM_INSTALL_DIR/llvm/bin/clang++ cmake \ -Wno-dev \ -G Ninja \ -D CMAKE_CXX_FLAGS="${CXXFLAGS} -fcf-protection=none -DNDEBUG" \ -D CMAKE_INSTALL_PREFIX=/opt/rocm...
Hello, I've seen the progress ROCm has made, and I see that it could even be useful on Windows. This could be a real game changer. I...