schung-amd

157 comments by schung-amd

Hi @aaronenyeshi, as you've noted in https://github.com/pytorch/kineto/pull/926, this is due to roctracer enumerating the CPU as well as the GPU devices. This is by design; roctracer is pulling the node...

Hi all, I was able to reproduce this issue. Following the instructions to build `migraphx` with cmake at https://github.com/ROCm/AMDMIGraphX, I saw the same error while running the command `CXX=/opt/rocm/llvm/bin/clang++ cmake...

I've reached out to the `MIGraphX` team, and we do currently rely on the specific commit of `composable_kernel` being pulled in, as it has features that were not added into...

Hi @jinz2014, `hipMemcpyAsync` (and `cudaMemcpyAsync` on the CUDA side) is asynchronous with respect to compute operations, but not necessarily with respect to other memory copy operations; only one copy can be executing at a time per...

Hi @tpadioleau, sorry for the delayed response. This fix isn't in a release yet as far as I can tell, but I can keep tabs on this and update you...

Never mind, I was passing the wrong options to tar; the file is fine. I can confirm that this is not fixed as of ROCm 6.2.2. I'll update you when the...

Sorry you're blocked by this issue. I can't make any promises, but we're looking into getting this fix into the next major release.

@erayinanc @tpadioleau The fix should be in ROCm 6.3 from what I can tell, thanks for your patience! I'll leave this open for now for confirmation upon the release of...

Apologies for the unclear documentation. These functions are available and disabled by default in 6.2 as stated, usable via a preprocessor macro. If there are issues with their functionality, feel...

Hi @shoshijak @doru1004, thanks for identifying this issue. HIP currently supports unrolling loops with bounds that are defined at compile time; see https://rocm.docs.amd.com/projects/HIP/en/docs-6.0.0/reference/kernel_language.html#pragma-unroll. In this case, `mn` is defined at run time,...
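
As a rough illustration of the compile-time bound requirement in the comment above, here is a minimal host-side C++ sketch. It is not HIP kernel code and not necessarily the approach suggested in the truncated thread. The names `sum_fixed` and `sum_dispatch` are hypothetical. The idea is to make the trip count a template parameter, so the loop bound is a compile-time constant, and then map a run-time value onto a small set of instantiations.

```cpp
#include <cstddef>

// Hypothetical helper: N is a template parameter, so the loop bound is a
// compile-time constant and the compiler is free to fully unroll the loop.
template <std::size_t N>
float sum_fixed(const float* data) {
    float acc = 0.0f;
    // In a HIP kernel, `#pragma unroll` would sit directly above this loop.
    for (std::size_t i = 0; i < N; ++i) {
        acc += data[i];
    }
    return acc;
}

// Hypothetical dispatcher: maps a run-time size onto a few compile-time
// instantiations and falls back to an ordinary run-time loop otherwise.
float sum_dispatch(const float* data, std::size_t n) {
    switch (n) {
        case 2: return sum_fixed<2>(data);
        case 4: return sum_fixed<4>(data);
        case 8: return sum_fixed<8>(data);
        default: {
            float acc = 0.0f;
            for (std::size_t i = 0; i < n; ++i) {
                acc += data[i];
            }
            return acc;
        }
    }
}
```

Whether this kind of dispatch is worthwhile depends on how many distinct bounds occur in practice; each instantiation adds code size, so it is usually limited to a handful of common sizes.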