schung-amd
Closing this for now as I can't reproduce it. If you're still experiencing this issue and have checked your IOMMU settings, feel free to comment and we can reopen this.
Hi @mcordery, I can't reproduce this locally; `hipcc --version` takes under 0.1s to run in ROCm 6.2 as well as ROCm 6.1. Does anything else run slowly for you with...
Hi @Googulator, CK currently only supports FA for MI-series cards; for example, https://github.com/ROCm/flash-attention has forward and backward attention with a CK backend for MI200 and MI300, but not on RDNA3....
Hi @vitduck, I reached out to our internal team and have a couple of insights that should help clarify this for you. The measurements you are seeing in this screenshot...
Followed up with the internal team, and yes, although it may seem illogical, the redundant copy in unified memory will not be optimized out because the compiler does not know...
I'll check with the internal team to see if they would want to add such a feature, but `hipcc` is just a thin wrapper around clang, so adding warnings here...
Thanks for the example! You don't need to verify this on bare metal; I was just personally curious whether your workloads were performing enough redundant copies to run into this issue...
Hi, thanks for reporting this. I'm not sure why `build_aqlprofile.sh` explicitly requires Ubuntu, as the internal build script does not have this check. As ROCm can be installed with other...
Interesting, I'll take a look. As for `build_aqlprofile.sh`, there are several factors complicating this, and the internal team is discussing how to handle it. As you've noted, this component uses...
Closing this for now. AQLprofile is now open source, and building from source is now supported by TheRock (https://github.com/ROCm/TheRock) rather than by these build scripts.