Giang Le
@ayerofieiev-tt Sorry for the late reply, it was the weekend. Since it should be a one-line change (plus maybe some touch-ups like removing `std::move` from references), it shouldn't be too complicated....
Please note: this PR shares some code (particularly warp-level reduction, tensor view, etc.) with other code in this group of Moreh’s upstream requests. We’ll consolidate them as they...
Also, is there any method we can use to view the runner's output for easier debugging? I guess we can just ask you for the failing test, but it might...
~~Hmmm... is the Static build the same as setting `-DMIOPEN_EMBED_BUILD=On`? If I set that in my local build, even `develop` seems to fail to build.~~ nvm it's something...
The Windows build is not passing, but that is expected (please see #2970; previous conversations seem to suggest it is the cause).
> Also, could you remove the GPU-specific parts from the CPU implementation (more details in this comment: [#3143 (comment)](https://github.com/ROCm/MIOpen/pull/3143#discussion_r1711335230))? And may I ask you to align the test to the...
> Such a huge (really huge?) error means that the kernel doesn't perform the reduction in an acceptable way.

Technically speaking, it's not unacceptable; it's just a side effect of doing parallel...
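A minimal, hypothetical C++ sketch (not MIOpen code; all function names here are made up for illustration) of why a parallel reduction can disagree with a sequential CPU reference: floating-point addition is not associative, so a tree-shaped combine order, roughly what a shuffle-down warp-level reduction does, rounds differently than a left-to-right loop. A double-precision sum is used only as a yardstick.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical illustration only, not the MIOpen kernel or its CPU reference.

// Left-to-right float accumulation, like a naive single-threaded CPU reference.
float sequential_sum(const std::vector<float>& v) {
    float acc = 0.0f;
    for (float x : v) acc += x;
    return acc;
}

// Tree (pairwise) accumulation: each round folds the upper half onto the
// lower half, roughly the combine order of a shuffle-down warp reduction.
float tree_sum(std::vector<float> v) {
    if (v.empty()) return 0.0f;
    std::size_t n = v.size();
    while (n > 1) {
        std::size_t half = (n + 1) / 2;  // ceil(n / 2)
        for (std::size_t i = 0; i < n / 2; ++i) {
            // When n is odd, the middle element (index half - 1) is carried over unchanged.
            v[i] += v[i + half];
        }
        n = half;
    }
    return v[0];
}

// Double-precision accumulation, used only as a higher-precision yardstick.
double reference_sum(const std::vector<float>& v) {
    double acc = 0.0;
    for (float x : v) acc += x;
    return acc;
}

// Absolute error of a float result against the double reference.
double abs_error(float result, double reference) {
    return std::fabs(static_cast<double>(result) - reference);
}
```

For example, summing 2^20 copies of `0.1f` with both functions gives two different float results. With this particular input it is the sequential loop that drifts noticeably, because once the running sum grows large every addition of `0.1f` gets rounded to a coarser grid. So a mismatch against a sequential reference does not, on its own, show that the parallel kernel is wrong.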
I'm also working on integrating @long10024070's MIOpenReduceSum into the reduction part and removing the duplicated code. Due to some reorganization, though, please expect some delays on that.
The git tree became unreadable after the last merge attempt, so I think I will just squash and rebase everything. That makes the final reviews easier.