Alex Eremin
> > We do have https://github.com/ROCm/MIOpen/blob/develop/test/gpu_reference_kernel.cpp Yes, that's a naive, single-threaded, ultra-slow CPU verification of the naive GPU algorithm. That test is not about "huge" tensors; it has exactly...
> Yes, we do need to do the slow CPU run. I can make the test a nightly run. > I'm not sure that we need it. It depends on the...
Tensor operations are layout agnostic; the most important limitation is that all the tensors must have the same layout. Another limitation of the current implementation: all the kernels, even the...
We decided to close this PR because there is another approach to fixing that problem.
Is it https://github.com/ROCm/MIOpen/labels/external_collaborator / https://github.com/ROCm/MIOpen/labels/enhancement ? Could you also add an appropriate description?
It can be safely merged since it does not affect production code. I just don't want to lose this PR.
That PR is also important, since it provides a centralized definition for a set of activation functions (for example, for #3247, which has a local definition of sigmoid https://github.com/ROCm/MIOpen/pull/3247/files#diff-2a117e014b2a1c04feb3ede9723a78a8d11d656b4e9631fa57ab1d7c58df55d6)...
> @CAHEK7 @amberhassaan > > Please find the profiling results attached below. > @sgundabo Just for reference, what kind of GPU did you use to get those results?...
@junliume could you merge this one, since a few other PRs depend on it?
This algorithm is very similar to https://github.com/ROCm/MIOpen/pull/3143. Could you explain why you use a different indexing scheme? @BuiChiTrung could you help too? Also, could you remove GPU specific parts from...