Tolmachev Dmitrii
Dear Clement, > I assume it has something to do with the .disableReorderFourStep optimization that is enabled for bigger sizes? Yes, the issue was that zero-padding code was inserted at...
> While I'm expecting 1/4 of the 2D window to have meaningful data, I get ~25% extra data vertically :) This is intended. The initial first axis zero-padding works only...
@Clemasteredge can you post the shader code printed during execution with keepShaderCode enabled for both images? It will speed up finding the issue. Thanks!
@Clemasteredge I have identified and fixed the issue, thanks! As for shader printing: if enabled, it prints all the kernels encountered during execution, which is why it may...
Hello, I have not yet implemented convolution codegen for multiple-upload sequences, and this is exactly what happens above 8192 on AMD GPUs. In general, the algorithm is not different...
Hello, I have checked the results of the Vulkan (2 uploads) and CUDA (1 upload, as the driver exposes more shared memory) backends for the 4096 system and they match. So I...
Ah, I see the issue now. VkFFT stores the kernel in a non-original data layout in the multiple-upload case. This is done because if you usually do the FFT...
> I need to modify some of the configuration (perhaps dimensions/size or coordinateFeatures/matrixConvolution values) No, there are no changes that need to be made to the configuration. You initialize the VkFFT...
Do you access it after the initializeVkFFT call? It is initialized only after that call.
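The ordering point above can be illustrated with a small mock in C. This is only a sketch of the general pattern: `MockApp`, `MockPlan`, `mock_initialize`, `mock_read_field` and `mock_destroy` are hypothetical names invented for illustration and are not part of the VkFFT API. The assumption is simply that some field is allocated and filled inside an initialization call, so reading it earlier cannot work.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; these are NOT VkFFT types. */
typedef struct { uint64_t field; } MockPlan;
typedef struct { MockPlan *plan; } MockApp;

/* Plays the role of the initialization call: allocates and fills the plan. */
int mock_initialize(MockApp *app, uint64_t value) {
    assert(app != NULL);
    app->plan = calloc(1, sizeof *app->plan);
    if (app->plan == NULL) return -1;
    app->plan->field = value;  /* placeholder value chosen by the caller */
    return 0;
}

/* Guarded accessor: reports failure instead of dereferencing an unset plan. */
int mock_read_field(const MockApp *app, uint64_t *out) {
    if (app->plan == NULL) return -1;  /* not initialized yet */
    *out = app->plan->field;
    return 0;
}

/* Releases the plan and returns the struct to its uninitialized state. */
void mock_destroy(MockApp *app) {
    free(app->plan);
    app->plan = NULL;
}
```

With a zero-initialized `MockApp`, `mock_read_field` returns -1 until `mock_initialize` has succeeded; after that it returns 0 and yields the stored value.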
Dear @jgeary18, I am truly sorry, I previously described axisSplit incorrectly: it is actually a 2D array, with the first dimension related to the x, y, z axes and the second...