Matthew Nicely
Closing due to inactivity
@kadeng did you resolve your issue?
Exactly! For good examples, compare the following:

1. https://github.com/rapidsai/cusignal/blob/branch-0.19/python/cusignal/windows/windows.py
2. https://github.com/scipy/scipy/blob/a376262/scipy/signal/windows/windows.py

It could look something like this:

```python
xcorr_array_kernel = cp.ElementwiseKernel(
    "float64 psd_0, float64 psd_1, float64 freqs, float64 total_lag, float64...
```
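The cusignal and SciPy window modules linked above compute each window sample from its index alone, which is the shape an elementwise kernel expects. Here is a minimal CPU sketch of that per-element model, using a symmetric Hann window as an assumed example. `hann_element` and `elementwise_map` are hypothetical helpers, not cusignal or CuPy APIs.

```python
import math
from typing import Callable, List


def hann_element(n: int, m: int) -> float:
    """Value of a symmetric Hann window of length m at index n.

    Each output depends only on its own index, so on a GPU this body
    could run as one thread per element.
    """
    if m == 1:
        return 1.0
    return 0.5 - 0.5 * math.cos(2.0 * math.pi * n / (m - 1))


def elementwise_map(fn: Callable[[int, int], float], m: int) -> List[float]:
    """Apply a per-index function to every index, mimicking an elementwise launch."""
    return [fn(n, m) for n in range(m)]


window = elementwise_map(hann_element, 5)
print([round(w, 3) for w in window])  # [0.0, 0.5, 1.0, 0.5, 0.0]
```

Because no element reads another element's result, the loop order in `elementwise_map` is irrelevant, which is what lets a GPU run all indices at once.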
@ziyuhuang123 is your issue resolved?
@kmaehashi I wanted to let you know that we have the new cuDNN [Frontend](https://github.com/NVIDIA/cudnn-frontend), which may make integration and operation fusion easier. Let me know if you have any questions.
All, I really appreciate the quick responses. @emcastillo The example above was more of a visual example for (2). But I have written a few elementwise kernels that require data reuse...
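Data reuse is where a plain elementwise kernel falls short: if each output needs its neighbors, every thread reloads values that adjacent threads also load. A back-of-the-envelope sketch, assuming a hypothetical 3-point stencil and a shared-memory tile of `tile` elements plus a halo of `radius` elements on each side. The counts ignore hardware caches, which in practice absorb some of the redundant loads.

```python
def naive_stencil_loads(n: int, radius: int = 1) -> int:
    """Global loads if every output independently reads its 2*radius+1 inputs."""
    return n * (2 * radius + 1)


def tiled_stencil_loads(n: int, tile: int, radius: int = 1) -> int:
    """Global loads if each block loads its tile plus halo into shared memory once.

    The halo is counted for every tile, including at the array edges,
    so this slightly overestimates the tiled cost.
    """
    num_tiles = -(-n // tile)  # ceiling division
    return num_tiles * (tile + 2 * radius)


n = 1024
print(naive_stencil_loads(n))            # 3072
print(tiled_stencil_loads(n, tile=256))  # 1032
```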
@emcastillo I did confirm that what you saw in your example is expected: little-to-no performance improvement. But if the elementwise kernel were to become more complex (adding for-loops and...
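One way to see why the simple case shows little improvement: a one-line elementwise operation does very little arithmetic per byte moved, so it is limited by memory bandwidth however the kernel is generated. An inner loop raises the work per element while the memory traffic stays fixed. A rough sketch, assuming float64 data, one read and one write per element, and a hypothetical per-element loop of `k` multiply-adds (for example, a Horner polynomial evaluation):

```python
BYTES_PER_FLOAT64 = 8


def arithmetic_intensity(flops_per_element: int, reads: int = 1, writes: int = 1) -> float:
    """FLOPs per byte of global memory traffic for one output element."""
    bytes_moved = (reads + writes) * BYTES_PER_FLOAT64
    return flops_per_element / bytes_moved


# y = a * x + b: one multiply and one add per element (a, b assumed in registers)
print(arithmetic_intensity(2))  # 0.125

# Inner loop of k multiply-adds (2 FLOPs each) over the same single input
k = 32
print(arithmetic_intensity(2 * k))  # 4.0
```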
Further information for those interested:

```C++
__global__ void copy(const dtype* const __restrict__ src, dtype* const __restrict__ dst, const size_t dlen) {
#pragma unroll
    for (int i = 0; i <...
```
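The `copy` kernel above unrolls its loop so each thread handles several elements, but its indexing is cut off. This is only a CPU simulation under an assumed mapping: each of `num_threads` threads copies `unroll` elements spaced `num_threads` apart, so neighboring threads touch neighboring addresses (coalesced access on a GPU). If the launch is too small, an outer loop strides over chunks. All names are hypothetical.

```python
from typing import List, Tuple


def simulated_unrolled_copy(
    src: List[float], num_threads: int, unroll: int
) -> Tuple[List[float], List[int]]:
    """Copy src as if by num_threads threads, each unrolling `unroll` iterations.

    Returns the destination buffer and a per-element write count.
    """
    dlen = len(src)
    dst = [0.0] * dlen
    writes = [0] * dlen
    chunk = num_threads * unroll
    for base in range(0, dlen, chunk):  # grid-stride over chunks if the launch is too small
        for tid in range(num_threads):  # on a GPU these run in parallel
            for i in range(unroll):  # the loop `#pragma unroll` would flatten
                idx = base + tid + i * num_threads
                if idx < dlen:  # bounds check against dlen
                    dst[idx] = src[idx]
                    writes[idx] += 1
    return dst, writes


src = [float(v) for v in range(1000)]
dst, writes = simulated_unrolled_copy(src, num_threads=128, unroll=4)
print(dst == src, all(w == 1 for w in writes))  # True True
```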
I'll try to test your branch next week! I have to teach a course today. You're probably right. I guess I've gotten lazy with the ease of the ElementwiseKernel :joy:
@emcastillo I'm not seeing huge improvements in my testing, about 10% in some cases. Question: how do elementwise kernels handle inputs that are also outputs? Scenario: 1. Input is...
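One way to reason about the in-place question: if every output element reads only the input at the same index, aliasing input and output is harmless, because no thread reads a value another thread writes. If a thread also reads neighbors (for example through a `raw` argument), the result depends on execution order, which a GPU does not guarantee. The CPU sketch below runs the same update in two orders to expose this. The helpers are hypothetical, and the sketch says nothing about how CuPy itself treats overlapping arguments.

```python
from typing import Callable, List, Sequence

Update = Callable[[List[float], int], float]


def run_in_place(data: Sequence[float], update: Update, order: Sequence[int]) -> List[float]:
    """Apply `update` to each index in `order`, writing results back into the same buffer."""
    buf = list(data)
    for i in order:
        buf[i] = update(buf, i)
    return buf


def scale(buf: List[float], i: int) -> float:
    # Reads only its own index: safe when input and output alias.
    return 2.0 * buf[i]


def add_left_neighbor(buf: List[float], i: int) -> float:
    # Reads a neighbor that another "thread" may already have overwritten.
    return buf[i] + (buf[i - 1] if i > 0 else 0.0)


x = [1.0, 2.0, 3.0, 4.0]
forward = range(len(x))
backward = range(len(x) - 1, -1, -1)

print(run_in_place(x, scale, forward) == run_in_place(x, scale, backward))  # True
print(run_in_place(x, add_left_neighbor, forward))   # [1.0, 3.0, 6.0, 10.0]
print(run_in_place(x, add_left_neighbor, backward))  # [1.0, 3.0, 5.0, 7.0]
```

A common workaround for the neighbor-reading case is to write into a separate output buffer, so every thread reads the original, unmodified input.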