Anton Smirnov
To avoid changing method signatures, we can just change [this](https://github.com/JuliaGPU/CUDA.jl/blob/229d13f88fece1bd1dd6422575d61edf1e0cb753/lib/cusparse/conversions.jl#L35) line to:

```diff
- m=maximum(I), n=maximum(J);
+ m=maximum(I)[], n=maximum(J)[];
```

> IIUC
> How much additional pressure does this put on the GC?

For the Flux model that I have and use for testing, the machine consistently hangs (a machine with a single AMD...
> I take it this works because getproperty is forwarded to the inner value?

No, `GPUNumber` only inherits the `Number` interface. For everything else (like with that `reducer` example) the user...
So I've made it behave as usual when `eltype` is not `Number`, otherwise return `GPUNumber`. I'll also do some more testing and benchmarking to see the impact.
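To make the behaviour above concrete, here is a rough CPU-only sketch. It is not the implementation from this PR: `LazyNumber`, `wrap_result` and `lazy_maximum` are hypothetical names, and a one-element `Vector` stands in for device memory.

```julia
# Hypothetical stand-in for `GPUNumber`: a Number whose value lives in a
# one-element buffer (a plain Vector here, device memory in the real case).
struct LazyNumber{T<:Number} <: Number
    buf::Vector{T}
end

# `x[]` unwraps the value; on a GPU this is where a host transfer would occur.
Base.getindex(x::LazyNumber) = x.buf[1]

# Results with a Number eltype stay wrapped; anything else is returned as usual.
wrap_result(buf::Vector{T}) where {T<:Number} = LazyNumber(buf)
wrap_result(buf::Vector) = buf[1]

# Toy reduction that mimics leaving its result in a one-element buffer.
lazy_maximum(xs::Vector) = wrap_result([maximum(xs)])

m = lazy_maximum([3, 1, 4])    # wrapped: LazyNumber{Int}
s = lazy_maximum(["a", "c"])   # not a Number eltype: plain String
dims = (m[], 2)                # unwrap with `[]` where a plain Int is required
```

The `[]` call plays the same role as in the `maximum(I)[]` change suggested earlier: code that needs a plain value unwraps it explicitly.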
Here's also a timeline for the Flux.jl model for CUDA.jl. I profiled 20 training steps, explicitly avoiding any host transfers such as visualizing loss values. Before it took ~29 seconds,...
The remaining gaps could be GC pauses. Running profiling with GC logging enabled:

```julia
GC: pause 366.60ms. collected 5.253326MB. incr
GC: pause 112.91ms. collected 34.399342MB. full recollect
GC: pause 355.50ms....
```
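For anyone who wants to reproduce this kind of log, a minimal CPU-only sketch follows. It assumes Julia 1.8 or newer for `GC.enable_logging`; `churn` is a hypothetical allocation-heavy workload standing in for a training step, not the model discussed here.

```julia
# Hypothetical workload: allocate many short-lived arrays to provoke collections.
function churn(n)
    total = 0.0
    for _ in 1:n
        total += sum(rand(10_000))
    end
    return total
end

GC.enable_logging(true)     # print one line per collection to stderr
stats = @timed churn(2_000) # also records time spent in GC during the call
GC.enable_logging(false)

# `time` and `gctime` are both in seconds.
println("total: ", stats.time, " s, GC: ", stats.gctime, " s")
```

Comparing `stats.gctime` against `stats.time` gives a quick estimate of how much of a run is spent in collection before reaching for a full profiler timeline.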
> You could use NVTX.jl to visualize GC pauses

| |Timeline|
|-|-|
|default gc threads| |
|`--gcthreads=4`| |

So it does look like these gaps are GC pauses. I've also logged `maybe_collect` and...
> Is the GC time spent marking/sweeping in Julia, or are the cuMemFreeAsync calls soaking up the time?

The selected green region is where `cuMemFreeAsync` happens, so I guess the bigger...
It was [HiFi-GAN](https://arxiv.org/abs/2010.05646), and https://github.com/JuliaDiff/ChainRules.jl/pull/801 probably brings a bigger performance gain. And I'm not sure at this point how impactful this PR is in real-world use cases, because in the end...
Ah... I accidentally removed my fork of `GPUArrays` and forgot it has this PR... We should reopen it