Valentin Churavy comments

Results: 1413 comments of Valentin Churavy

Make stdlib upgradeability opt-out

I am a bit uncertain about HistoricalStdlibVersions.jl. It currently has:
```
UUID("0dad84c5-d112-42e6-8d28-ef12dabb789f") => ("ArgTools", v"1.1.1"),
UUID("4af54fe1-eca0-43a8-85a7-787d91b784e3") => ("LazyArtifacts", nothing),
```
Both are registered. The version we have for ArgTools on...
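To make the two kinds of entries quoted above easier to compare, here is a minimal sketch of such a table in plain Julia. The names `StdlibEntry`, `stdlibs` and `unversioned` are made up for illustration, and the container type is an assumption; HistoricalStdlibVersions.jl may store its data differently.

```julia
using UUIDs

# Illustrative entry type (an assumption, not the package's definition):
# a name plus either a version or `nothing`.
StdlibEntry = Tuple{String, Union{Nothing, VersionNumber}}

# The two entries quoted in the comment above.
stdlibs = Dict{UUID, StdlibEntry}(
    UUID("0dad84c5-d112-42e6-8d28-ef12dabb789f") => ("ArgTools", v"1.1.1"),
    UUID("4af54fe1-eca0-43a8-85a7-787d91b784e3") => ("LazyArtifacts", nothing),
)

# Hypothetical helper: names of the entries recorded without a version.
unversioned(table) = sort([entry[1] for (uuid, entry) in table if entry[2] === nothing])

unversioned(stdlibs)
```

Listing the `nothing` entries this way is only a quick way to see which stdlibs carry no recorded version; it says nothing about why they were recorded that way.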

[MPICH] Build for device ch4

I am skewed towards providing a more performant default, so `ch4` gets my vote.

[MPICH] Build for device ch4

#10249 switched to `ch4`

simple batched dot kernel is ~1.7x slower with Const on Titan RTX

Can you post a profile (https://cuda.juliagpu.org/stable/development/profiling/#Integrated-profiler) so that we can determine whether the overhead is in the kernel or in the kernel launch?

simple batched dot kernel is ~1.7x slower with Const on Titan RTX

If you changed the problem size then you need to change the number of blocks.
```
julia> CUDA.@profile batched_dot_cuda!(o, x, y; threads=32, blocks=round(Int, length(o)/32))
```
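A small sketch of how the block count follows from the problem size. It assumes the kernel handles one output element per thread, which the thread does not confirm. `blocks_for` is a hypothetical helper, not part of CUDA.jl, and `n = 1000` is an arbitrary illustrative size. It also shows that `round` can leave elements uncovered when the length is not a multiple of the thread count, whereas ceiling division (`cld`) cannot.

```julia
# Hypothetical helper (not part of CUDA.jl): smallest number of blocks such
# that blocks * threads >= n, assuming one thread per output element.
blocks_for(n::Integer, threads::Integer) = cld(n, threads)

n = 1000        # illustrative problem size, standing in for length(o)
threads = 32

rounded  = round(Int, n / threads)   # 31.25 rounds to 31
covering = blocks_for(n, threads)    # ceiling division gives 32

# Number of elements each launch configuration can reach.
covered_rounded  = rounded * threads    # 992: the last 8 elements are never touched
covered_covering = covering * threads   # 1024: the 24 surplus threads need a bounds check
```

With a length that is an exact multiple of 32 both expressions agree, so the difference only shows up for other sizes.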

simple batched dot kernel is ~1.7x slower with Const on Titan RTX

OK, that is still surprising to me. I expect some overhead, but nothing that should scale like that.

simple batched dot kernel is ~1.7x slower with Const on Titan RTX

What is the output of `CUDA.versioninfo()`? Running this locally on a `Quadro RTX 4000`:
```
Device-side activity: GPU was busy for 1.98 ms (10.55% of the trace)
┌──────────┬────────────┬───────┬─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
│ Time (%) │ Total...
```