CUDA.jl
Base.stack is underperforming.
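Describe the bug
Stacking a vector of CuArrays with Base.stack is slow: in the benchmark below it takes about 15.8 ms, roughly 220 times longer than the same call on CPU arrays (70.8 μs) and roughly 50 times longer than copying to the CPU, stacking there, and uploading the result (318.9 μs).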
To reproduce
The Minimal Working Example (MWE) for this bug:
using BenchmarkTools, CUDA;
N=100;
M=1000;
x=randn(N);
x_cu=cu(x);
@btime stack(fill($x,M));
@btime stack(fill($x_cu,M));
@btime cu(stack(fill(collect($x_cu),M)));
The timings I get, in the same order as the three calls above:
70.800 μs (3 allocations: 789.23 KiB)
15.774 ms (8 allocations: 8.19 KiB)
318.900 μs (12 allocations: 399.83 KiB)
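For comparison, here is a minimal sketch of an alternative that preallocates the result and copies each vector into one column. The helper name `stack_columns` is hypothetical and is not part of Base or CUDA.jl. It assumes all inputs are vectors of equal length, and I have not verified that it is actually faster than `stack` on a GPU; it is only meant as another data point for the comparison above. The block itself runs on the CPU; the GPU call is left as a comment and assumes the variables from the MWE.

```julia
# Hypothetical helper (not part of Base or CUDA.jl): place equal-length
# vectors side by side as the columns of one preallocated matrix.
function stack_columns(xs::AbstractVector{<:AbstractVector})
    isempty(xs) && throw(ArgumentError("need at least one vector"))
    n = length(first(xs))
    # `similar` keeps the array type of the first input, so CuVector inputs
    # are expected to give a CuMatrix (assumption: not tested on a GPU here).
    out = similar(first(xs), n, length(xs))
    for (j, x) in enumerate(xs)
        length(x) == n ||
            throw(DimensionMismatch("vector $j has length $(length(x)), expected $n"))
        # Copy the whole vector into column j in one call.
        copyto!(view(out, :, j), x)
    end
    return out
end

# CPU sanity check against Base.stack (needs Julia 1.9 or newer).
x = randn(100)
@assert stack_columns(fill(x, 1000)) == stack(fill(x, 1000))

# GPU timing, assuming CUDA and BenchmarkTools are loaded and x_cu is defined
# as in the MWE. CUDA.@sync makes the timing wait for the GPU work to finish:
# @btime CUDA.@sync stack_columns(fill($x_cu, 1000));
```

Because GPU operations are launched asynchronously, wrapping the benchmarked expression in `CUDA.@sync` (as in the commented line) is the usual way to make sure `@btime` measures the completed work rather than just the launch.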
Manifest.toml
CUDA v5.1.2
Version info
Details on Julia: 1.10
Julia Version 1.10.0
Commit 3120989f39 (2023-12-25 18:01 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: 8 × Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, skylake)
Threads: 1 on 8 virtual cores
Details on CUDA:
CUDA runtime 12.3, artifact installation
CUDA driver 12.0
Unknown NVIDIA driver
CUDA libraries:
- CUBLAS: 12.3.4
- CURAND: 10.3.4
- CUFFT: 11.0.12
- CUSOLVER: 11.5.4
- CUSPARSE: 12.2.0
- CUPTI: 21.0.0
- NVML: missing
Julia packages:
- CUDA: 5.1.2
- CUDA_Driver_jll: 0.7.0+1
- CUDA_Runtime_jll: 0.10.1+0
Toolchain:
- Julia: 1.10.0
- LLVM: 15.0.7
1 device:
0: NVIDIA GeForce MX150 (sm_61, 1.491 GiB / 2.000 GiB available)