CUDA.jl

Base.stack is underperforming.

Open. rcalxrc08 opened this issue 5 months ago • 1 comment

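Describe the bug

Calling stack on a vector of CuArrays is much slower than on a vector of CPU Arrays: about 200× slower in the benchmark below, and about 50× slower than copying to the CPU, stacking there, and copying the result back to the GPU.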

To reproduce

The Minimal Working Example (MWE) for this bug:

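using BenchmarkTools, CUDA

N = 100        # length of each vector
M = 1000       # number of vectors to stack
x = randn(N)   # CPU vector (Float64)
x_cu = cu(x)   # GPU copy (cu converts to Float32 by default)

@btime stack(fill($x, M));                    # CPU baseline
@btime stack(fill($x_cu, M));                 # stack directly on the GPU (slow)
@btime cu(stack(fill(collect($x_cu), M)));    # round trip: copy to CPU, stack, copy back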
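
For reference, here is a small CPU-only sketch of what the benchmarked call is expected to return. It assumes Julia 1.9 or later (where Base.stack is available) and uses only Base; the comparison with reduce(hcat, ...) is my own illustration, not something taken from CUDA.jl. All three benchmark lines should produce a matrix with this N×M layout.

```julia
# CPU-only sketch; requires Julia 1.9 or later for Base.stack.
N, M = 100, 1000
x = randn(N)
xs = fill(x, M)            # vector holding M references to the same length-N vector

A = stack(xs)              # N×M matrix; column j is xs[j]
B = reduce(hcat, xs)       # same result via horizontal concatenation

@assert size(A) == (N, M)
@assert all(A[:, j] == x for j in 1:M)
@assert A == B
```

The same checks could be applied to the GPU result after collect, but I have not included that here since it needs a working CUDA device.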

These are the timings I get, in the same order as the three @btime calls:

70.800 μs (3 allocations: 789.23 KiB)
15.774 ms (8 allocations: 8.19 KiB)
318.900 μs (12 allocations: 399.83 KiB)

Manifest.toml

CUDA v5.1.2

Version info

Details on Julia: 1.10

Julia Version 1.10.0
Commit 3120989f39 (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 8 × Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake)
  Threads: 1 on 8 virtual cores

Details on CUDA:

CUDA runtime 12.3, artifact installation
CUDA driver 12.0
Unknown NVIDIA driver

CUDA libraries:
- CUBLAS: 12.3.4
- CURAND: 10.3.4
- CUFFT: 11.0.12
- CUSOLVER: 11.5.4
- CUSPARSE: 12.2.0
- CUPTI: 21.0.0
- NVML: missing

Julia packages:
- CUDA: 5.1.2
- CUDA_Driver_jll: 0.7.0+1
- CUDA_Runtime_jll: 0.10.1+0

Toolchain:
- Julia: 1.10.0
- LLVM: 15.0.7

1 device:
  0: NVIDIA GeForce MX150 (sm_61, 1.491 GiB / 2.000 GiB available)

rcalxrc08, Jan 21 '24 12:01