GPUArrays.jl
Reusable array functionality for Julia's various GPU backends.
Related to https://github.com/JuliaLang/julia/issues/54546 and https://github.com/JuliaLang/julia/pull/54587. cc @dkarrasch
This is part of ongoing work to make Enzyme + CUDA.jl work nicely from a user point of view. cc @vchuravy The implementation of `map!` (https://github.com/JuliaGPU/GPUArrays.jl/blob/ec9fe5b6f7522902e444c95a0c9248a4bc55d602/src/host/broadcast.jl#L120C46-L120C59) creates a broadcasted object...
Tested on main. Seems to be a Metal-specific issue as the test passes with JLArrays. ``` using Metal, GPUArrays, Random, Test begin AT = MtlArray a = AT(zeros(Float32, 1000,1000)) b...
For example, `AbstractGPUVecOrMat`: ```julia julia> LinearAlgebra.Adjoint{Float64, Matrix{Float64}} LinearAlgebra.Adjoint{Float64, CuMatrix{Float64}}
On main branch. Noticed while working on JuliaGPU/Metal.jl#321. Will still be relevant once JuliaGPU/Metal.jl#321 is merged as the MPSMatrixRandom generation is not always used. Unsigned integers and 32/64 bit variants...
The switch to KA.jl significantly slowed down several operations. --- CUDA.jl: `permutedims`, `broadcast`, and many others https://speed.juliagpu.org/changes/?tre=10&rev=6221589f5befec8f6f157a5a5271667dba09d0b6&exe=11&env=1 --- Metal.jl: `permutedims` ``` private array/permutedims/4d 2911500 ns 860084 ns 3.39 private array/permutedims/2d...
Ported from oneAPI.jl - [ ] Currently limited to a static workgroupsize
As @maleadt mentioned in https://github.com/JuliaGPU/Metal.jl/issues/422, I am opening a new issue here. The current `LinearAlgebra.kron` only supports `CuArray`, and the other `GPUArray` types fall back to scalar indexing. Also, the methods for Kronecker...
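To illustrate the scalar-indexing concern, here is a minimal sketch of a Kronecker product written purely with `reshape` and broadcasting, which generic array backends can usually execute without element-wise indexing. The helper name `kron_broadcast` is hypothetical and is not part of GPUArrays.jl or any backend; it is shown on CPU arrays only, and it is not the approach proposed in the issue.

```julia
using LinearAlgebra

# Hypothetical helper, not a GPUArrays.jl API: Kronecker product of two
# matrices expressed as a single broadcast, avoiding scalar indexing.
function kron_broadcast(A::AbstractMatrix, B::AbstractMatrix)
    m, n = size(A)
    p, q = size(B)
    # Place A along dims 2 and 4 and B along dims 1 and 3, so the
    # broadcasted product has size (p, m, q, n).
    C = reshape(A, 1, m, 1, n) .* reshape(B, p, 1, q, 1)
    # Column-major layout makes merging dims (1,2) and (3,4) yield
    # C[(i-1)p + k, (j-1)q + l] = A[i, j] * B[k, l].
    return reshape(C, p * m, q * n)
end

A = [1 2; 3 4]
B = [0 1; 1 0]
@assert kron_broadcast(A, B) == kron(A, B)
```

The same function should accept any `AbstractMatrix` whose type supports `reshape` and broadcasting, though behaviour and performance on a specific GPU backend are untested here.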
Introduce `GPUNumber` to store the result of `mapreduce` across all `dims` (i.e. `dims = :`) instead of immediately transferring it to host. `GPUNumber` copies its value to host when it...
https://github.com/JuliaGPU/AMDGPU.jl/pull/669