CUDA.jl
Add Statistics functions
As mentioned by user ogiod on Discourse, we still lack some statistical functions.
We still need to implement:
- [x] cor
- [x] cov
- [ ] median: falls back to scalar operations when `dims` is specified
- [ ] middle: implement the `extrema` function
- [ ] quantile: probably needs `sort`
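For `middle`, one way to sidestep a GPU-specific `extrema` is to build it from two whole-array reductions. The sketch below assumes that `minimum` and `maximum` lower to `mapreduce` for a `CuArray` (as they do for other GPU array types); `gpu_middle` is a hypothetical name, not an existing CUDA.jl or GPUArrays function.

```julia
# Hypothetical helper: `middle` from two whole-array reductions, so no
# element is read on the host one at a time.
function gpu_middle(a::AbstractArray{<:Real})
    isempty(a) && throw(ArgumentError("middle of an empty array is undefined"))
    lo = minimum(a)   # reduction
    hi = maximum(a)   # reduction
    # Halve before adding, mirroring the `x/2 + y/2` form in Statistics.jl,
    # to avoid overflow for large values
    return lo / 2 + hi / 2
end

println(gpu_middle([3, 1, 4, 1, 5]))  # 3.0
```

Two separate reductions read the array twice; a single `mapreduce` returning a `(min, max)` tuple would avoid that, at the cost of a slightly less obvious implementation.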
The Statistics standard library can be found here: https://github.com/JuliaLang/Statistics.jl/blob/master/src/Statistics.jl
CUDA.jl's statistics functions can be found here: https://github.com/JuliaGPU/CUDA.jl/blob/master/src/statistics.jl
Hi, can I work on this?
No need to ask for permission. Note that somebody's already working on some of these functions, see https://github.com/JuliaGPU/CUDA.jl/pull/509.
I have experience with Julia, but as a relative newcomer to open source I'm not sure how to approach this specific problem.
The functions there are not GPU-compatible: they either perform scalar iteration or don't compile. We need to implement GPU-compatible alternatives, as compactly as possible (i.e. reimplementing the least amount of code from Base). That's often done by relying on existing array operations. Have a look at the linked PR for examples.
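As an illustration of relying on existing array operations, here is a minimal covariance sketch that uses only a reduction, broadcasting and a matrix product. For a `CuArray` each of these should have a GPU implementation (the product should dispatch to CUBLAS). `gpu_cov` is a hypothetical name and its `corrected` keyword is modelled on `Statistics.cov`; this is not the code from the linked PR.

```julia
# Hypothetical helper: covariance between the columns of X
# (observations in rows, variables in columns, like Statistics.cov's dims=1).
function gpu_cov(X::AbstractMatrix{<:Real}; corrected::Bool = true)
    n = size(X, 1)
    # Column means as a 1×p row, via a reduction rather than a scalar loop
    μ = sum(X; dims = 1) ./ n
    # Center every column by broadcasting the row of means
    Xc = X .- μ
    # Centered cross-product, divided by n - 1 (unbiased) or n
    return (Xc' * Xc) ./ (corrected ? n - 1 : n)
end

gpu_cov([1.0 2.0; 2.0 4.0; 3.0 6.0])  # returns [1.0 2.0; 2.0 4.0]
```

A correlation matrix can then be derived from this result by dividing each entry by the product of the corresponding standard deviations, again with broadcasting only.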
In the interest of time and not gobbling up all the work (I have open-source experience but am a Julia newcomer), I'm only going to attempt covariance and correlation.
I found that `quantile()` doesn't work on a CuArray and mistakenly thought this was a Statistics.jl problem before finding this issue (I'm new to all this, sorry, ha). I'm not sure whether I should close the issue I opened over there (how?) and re-post it in CUDA.jl, or what.
Regardless, I'm interested in working on this, but the CUDA.jl statistics link above is now a 404, so I'm not quite sure where to begin. I think `sort()` works just fine on my CuArray; my problem was the missing/NaN checks (scalar indexing) in Statistics.jl. The `quantile()` documentation suggests `skipmissing()`, but that doesn't work on a CuArray, and I'm not sure it's the correct fix anyway.
Any help is appreciated, thanks.
Those functions have been moved to GPUArrays: https://github.com/JuliaGPU/GPUArrays.jl/blob/master/src/host/statistics.jl
`skipmissing` is not going to work easily because it doesn't have a `length` property or random-access indexing, which are generally required for GPU programming (where we don't iterate, but index directly with the thread index).
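One way to get the effect of skipping without an iterator is to fold the mask into reductions, which a GPU kernel can evaluate element by element using the thread index. The sketch below does this for `NaN` rather than `missing` (a swapped-in case, since `missing` handling is treated as a separate design question in this thread). `nan_mean` is a hypothetical helper, not part of Statistics.jl or CUDA.jl.

```julia
# Hypothetical helper: mean of the non-NaN entries of `v`, computed with
# two reductions instead of iterating over a skipping wrapper.
function nan_mean(v::AbstractArray{<:AbstractFloat})
    # Sum with NaNs replaced by zero (a mapreduce over the whole array)
    s = sum(x -> isnan(x) ? zero(x) : x, v)
    # Number of non-NaN entries (also a mapreduce; Bools sum to an Int)
    k = sum(!isnan, v)
    # If every entry is NaN, this is 0.0 / 0, i.e. NaN
    return s / k
end

println(nan_mean([1.0, NaN, 3.0]))  # 2.0
```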
Right, so the `NaN` check is an easy fix: just `mapreduce(isnan, |, v)` will do. We could do the same with `missing`, but that seems more like a stylistic question of what `quantile()` should do with, or expect from, a GPUArray.
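Putting the pieces from this thread together (a `mapreduce` NaN check plus `sort`), a single-quantile sketch could look like the following. It is a minimal illustration, not the GPUArrays implementation: `gpu_quantile` is a hypothetical name, it assumes `sort` works on the array type (as reported above for CuArray), it rejects `NaN` instead of handling `missing`, and it uses the default (type 7) linear interpolation of `Statistics.quantile`. The only host transfer is a two-element slice, so no scalar indexing should be needed.

```julia
# Hypothetical sketch: one quantile using whole-array operations only.
function gpu_quantile(v::AbstractVector{<:Real}, p::Real)
    0 <= p <= 1 || throw(ArgumentError("p must lie in [0, 1]"))
    n = length(v)
    n > 0 || throw(ArgumentError("quantile of an empty array is undefined"))
    # NaN check as a reduction, as suggested in the thread
    mapreduce(isnan, |, v) && throw(ArgumentError("array contains NaN"))
    # A single element: copy it to the host instead of indexing scalar-wise
    n == 1 && return float(only(Array(v)))
    sorted = sort(v)
    h = (n - 1) * float(p) + 1           # fractional 1-based position
    j = clamp(floor(Int, h), 1, n - 1)   # index of the lower neighbour
    γ = h - j                            # interpolation weight in [0, 1]
    # Fetch both neighbours with one slice instead of two scalar reads
    a, b = Array(sorted[j:j+1])
    return a + γ * (b - a)
end

v = [3.0, 1.0, 2.0, 4.0]
println(gpu_quantile(v, 0.5))   # 2.5
println(gpu_quantile(v, 0.25))  # 1.75
```

With `p = 0.5` this gives the median of the whole array. On a GPU, calling `CUDA.allowscalar(false)` beforehand is a way to check that no step silently falls back to scalar indexing.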