CUDA.jl

Add Statistics functions

Open Ellipse0934 opened this issue 4 years ago • 8 comments

As user ogiod mentioned on Discourse, we still lack some statistical functions.

We still need to implement:

  • [x] cor
  • [x] cov
  • [ ] median: performs scalar operations when dims is specified
  • [ ] middle: needs an extrema implementation
  • [ ] quantile: probably needs sort

The Statistics standard library can be found here: https://github.com/JuliaLang/Statistics.jl/blob/master/src/Statistics.jl

CUDA.jl's statistics code can be found here: https://github.com/JuliaGPU/CUDA.jl/blob/master/src/statistics.jl

Ellipse0934 • Jul 01 '20 18:07

Hi, can I work on this?

7vikpeculiar • Oct 26 '20 05:10

No need to ask for permission. Note that somebody's already working on some of these functions, see https://github.com/JuliaGPU/CUDA.jl/pull/509.

maleadt • Oct 26 '20 09:10

I have experience with Julia, but I'm a relative newcomer to open source and am not sure how to approach this specific problem.

7vikpeculiar • Oct 26 '20 10:10

The functions here are not GPU compatible: they either perform scalar iteration or don't compile. We need to implement GPU-compatible alternatives as compactly as possible (i.e. reimplementing as little code from Base as we can). That's often done by relying on existing array operations. Have a look at the linked PR for examples.
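To illustrate what "relying on existing array operations" can look like, here is a minimal sketch of a column-wise covariance built only from mean, broadcasting and a matrix product. It is not the code from the linked PR; cov_arrayops is a hypothetical name, and the example runs on a plain Array so it can be tried without a GPU. The assumption is that the same body would also accept a CuArray, since none of it indexes individual elements.

```julia
using Statistics

# Hypothetical helper (not from CUDA.jl or the linked PR): covariance between
# the columns of `x`, written only with whole-array operations.
function cov_arrayops(x::AbstractMatrix; corrected::Bool=true)
    n = size(x, 1)                      # number of observations (rows)
    centered = x .- mean(x; dims=1)     # broadcast subtraction, no scalar loop
    return (centered' * centered) ./ (n - Int(corrected))
end

x = Float32[1 2; 3 5; 5 9]
cov_arrayops(x) ≈ cov(x)                # compare against the CPU reference
```

The point of the design is that every step is an operation array packages already provide, so nothing from Base's element-by-element loops has to be reimplemented.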

maleadt • Oct 26 '20 10:10

In the interest of time and not gobbling up all the work (I have open-source experience but am a Julia newcomer), I'm only going to attempt covariance and correlation.

berquist • Oct 27 '20 02:10

I ran into quantile() not working and mistakenly thought this was a Statistics.jl problem before finding this issue (I'm new to all this, sorry, ha). I'm not sure whether I should close the issue I opened over there (how?) and re-post it in CUDA, or what.

Regardless, I'm interested in working on this, but the CUDA statistics link above is a 404 now, so I'm not quite sure where to begin. I think sort() works just fine on my CuArray, and my problem was the missing/NaN checks in Statistics.jl, which use scalar indexing. skipmissing() doesn't work for the CuArray as suggested in the quantile() documentation, but I'm not sure that's the correct fix anyway.

Any help is appreciated, thanks.

drayer587 • Nov 29 '21 16:11

Those functions have been moved to GPUArrays: https://github.com/JuliaGPU/GPUArrays.jl/blob/master/src/host/statistics.jl

skipmissing is not going to work easily because it has neither a length nor random-access indexing, both of which are generally required for GPU programming (where we don't iterate, but index directly with the thread index).
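A small CPU-only sketch of that difference: skipmissing reports an unknown size, whereas replacing missing with a neutral value and counting the valid entries keeps everything as fixed-size array operations. The variable names are illustrative, and whether a CuArray can hold missing values in the first place is an assumption that depends on the package version.

```julia
v = [1.0, missing, 3.0, missing, 5.0]

# skipmissing is a lazy iterator; its size is not known up front.
Base.IteratorSize(skipmissing(v)) isa Base.SizeUnknown   # true

# Indexable alternative: keep the array shape and neutralise the missings.
filled  = coalesce.(v, 0.0)        # missing replaced by 0.0, same length as v
n_valid = count(!ismissing, v)     # a reduction, not an iteration with state
mean_without_missing = sum(filled) / n_valid
```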

maleadt • Nov 30 '21 06:11

Right, so the NaN check is an easy fix: just mapreduce(isnan, |, v) will do. The same could be done for missing, but this seems more like a stylistic question of what quantile() should do with, or expect from, a GPUArray.
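For concreteness, a rough sketch of how that check could fit into a quantile that avoids scalar iteration. quantile_sketch is a hypothetical name, not the GPUArrays implementation. It assumes that sort and range indexing work on the array type, copies only the two neighbouring elements to the host, and uses the same linear interpolation rule as the default quantile() in Statistics. It runs on a plain Vector here.

```julia
using Statistics

# Hypothetical sketch, not the GPUArrays.jl implementation.
function quantile_sketch(v::AbstractVector{<:AbstractFloat}, p::Real)
    0 <= p <= 1 || throw(ArgumentError("p must be in [0, 1]"))
    isempty(v) && throw(ArgumentError("empty data vector"))
    # Whole-array reduction instead of checking elements one by one.
    mapreduce(isnan, |, v) &&
        throw(ArgumentError("quantiles are undefined in presence of NaNs"))
    sorted = sort(v)
    n = length(sorted)
    h = (n - 1) * p + 1            # fractional 1-based position
    lo = floor(Int, h)
    hi = min(lo + 1, n)
    pair = Array(sorted[lo:hi])    # copy at most two elements to the host
    a, b = first(pair), last(pair)
    return a + (h - lo) * (b - a)  # linear interpolation between neighbours
end

data = [3.0, 1.0, 4.0, 1.0, 5.0]
quantile_sketch(data, 0.3) ≈ quantile(data, 0.3)
```

Throwing on NaN mirrors what the CPU quantile() does; returning NaN instead would be an equally defensible choice, which is the stylistic question raised above.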

drayer587 • Nov 30 '21 16:11