InMemoryDatasets.jl

Sortperm for simple vector

Open gitboy16 opened this issue 2 years ago • 3 comments

Hi, I was wondering if it is possible to expose the parallel sortperm function for "normal" vectors like Vector{Float64} or Vector{Int64}? Thank you. Kind regards

gitboy16 avatar May 04 '23 09:05 gitboy16

Probably not (it depends heavily on Dataset). I guess there are other packages for parallel sorting, and the Base sort should be quite fast for Vector{T}. The real benefits of IMD show up in cases

  • where missing values are exposed
  • or/and there are multiple columns
  • or/and there are many rows

in which case it is worth creating a data set.

sl-solution avatar May 04 '23 09:05 sl-solution
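For reference, this is what the Base route mentioned above looks like for a plain vector. It is a minimal sketch using only Base functions; the values in `v` are made up for illustration.

```julia
# Made-up sample data; any Vector{Float64} or Vector{Int64} works the same way.
v = [3.2, 1.5, 4.8, 0.7]

# `sortperm` returns the index permutation that puts `v` in sorted order.
p = sortperm(v)

# Indexing with the permutation yields the sorted values without mutating `v`.
sorted_v = v[p]
```

This call runs on a single thread, which is the limitation the request is about.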

Base sort is actually slow compared to other languages, such as the C++ Boost sorting algorithms, which are multithreaded. A parallel sortperm would be very useful for large vectors.

gitboy16 avatar May 04 '23 10:05 gitboy16

> Base sort is actually slow compared to other languages, such as the C++ Boost sorting algorithms, which are multithreaded.

I see. When there is only one column, IMD uses a simple approach for parallel sorting; however, for multiple columns and, particularly, in common data manipulation scenarios (and with QuickSort), it is much more efficient than other algorithms.

I guess, in general, we need some customised algorithms for Vector{T} before being able to expose sortperm to users.

PS: I would like to see support for other sorting algorithms in IMD (#47), and we can probably think about this at that time.

sl-solution avatar May 04 '23 10:05 sl-solution
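To make the request concrete, here is a rough sketch of one simple way to parallelise sortperm for a plain vector using only Base: sort the indices of contiguous chunks on separate tasks, then merge the sorted index runs. This is not IMD's implementation and not a Base API. The names `merge_perms` and `threaded_sortperm` and the `nchunks` keyword are hypothetical, and the merge step runs serially to keep the example short.

```julia
using Base.Threads: @spawn, nthreads

# Hypothetical helper (not part of IMD or Base): merge two index vectors, each
# already ordered by the values of `v`, into a single ordered index vector.
function merge_perms(v::Vector, a::Vector{Int}, b::Vector{Int})
    out = Vector{Int}(undef, length(a) + length(b))
    i, j, k = 1, 1, 1
    while i <= length(a) && j <= length(b)
        # Take from `b` only when strictly smaller, so ties keep `a` first.
        if isless(v[b[j]], v[a[i]])
            out[k] = b[j]; j += 1
        else
            out[k] = a[i]; i += 1
        end
        k += 1
    end
    while i <= length(a)
        out[k] = a[i]; i += 1; k += 1
    end
    while j <= length(b)
        out[k] = b[j]; j += 1; k += 1
    end
    return out
end

# Hypothetical function: chunked sortperm. Each chunk is sorted on its own task;
# the sorted chunks are then merged pairwise on the calling task.
function threaded_sortperm(v::Vector; nchunks::Int = nthreads())
    n = length(v)
    n == 0 && return Int[]
    nchunks = clamp(nchunks, 1, n)
    # Chunk c covers the index range bounds[c] + 1 : bounds[c + 1].
    bounds = [div(c * n, nchunks) for c in 0:nchunks]
    tasks = map(1:nchunks) do c
        lo = bounds[c] + 1
        hi = bounds[c + 1]
        @spawn sort!(collect(lo:hi); by = i -> v[i], alg = MergeSort)
    end
    perms = Vector{Int}[fetch(t) for t in tasks]
    # Merge neighbouring runs until a single permutation remains.
    while length(perms) > 1
        merged = Vector{Vector{Int}}()
        for k in 1:2:length(perms)
            if k < length(perms)
                push!(merged, merge_perms(v, perms[k], perms[k + 1]))
            else
                push!(merged, perms[k])
            end
        end
        perms = merged
    end
    return perms[1]
end
```

Julia needs to be started with more than one thread (for example `julia -t 4`) for the chunks to be sorted concurrently. Calling `threaded_sortperm(rand(10^6))` then returns an index vector that can be used like the result of `sortperm`. Whether this beats Base for a given size and element type would need benchmarking.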