InMemoryDatasets.jl
Sortperm for simple vector
Hi, I was wondering whether it would be possible to expose the parallel sortperm function for "normal" vectors such as Vector{Float64} or Vector{Int64}. Thank you. Kind regards
Probably not (it depends heavily on Dataset). I guess there are other packages for parallel sorting, and I guess the Base sort should be quite fast for Vector{T}. The real benefits of IMD show up in cases
- where missing values are present
- and/or there are multiple columns
- and/or there are many rows
where it is worth creating a data set.
Base sort is actually slow compared to other languages, e.g. the C++ Boost sorting algorithms, which are multi-threaded. A parallel sortperm would be very useful for large vectors.
> Base sort is actually slow compared to other languages, e.g. the C++ Boost sorting algorithms, which are multi-threaded.
I see. When there is only one column, IMD uses a simple approach for parallel sorting; however, for multiple columns and, particularly, in common data manipulation scenarios (and with QuickSort), it is much more efficient than other algorithms.
I guess, in general, we would need some customised algorithms for Vector{T} before being able to expose sortperm to users.
PS: I would like to see support for other sorting algorithms in IMD (#47), and we can probably think about this at that time.
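For anyone landing here in the meantime, below is a minimal sketch of what a "simple approach" to a parallel sortperm on a plain vector could look like: split the index range into chunks, sort each chunk's indices in its own task, then merge the sorted runs. This is not IMD's internal algorithm, and `chunked_sortperm` and `merge_runs!` are hypothetical names made up for illustration. It assumes a 1-based `Vector`, the default `isless` ordering, and Julia started with several threads (e.g. `julia -t 4`); the merge step is left sequential to keep the sketch short.

```julia
using Base.Threads

# Hypothetical helper: merge two adjacent sorted index runs `a` and `b`
# of `src` into `dest`, comparing the values of `x` they point to.
function merge_runs!(dest, src, x, a, b)
    i, j, k = first(a), first(b), first(a)
    while i <= last(a) && j <= last(b)
        # Take from the right run only when strictly smaller; ties stay stable.
        if isless(x[src[j]], x[src[i]])
            dest[k] = src[j]; j += 1
        else
            dest[k] = src[i]; i += 1
        end
        k += 1
    end
    while i <= last(a); dest[k] = src[i]; i += 1; k += 1; end
    while j <= last(b); dest[k] = src[j]; j += 1; k += 1; end
    return dest
end

# Hypothetical sketch, not part of InMemoryDatasets.jl or Base.
function chunked_sortperm(x::Vector; nchunks::Int = nthreads())
    n = length(x)
    n == 0 && return Int[]
    nchunks = clamp(nchunks, 1, n)
    # Split 1:n into contiguous ranges of near-equal length.
    bounds = [div(i * n, nchunks) for i in 0:nchunks]
    ranges = [(bounds[i] + 1):bounds[i + 1] for i in 1:nchunks]
    perm = collect(1:n)
    # Sort the indices of each chunk independently, one task per chunk.
    @sync for r in ranges
        Threads.@spawn sort!(view(perm, r); by = i -> x[i], alg = MergeSort)
    end
    # Merge adjacent sorted runs pairwise until a single run remains.
    src, dst = perm, similar(perm)
    while length(ranges) > 1
        merged = UnitRange{Int}[]
        for k in 1:2:length(ranges)
            if k == length(ranges)
                # Odd run out: copy it over unchanged.
                r = ranges[k]
                copyto!(dst, first(r), src, first(r), length(r))
                push!(merged, r)
            else
                a, b = ranges[k], ranges[k + 1]
                merge_runs!(dst, src, x, a, b)
                push!(merged, first(a):last(b))
            end
        end
        src, dst = dst, src
        ranges = merged
    end
    return src
end
```

Usage would be `p = chunked_sortperm(x)` followed by `x[p]`, the same as with Base's `sortperm`. Because both the per-chunk sort and the merge are stable, the result should match `sortperm(x)` for the default ordering; whether it is actually faster depends on the vector size and thread count, so it is worth benchmarking before relying on it.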