mojo icon indicating copy to clipboard operation
mojo copied to clipboard

[Feature Request] Adding a builtin `sort`/`sorted` function

Open Moosems opened this issue 2 months ago • 14 comments

Review Mojo's priorities

What is your request?

I think it would be a great improvement to the stdlib to bring a sort/sorted function that uses a Sortable trait for things like large length string sorting.

What is your motivation for this change?

I use the sort function all the time in python (especially with a key) and I'd love to see this make its way into Mojo

Any other details?

No response

Moosems avatar Apr 23 '24 22:04 Moosems

I'm interested to support this feature.

jayzhan211 avatar Apr 24 '24 07:04 jayzhan211

I think we need to implement comparison for CollectionElement

'T' does not implement the '__lt__' method

jayzhan211 avatar Apr 30 '24 07:04 jayzhan211

That'd be a good start :)

Moosems avatar Apr 30 '24 11:04 Moosems

This would be great! Right now we only have a basic sorting algorithm in the algorithm package that only works on AnyRegType if I recall correctly. Perhaps it makes sense to open source that as a reference and then we can make it work on AnyType, improve the performance of it, and use the ordering trait in https://github.com/modularml/mojo/issues/2462.

Do you or @mzaks already have a generic sorting algorithm on AnyType/some refined trait such as the ordering one? If so, would very much welcome a PR into the library and we can build out this area.

JoeLoser avatar May 03 '24 12:05 JoeLoser

Do you or @mzaks already have a generic sorting algorithm on AnyType/some refined trait such as the ordering one? If so, would very much welcome a PR into the library and we can build out this area.

I don't have :( but willing to help on this! Perhaps building on what you have mentioned

jayzhan211 avatar May 03 '24 13:05 jayzhan211

Do you or @mzaks already have a generic sorting algorithm on AnyType/some refined trait such as the ordering one? If so, would very much welcome a PR into the library and we can build out this area.

I updated mojo-sort so insertion sort and quick sort are working on nightly, I am currently using following signature: fn quick_sort[D: CollectionElement, lt: fn (D, D) -> Bool](inout vector: List[D]): to make it generic, but with OrderingComparable trait it will be even simpler. For std sort I would implement a mix of quick and insertions sort, use insertion sort for small count and quick sort for larger count.

BTW if we will have a sorting package then I can also contribute Radix sort for numerical values. For large lists it is up to 30x faster than current std lib sort algorithm. Also in this case, a mix of insertion sort and Radix sort gives best performance, however important to know that Radix sort is not an in-place algorithm, so I would suggest to have it in std under an explicit name.

mzaks avatar May 03 '24 13:05 mzaks

And another point, there is a class of sorting algorithms specific for strings.

https://github.com/rantala/string-sorting is a treasure trove when it comes to string sorting.

I ported multi key sorting algorithm in my repo, but now it seems to be buggy and slower than generic insertion sort or quick sort. But for the future I think it would be great to pick a few ( e.g. SIMD based) string sorting algorithm which performs best for certain use cases and implement them in std lib.

mzaks avatar May 03 '24 14:05 mzaks

@JoeLoser how do we want to start? You will opensource the sorting package? I can add the ComparableCollectionElement trait which is a CollectionElement and has the compare methods. And implement a sorting function which is a mix of insertion and quick sort, for Lists where elements conform to ComparableCollectionElement.

mzaks avatar May 03 '24 14:05 mzaks

There are lots of sorting algorithm which has their own strength in specific cases. I think the built-in algorithm should be the best for general cases. We can take other languages stdlib as reference. In Python, they have Timsort. In Rust, they have quicksort for unstable, timsort for stable sort. https://doc.rust-lang.org/core/slice/sort/index.html.

I think it is fine for me if we start from @mzaks sorting algorithm, but I expect at the end we can have Timsort like what in Python and Rust, or other competitive sorting algorithms by default for most of the use cases. Other sorting algorithms should be a package imported if needed.

jayzhan211 avatar May 03 '24 14:05 jayzhan211

@JoeLoser I heard that there are plans to introduce benchmarks (infrastructure and code) to std lib. Is it correct? For evolution on algorithms and data structures it would be crucial to have it, so we can make data driven decisions.

mzaks avatar May 03 '24 15:05 mzaks