mojo
mojo copied to clipboard
[Feature Request] Adding a builtin `sort`/`sorted` function
Review Mojo's priorities
- [X] I have read the roadmap and priorities and I believe this request falls within the priorities.
What is your request?
I think it would be a great improvement to the stdlib to bring a sort
/sorted
function that uses a Sortable
trait for things like large length string sorting.
What is your motivation for this change?
I use the sort
function all the time in python (especially with a key) and I'd love to see this make its way into Mojo
Any other details?
No response
I'm interested to support this feature.
I think we need to implement comparison for CollectionElement
'T' does not implement the '__lt__' method
That'd be a good start :)
This would be great! Right now we only have a basic sorting algorithm in the algorithm package that only works on AnyRegType if I recall correctly. Perhaps it makes sense to open source that as a reference and then we can make it work on AnyType, improve the performance of it, and use the ordering trait in https://github.com/modularml/mojo/issues/2462.
Do you or @mzaks already have a generic sorting algorithm on AnyType/some refined trait such as the ordering one? If so, would very much welcome a PR into the library and we can build out this area.
Do you or @mzaks already have a generic sorting algorithm on AnyType/some refined trait such as the ordering one? If so, would very much welcome a PR into the library and we can build out this area.
I don't have :( but willing to help on this! Perhaps building on what you have mentioned
Do you or @mzaks already have a generic sorting algorithm on AnyType/some refined trait such as the ordering one? If so, would very much welcome a PR into the library and we can build out this area.
I updated mojo-sort so insertion sort and quick sort are working on nightly, I am currently using following signature: fn quick_sort[D: CollectionElement, lt: fn (D, D) -> Bool](inout vector: List[D]):
to make it generic, but with OrderingComparable
trait it will be even simpler. For std sort I would implement a mix of quick and insertions sort, use insertion sort for small count and quick sort for larger count.
BTW if we will have a sorting package then I can also contribute Radix sort for numerical values. For large lists it is up to 30x faster than current std lib sort algorithm. Also in this case, a mix of insertion sort and Radix sort gives best performance, however important to know that Radix sort is not an in-place algorithm, so I would suggest to have it in std under an explicit name.
And another point, there is a class of sorting algorithms specific for strings.
https://github.com/rantala/string-sorting is a treasure trove when it comes to string sorting.
I ported multi key sorting algorithm in my repo, but now it seems to be buggy and slower than generic insertion sort or quick sort. But for the future I think it would be great to pick a few ( e.g. SIMD based) string sorting algorithm which performs best for certain use cases and implement them in std lib.
@JoeLoser how do we want to start? You will opensource the sorting package? I can add the ComparableCollectionElement
trait which is a CollectionElement
and has the compare methods. And implement a sorting function which is a mix of insertion and quick sort, for Lists where elements conform to ComparableCollectionElement
.
There are lots of sorting algorithm which has their own strength in specific cases. I think the built-in algorithm should be the best for general cases. We can take other languages stdlib as reference. In Python, they have Timsort. In Rust, they have quicksort for unstable, timsort for stable sort. https://doc.rust-lang.org/core/slice/sort/index.html.
I think it is fine for me if we start from @mzaks sorting algorithm, but I expect at the end we can have Timsort like what in Python and Rust, or other competitive sorting algorithms by default for most of the use cases. Other sorting algorithms should be a package imported if needed.
@JoeLoser I heard that there are plans to introduce benchmarks (infrastructure and code) to std lib. Is it correct? For evolution on algorithms and data structures it would be crucial to have it, so we can make data driven decisions.