rfcs
rfcs copied to clipboard
Add RFC for SortBy
This PR adds my RFC for SortBy
primitive.
@damon-kwok can you expand the motivation to explain why, given the existence of Sort
that this would be valuable?
Additionally as part the detailed design, you should note there that is an existing implementation at X location. That information is currently hidden in the "How we test".
@SeanTAllen Updated.
@damon-kwok "is not beginner friendly" is a statement. You need as part of your motivation to demonstrate how it isn't beginner friendly and how SortBy is.
@damon-kwok I think the second argument is really the strong point here. SortBy
would let you sort objects that don't implement Comparable
. Additionally, it lets you define arbitrary sort orderings so that you aren't constrained by the one chosen by Comparable
.
"Beginner friendliness" is kind of arbitrary and may not be a point that you really want to try to argue.
@aturley Yes, i think you are correct.
Copying my comments from the original Zulip thread:
My one qualm is the comparator signature being
{(B): USize}
whereas you'd usually have{(B, B): ISize}
in other languages, and return a numeric value that represents the order of the two elements (-1, 0, or 1)The former forces you to think of how to assign a numeric value for every element, which doesn't make sense for example if you're sorting strings alphabetically
Another possibility would be
{(B, B): Bool}
representing agt
implementation. This could also use theinterface Comparable
methods as a default.
And one example that more closely matches the collections
library (ComparableSort
would be to Sort
what HashMap
is to Map
, for example):
https://playground.ponylang.io/?gist=4fe7e1b7403332e25a997ad29e482ea7
Copying zulip conversation context:
04:55 DamonKwok I think this may be complicated. My idea comes from Hashable, it only needs a fun hash (): USize. so, a lambda is work. Because a number already has all the interfaces you need.
04:58 Epic Eric Yeah I had to copy all methods but you could probably do with a single comparator like lt
I agree with the statements above from @EpicEric - the lambda should be given the elements to compare. Coming up with an object -> integer mapping is not always feasible for the things I may want to sort.
Apart from the specific concerns that @EpicEric mentioned above, I want to add more concerns into the mix:
Imagine I have an object that I want to sort based on comparing multiple fields. In order to convert them to a USize
, I'd have to first convert each one to a numeric representation, then do some bitshifting and bitmasking magic to make all of those numeric values fit into the USize
in the correct order. This is more complicated and in many cases probably less performant because I'd have to eagerly consider all fields whereas a compare lambda would let me return early as soon as one of the earlier fields was found to be greater/lesser, and only consider later fields in the rarer case that all of the earlier fields are equal.
Further, this approach of converting all fields of the object into a numeric representation and squeezing them into a USize
will not be physically possible for some objects, depending on the sizes (and relevant information density) of those fields.
To use a real world example from code I have written, consider a Time
class that is implemented with two fields:
-
total_seconds: U64
- The total number of seconds since0001-01-01 00:00:00.0
UTC -
nanosecond: U32
- The number of nanoseconds past the designated second
There are 96 bits of information to consider for sorting purposes, and I'll never be able to fit all of those bits into a USize
, which is 64 bits on some platforms and 32 bits on others. And that's an example with only two fields.
@damon-kwok what "real world" code can you point to that is using this SortBy? I think knowing that could help pinpoint additional tests, usage patterns that should be experimented with before this would be considered for the standard library.
@jemc If you have an extreme case that Usize cannot represent, that's not what SortBy is dealing with. SortBy focuses on conditional evaluations rather than solving all problems. See:
#Drawbacks The user must ensure that the lambda evaluation function is valid.
@SeanTAllen Many real world cases:
Leveldb manager:
- sort by key length
- sort by memory used
- sort by recent update
A blog system,
- sort by create date
- sort by last update time.
A set of JSON data representing the product information
sort by storage date sort by inventory number
Focusing on existing types without compareable implementations. The comparison problem of complex types is reduced to a number comparison problem. This is what Sortby is going to do.
I agree with @EpicEric and @jemc; I would prefer a lambda that takes two objects of the given type.
But that's not what I want to do. Maybe someone should add a new RFC for it.
I named it SortBy rather than any other name, hoping that it would be as simple and elegant as possible, and that no new types should be added for sorting. Otherwise it should be called something else.
With a generic lambda approach like the one I showed above, adding a SortBy
structure can be done in 10 lines:
primitive SortBy[A: Seq[B] ref, B: Any #read]
fun apply(seq: A, f: {(B): USize} val): A^ =>
Sort[A, B, SortFunction[B]](
seq,
object val is SortFunction[B]
fun lt(first: B, second: B): Bool => f(first) < f(second)
fun le(first: B, second: B): Bool => f(first) <= f(second)
fun gt(first: B, second: B): Bool => f(first) > f(second)
fun ge(first: B, second: B): Bool => f(first) >= f(second)
end)
actor Main
new create(env:Env) =>
let array = [ "aa"; "aaa"; "a" ]
SortBy[Array[String], String](array, {(x: String): USize => x.size() })
for e in array.values() do
env.out.print(e) // prints "a \n aa \n aaa"
end
https://playground.ponylang.io/?gist=144512d4a8417b60452e3084b4c3f32d
As a side note: due to the need for a concrete type, you can't avoid adding the Array[String]
type parameter in this case, matching the current Sort
type. So either way, the RFC must be fixed to take that into account.
@EpicEric I have a few questions:
- How does the extra 10 lines of code solve the problems raised by @jemc ?
- Your
SortBy
seems to do the same thing as mySortBy
, but it requires the overhead of 2 Object Literals? - I don't quite know your intention. My
SortBy
is a new primitive, it will not interfere with the existingSort
, but you seem to want to modify Sort's implementation. Is that right?
@EpicEric Can you create a temp git repository and submit a sort*.pony
file and a _test.pony
file that can be used by others? This can help others more clearly compare the difference between the two.
- How does the extra 10 lines of code solve the problems raised by @jemc ?
It doesn't. You'd create a sort function using the Sort
implementation directly instead, without using SortBy
.
primitive TimeSortFunction is SortFunction[Time]
fun lt(first: Time, second: Time): Bool =>
(first.total_seconds < second.total_seconds) || (
(first.total_seconds == second.total_seconds)
&& (first.nanosecond < second.nanosecond))
fun gt(first: Time, second: Time): Bool =>
(first.total_seconds > second.total_seconds) || (
(first.total_seconds == second.total_seconds)
&& first.nanosecond > second.nanosecond))
fun le(first: Time, second: Time): Bool =>
(first.total_seconds < second.total_seconds) || (
(first.total_seconds == second.total_seconds)
&& first.nanosecond <= second.nanosecond))
fun ge(first: Time, second: Time): Bool =>
(first.total_seconds > second.total_seconds) || (
(first.total_seconds == second.total_seconds)
&& (first.nanosecond >= second.nanosecond))
primitive TimeSort[A: Seq[Time] ref]
fun apply(a: A): A^ =>
Sort[A, Time, TimeSortFunction](a, TimeSort)
- Your
SortBy
seems to do the same thing as mySortBy
, but it requires the overhead of 2 Object Literals?
This RFC doesn't have any code implementation, so I'm assuming you refer to this code. In this case, yes, there is one extra anonymous object allocation.
- I don't quite know your intention. My
SortBy
is a new primitive, it will not interfere with the existingSort
, but you seem to want to modify Sort's implementation. Is that right?
This is only if API names have to follow the same convention as the rest of the collections
package. Compatibility could be maintained if Sort
and ComparableSort
are renamed to, say, RawSort
and Sort
respectively.
@EpicEric Thanks! I see. I know the process of adding RFC is long and we can listen to other people's ideas.
Do you still want to add this RFC?
While this may be a reasonable addition, I feel that it violates least surprise as it exists. Specifically:
- Most of the comments here seemed to have expected from the name that it would be a comparison function
- The comparable interface is based on comparison functions, rather than valuations
- This is fixed to be USize, rather than any other type (and worse, the information available in USize is system-dependent)
It does not appear to me that a comparison-based solution has a very significant difference in allocation. If it just requires a single method (such as compare), then the allocation characteristics will be identical to a lambda returning USize. Of course as with HashFunctions we can have a generic primitive to dispatch without allocation in the general case
@jasoncarr0 Do you mean to change USize
to U128
or I128
?
I would actually expect U64
. The issue with USize is that the maximum value depends on the system it runs on.
@jasoncarr0 Updated.
I agree with the commenters that I would expect something called SortBy
to use a comparator.
If that's not possible it should at least use a key which can be complex (e.g. tuple) and support lexicographic sorting.
Here's a survey of languages sorting support
- JavaScript
- Offers sorting with comparison functions via
Array.sort([compareFunction])
- Offers sorting with comparison functions via
- C++
- Offers sorting with comparison functions via
std::sort
- Offers sorting with comparison functions via
- Python
- Java
- Allows simple sorting and defining a sorting for a type via
Comparable
- Also allows custom sorts to be provided using the
Comparator
class
- Allows simple sorting and defining a sorting for a type via
- Rust
- Offers sorting and searching based on trait
std::cmp::Ord
- Allows sorting of slices using
sort_by
which takes a comparator. - The newtype zero cost abstraction can be used to wrap a type with a new sorting behavior.
- Offers sorting and searching based on trait