chapel
chapel copied to clipboard
Review the Sort module API
We are working to stabilize the Sort module. This issue captures the current API, with a few questions on naming on things we might like to change/improve before stabilizing.
Another topic to consider here is if these functions are the only API we want to provide. I cannot think of anything we would want to remove from the current public API, but maybe there is something to add. This is not a hard requirement for stabilization, as we can always add more later.
For reference: https://chapel-lang.org/docs/main/modules/packages/Sort.html
Top Level functions
proc sort(ref Data: [?Dom] ?eltType, comparator: ?rec = defaultComparator, param stable: bool = false, param inPlaceAlgorithm: bool = false)- Are we happy with all of the arguments and their names to this function? Items that come to mind:
- should
Databe lowercased? - should
Dombe lowercased? arguably not apart of the API, but it sticks out compared toIO, which usesdwhen querying a domain - can
inPlaceAlgorithmbe shortened toinPlace
- should
- Are we happy with all of the arguments and their names to this function? Items that come to mind:
proc isSorted(Data: [?Dom] ?eltType, comparator: ?rec = defaultComparator) : bool- Are we happy with all of the arguments and their names to this function? Items that come to mind:
- should
Databe lowercased? - should
Dombe lowercased? arguably not apart of the API, but it sticks out compared toIO, which usesdwhen querying a domain
- should
- Are we happy with all of the arguments and their names to this function? Items that come to mind:
iter sorted(x, comparator: ?rec = defaultComparator)- Are we happy with all of the arguments and their names to this function? Items that come to mind:
- should
xmatch the argument name forisSortedandsort
- should
- Are we happy with all of the arguments and their names to this function? Items that come to mind:
Comparators
const defaultComparator : DefaultComparatorDefaultComparatoris a record, its name being PascalCase breaks our style guide. If we rename it todefaultComparator, what do we call the variable?
const reverseComparator : ReverseComparator(DefaultComparator)reverseComparatoris a record, its name being PascalCase breaks our style guide. If we rename it toreverseComparator, what do we call the variable?- I initially found the variable name
reverseComparatora little confusing, perhapsreverseDefaultComparator? However, this feels like too long of name
To define a comparator, users must define a custom type with special methods.
proc MyComparator.key(a): ?retTyperetTypemust have a<operator
proc MyComparator.keyPart(a, i: integral): (int, integral)proc MyComparator.compare(a, b: a.type): ?retType where isNumeric(retType)
Should comparators be implemented using interfaces?
Should we require users to always define this custom type if they want custom sorting? This issue points out we could allow a sort overload that takes a FCF to do comparisons.
- should
xmatch the argument name forisSortedandsort
Iirc, x is a name we use in a lot of other places. So maybe the other two should use x instead? I do think it would be good to have consistency within the module when reasonable
Should comparators be implemented using interfaces?
That seems reasonable to me!
In an offline discussion today, we primarily discussed proc sort.
A few general notes/highlights from the discussion:
- This issue represents the public API, however the Sort module has a number of nodoced/private code that we would like to either remove or factor it out into some other place.
- Namely, this refers to code that is not used as an internal "building block", like the bubble sort implementation
- However, this code does not affect Sort module stability, as it is not apart of the public API
- Part of this discussion should include the
whereclauses that constrain these functions. For example, how should these sorting procedures behave with a 2D array
A few notes/highlights specific to proc sort:
- We want to keep this toplevel function named
sortwhich modifies an array, rather than returning a new array- We want to make it obvious that the function does not return a value
- We started discussing the various arguments, but made no conclusions yet
- proposed changing
Data: [?Dom]->x: [?dom] - discussed placement/naming/potential removal of the
stableargument- Reasons to keep it: some users may want a stable sort, even at the cost of performance
- Reasons to remove it: its name can be confusing in context of api stability; we don't actually have anything hooked up to
stable=trueyet
- we want to investigate how other languages provide a sort API to control these knobs for sorting
- proposed changing
however the Sort module has a number of nodoced/private code that we would like to either remove or factor it out into some other place.
I'd really like for us to do this as well, primarily to reduce the amount of code that's pulled in when compiling things that use Sort (which I think is, currently, everything? though it'd be nice if that weren't the case as well).
For example, how should these sorting procedures behave with a 2D array
I'd constrain them to 1D arrays, at least for the time being. If someone wanted to sort a 2D local array, I think we should have some sort of 1D view of the local array's data that could be used to leverage the 1D sort. And sorting a 2D distributed array efficiently seems like it might not be something we'd want to sign up for right out of the gate.
proposed changing Data: [?Dom] -> x: [?dom]
From a stability perspective, the user doesn't need to know about the query and name of the domain since it isn't used again in the procedure's header. We've talked about having chpldoc automatically remove such cases (or we could potentially have it—or a tool like the linter—warn about such cases). Until then, the best thing to do here might be to change it into a local variable:
proc sort(x: []) { … }
ref dDom = x.domain;
I think we haven't taken a stance in the standard module style guide about the capitalization of arrays and domains yet. That means we could use either for now, but it also suggests maybe we need to have that discussion.
Notes and consensus from todays discussion
- We affirmed that the Sort module should not be included by default
.sortedon associative domains and arrays (https://github.com/Cray/chapel-private/issues/5976).sortedis callable on rectangular domains (although it doesn't actually work), being callable is a bug we intend to fix.sortedon associative arrays does not work (nor should it).sortedon associative domains is not explicitly unstable today, but it should be. We would like to do something about this
- We want to deprecate
list.sort(currently unstable) and add a overload ofSort.sortasSort.sort(ref x: list, .....)- the bulk of the implementation can be handled by an interface-like approach
iter sortedshould work for anything that is iterable, includinglist, andset, andmap.keys- API for
sort:proc sort(ref x: [], comparator: ? = defaultComparator, param stable: bool = false, param inPlaceAlgorithm: bool = false)- The name for
inPlaceAlgorithmis still under discussion
- The name for
- We touched on 1D vs 2D sorting again
Topics to conclude next time on this discussion:
- .sorted on associative domains
- 1D vs 2D sorting
- inPlaceAlgorithm formal
(We'll also want to discuss the interface for isSorted, sorted and the provided comparators, in the interest of having a complete list of remaining discussion items)
I have a question related to the "included by default" / .sorted on domains/arrays bullets, which I didn't fully understand: I thought we'd made some moves recently to avoid having things related to sorting available by default on arrays/domains by default (e.g., https://github.com/chapel-lang/chapel/issues/18089#issuecomment-1276500706) in order to not drag the Sort module in on every compile, and were moving to supporting them as tertiary methods on the Sort module instead. But then I don't think we got far enough to complete the work such that Sort is still included on every compile today. Is the intention with things like those .sorted() methods (which I imagine are part of the problem) to fix them in-place, or to move them to 'Sort' or retire them in hopes of getting away from compiling 'Sort' on every compile? Thanks!
(if it's not obvious, I'm a fan of having all sort-related stuff only available through the 'Sort' module… or some other package module)
Our tentative plan is that the sort() function would be the main way to utilize this feature, but that the implementation for that function would rely on types providing a Sortable interface or something, to indicate what to do for each individual type in order to sort (this would avoid the need for the Sort module to import List to allow sort to be called on list, for instance). So there'd still be some contents in the locations with types we want to sort, but hopefully not nearly as much.
That sounds intriguing, thanks! And when you say:
So there'd still be some contents in the locations with types we want to sort, but hopefully not nearly as much.
Is the expectation that those modules wouldn't need to use/import Sort anymore? Or, as a litmus test, will chpl hello.chpl compile 'Sort' due to its use of rectangular arrays for the Locales array and the like?
I think it'd depend on where we put the Sortable interface, since the type would need to implement that and so would need to reference it. But I can imagine having a separate source file for "generally useful interfaces" that gets imported/used instead of the Sort module directly, or even a situation like what we have today with the Math and IO modules
Notes and consensus from todays discussion
- we affirmed that for the time being, we will continue to not support sorting multi dimensional or non-rectangular arrays
- This should have a good error message for both
proc sortanditer sorted
- This should have a good error message for both
- we want
sortandsortedto be in "lock-step" with each other, they should have the same capabiklities- with the caveat that
sort(myAssocDomain)will not work, whilesorted(myAssocDomain)will.sortedis a superset ofsort - this should be well documented, potentially with tables describing the overlaps
- with the caveat that
- We are going to remove
inPlaceAlgorithmentirely- Sort API:
proc sort(ref x: [], comparator: ? = defaultComparator, param stable: bool = false) - There is still some discussion on if this should throw, @lydia-duncan is doing some investigation on this for consistency with other standard module design discussions
- Sort API:
- We will keep
isSortedasproc isSorted(x: [], comparator: ? = defaultComparator): bool, with overloads for other types likelistand associative domains
Up Next
- Finalize whether
proc sortshould throw or not - Discuss
iter sorted - Discuss comparators
Notes and consensus from todays discussion
- we reached consensus on a
proc sort, it will not be throwable specifically for OOM errors- this is based on consistency with other standard/internal modules, thanks @lydia-duncan
- we reached consensus on
iter sorted- it will remain a free function
- it will work on a variety of types like 1D arrays, associative domains, and lists
- it will have the signature
iter sorted(x, comparator: ? = defaultComparator)
- we started discussing comparators
- there will be an optional method named
key, the rest of the signature TBD - there will be an optional method named
compare, the rest of the signature TBD - there will be an optional method named
keyPart, it will return a tuple as(keyPartSection, integral), wherekeyPartSectionis a new enum type. The exact name of the enum and the names of the enum members has not been finalized yet. The formal argument names/types are still TBD
- there will be an optional method named
Adding GPU sorting capabilities to the sort function
Currently there is proc gpuSort in the GPU module which allows for sorting of numeric data types.
One of the goals when the GPU team was working on gpuSort has been to not need a special name for sorting on GPUs. Rather, just have sort do the right thing based on which locale it is called on.
Since we are stabilizing sort, it is important to take into consideration the following issues:
- The current behavior of
sortwhen called from inside a GPU locale on a GPU array or non-GPU array.- It just works somehow for numeric types. No errors, but causes a lot of element wise communication so it is really slow.
- Segfaults for GPU allocated string arrays, works for non-GPU string arrays.
- How do we want to change that today, before
gpuSortis integrated.
gpuSortis incomplete in it's functionality: not all types are supported in some configurations.gpuSortonly takes arrays of numeric types, sorts them in ascending order, does not take comparators yet, and that may be a big lift.gpuSortjust callssortin CPU-as-device mode
Notes and discussion from 6/18
-
We want to consider allowing passing a FCP to the
comparatorargument. This would allow users to take a "shortcut" to pass akeyfunction. And maybe we use reflection of some kind to distinguish between 1 argument or 2, so that users can passkeyorcompareas FCP. And if users wantkeyPart, they can use the full record. This is not something we are planning to stabilize for now, as its an additive change. this will be its own issue -
we have stabilized the comparator functions
proc keyPart(elt, i: int): (keyPartStatus, integral)proc key(elt)proc compare(x, y: x.type)- the return type requirements for
key(type must have<) and forcompare(type must be signed integral) will be documented, and hopefully enforced - the formal names will not be enforced in the "interface", but the default/reverse comparators will use these
enum keyPartStatushas membersreturned,pre,post
-
comparator names
- we are going to remove
const defaultComparatorandconst reverseComparatorin favor of usingnew defaultComparator()andnew reverseDefaultComparator() - we are going to rename
DefaultComparatortodefaultComparator - we are going to rename
ReverseComparatortoreverseComparator - we are going to add
type reverseDefaultComparator = reverseComparator(defaultComparator) - Rationale: there is no need for a global
defaultComparatorvalue, as most users should not be explicitly using it. We considered making thecomparatorargument tosortandsortedaccept a type rather than a value, but there are cases where a stateful comparator is useful, which requires a value.
- we are going to remove
-
final two topics of discussion are
gpuSortand interfaces
Notes and discussion from 6/25
- there are no changes we want to make to the API to support
gpuSortsortshould "just work", and whether it calls the existing implementation orgpuSortinternally is an implementation detail/performance improvement.- as far as API stability is concerned, there is nothing to do for now
- we want to make the comparators depend interfaces, rather than just records with special methods.
- we will have three comparator interfaces
keyComparatorrequires thekeymethodkeyPartComparatorrequires thekeyPartmethod, and users can optionally provide acomparemethod as wellcomparehas a default implementation for this interface that can be overridden by a user
compareComparator(name still under discussion, see below) requires thecomparemethod.
- declaring a record that implements more than one of these interfaces is an error, they are mutually exclusive.
- we are reserving the name
sortComparatoras the interface that represents the exclusive OR of these 3 interfaces- we don't actually have a way in the compiler to express this, but we want to be able to do this in the future
We have a number of ideas for a better name for compareComparator and did not yet come to a decision.
Ideas:
- binaryComparator
- pairComparator
- pairedComparator
- relationalComparator
- relativeComparator
- relationComparator
Noting that we have come to decision on the name for compareComparator, it is going to be relativeComparator.
Summary of decisions made in this issue
- Sort will be promoted from
modules/packagestomodules/standard list.sortwill be deprecated.sortedon associative arrays and rectangular domains will be removed (the docs make it look like this works, but it doesn't today so we can remove these without breaking stability).sortedon associative domains will stay (see https://github.com/Cray/chapel-private/issues/5976#issuecomment-2226370139)
New Sort API
# for 1D arrays only
proc sort(ref x: [], comparator: ? = new defaultComparator(), param stable: bool = false)
proc isSorted(x: [], comparator: ? = new defaultComparator()): bool
proc sort(ref x: list, comparator: ? = new defaultComparator(), param stable: bool = false)
proc isSorted(x: list, comparator: ? = new defaultComparator()): bool
# works on 1D arrays, associative domains, list
iter sorted(x, comparator: ? = new defaultComparator())
Comparators
- The global default comparators will be removed
DefaultComparatorandReverseComparatorwill be made camelCase- also adding
type reverseDefaultComparator = reverseComparator(defaultComparator);
- also adding
enum keyPartStatuswill be added with the following enum members- returned
- pre
- post
- Comparator methods will be enforced by interfaces, see https://github.com/chapel-lang/chapel/issues/25553
proc key(elt)proc keyPart(elt, i: int): (keyPartStatus, integral)proc compare(x, y: x.type)
- In the future we will add https://github.com/chapel-lang/chapel/issues/25554
Followup issues:
- https://github.com/chapel-lang/chapel/issues/25553
- https://github.com/chapel-lang/chapel/issues/25552
- https://github.com/chapel-lang/chapel/issues/25546
- https://github.com/chapel-lang/chapel/issues/25545
- https://github.com/chapel-lang/chapel/issues/25544
Closing, all discussion here is done and most things are implemented. Anything not yet implemented or that requires further discussion has its own issue and is linked above.