chapel icon indicating copy to clipboard operation
chapel copied to clipboard

Review the Sort module API

Open jabraham17 opened this issue 1 year ago • 3 comments

We are working to stabilize the Sort module. This issue captures the current API, with a few questions on naming on things we might like to change/improve before stabilizing.

Another topic to consider here is if these functions are the only API we want to provide. I cannot think of anything we would want to remove from the current public API, but maybe there is something to add. This is not a hard requirement for stabilization, as we can always add more later.

For reference: https://chapel-lang.org/docs/main/modules/packages/Sort.html

Top Level functions

  • proc sort(ref Data: [?Dom] ?eltType, comparator: ?rec = defaultComparator, param stable: bool = false, param inPlaceAlgorithm: bool = false)
    • Are we happy with all of the arguments and their names to this function? Items that come to mind:
      • should Data be lowercased?
      • should Dom be lowercased? arguably not apart of the API, but it sticks out compared to IO, which uses d when querying a domain
      • can inPlaceAlgorithm be shortened to inPlace
  • proc isSorted(Data: [?Dom] ?eltType, comparator: ?rec = defaultComparator) : bool
    • Are we happy with all of the arguments and their names to this function? Items that come to mind:
      • should Data be lowercased?
      • should Dom be lowercased? arguably not apart of the API, but it sticks out compared to IO, which uses d when querying a domain
  • iter sorted(x, comparator: ?rec = defaultComparator)
    • Are we happy with all of the arguments and their names to this function? Items that come to mind:
      • should x match the argument name for isSorted and sort

Comparators

  • const defaultComparator : DefaultComparator
    • DefaultComparator is a record, its name being PascalCase breaks our style guide. If we rename it to defaultComparator, what do we call the variable?
  • const reverseComparator : ReverseComparator(DefaultComparator)
    • reverseComparator is a record, its name being PascalCase breaks our style guide. If we rename it to reverseComparator, what do we call the variable?
    • I initially found the variable name reverseComparator a little confusing, perhaps reverseDefaultComparator? However, this feels like too long of name

To define a comparator, users must define a custom type with special methods.

  • proc MyComparator.key(a): ?retType
    • retType must have a < operator
  • proc MyComparator.keyPart(a, i: integral): (int, integral)
  • proc MyComparator.compare(a, b: a.type): ?retType where isNumeric(retType)

Should comparators be implemented using interfaces?

Should we require users to always define this custom type if they want custom sorting? This issue points out we could allow a sort overload that takes a FCF to do comparisons.

jabraham17 avatar Apr 15 '24 18:04 jabraham17

  • should x match the argument name for isSorted and sort

Iirc, x is a name we use in a lot of other places. So maybe the other two should use x instead? I do think it would be good to have consistency within the module when reasonable

Should comparators be implemented using interfaces?

That seems reasonable to me!

lydia-duncan avatar May 13 '24 21:05 lydia-duncan

In an offline discussion today, we primarily discussed proc sort.

A few general notes/highlights from the discussion:

  • This issue represents the public API, however the Sort module has a number of nodoced/private code that we would like to either remove or factor it out into some other place.
    • Namely, this refers to code that is not used as an internal "building block", like the bubble sort implementation
    • However, this code does not affect Sort module stability, as it is not apart of the public API
  • Part of this discussion should include the where clauses that constrain these functions. For example, how should these sorting procedures behave with a 2D array

A few notes/highlights specific to proc sort:

  • We want to keep this toplevel function named sort which modifies an array, rather than returning a new array
    • We want to make it obvious that the function does not return a value
  • We started discussing the various arguments, but made no conclusions yet
    • proposed changing Data: [?Dom] -> x: [?dom]
    • discussed placement/naming/potential removal of the stable argument
      • Reasons to keep it: some users may want a stable sort, even at the cost of performance
      • Reasons to remove it: its name can be confusing in context of api stability; we don't actually have anything hooked up to stable=true yet
    • we want to investigate how other languages provide a sort API to control these knobs for sorting

jabraham17 avatar May 20 '24 22:05 jabraham17

however the Sort module has a number of nodoced/private code that we would like to either remove or factor it out into some other place.

I'd really like for us to do this as well, primarily to reduce the amount of code that's pulled in when compiling things that use Sort (which I think is, currently, everything? though it'd be nice if that weren't the case as well).

For example, how should these sorting procedures behave with a 2D array

I'd constrain them to 1D arrays, at least for the time being. If someone wanted to sort a 2D local array, I think we should have some sort of 1D view of the local array's data that could be used to leverage the 1D sort. And sorting a 2D distributed array efficiently seems like it might not be something we'd want to sign up for right out of the gate.

proposed changing Data: [?Dom] -> x: [?dom]

From a stability perspective, the user doesn't need to know about the query and name of the domain since it isn't used again in the procedure's header. We've talked about having chpldoc automatically remove such cases (or we could potentially have it—or a tool like the linter—warn about such cases). Until then, the best thing to do here might be to change it into a local variable:

proc sort(x: []) { … }
ref dDom = x.domain;

I think we haven't taken a stance in the standard module style guide about the capitalization of arrays and domains yet. That means we could use either for now, but it also suggests maybe we need to have that discussion.

bradcray avatar May 20 '24 22:05 bradcray

Notes and consensus from todays discussion

  • We affirmed that the Sort module should not be included by default
  • .sorted on associative domains and arrays (https://github.com/Cray/chapel-private/issues/5976)
    • .sorted is callable on rectangular domains (although it doesn't actually work), being callable is a bug we intend to fix
    • .sorted on associative arrays does not work (nor should it)
    • .sorted on associative domains is not explicitly unstable today, but it should be. We would like to do something about this
  • We want to deprecate list.sort (currently unstable) and add a overload of Sort.sort as Sort.sort(ref x: list, .....)
    • the bulk of the implementation can be handled by an interface-like approach
  • iter sorted should work for anything that is iterable, including list, and set, and map.keys
  • API for sort: proc sort(ref x: [], comparator: ? = defaultComparator, param stable: bool = false, param inPlaceAlgorithm: bool = false)
    • The name for inPlaceAlgorithm is still under discussion
  • We touched on 1D vs 2D sorting again

Topics to conclude next time on this discussion:

  • .sorted on associative domains
  • 1D vs 2D sorting
  • inPlaceAlgorithm formal

jabraham17 avatar May 23 '24 23:05 jabraham17

(We'll also want to discuss the interface for isSorted, sorted and the provided comparators, in the interest of having a complete list of remaining discussion items)

lydia-duncan avatar May 24 '24 15:05 lydia-duncan

I have a question related to the "included by default" / .sorted on domains/arrays bullets, which I didn't fully understand: I thought we'd made some moves recently to avoid having things related to sorting available by default on arrays/domains by default (e.g., https://github.com/chapel-lang/chapel/issues/18089#issuecomment-1276500706) in order to not drag the Sort module in on every compile, and were moving to supporting them as tertiary methods on the Sort module instead. But then I don't think we got far enough to complete the work such that Sort is still included on every compile today. Is the intention with things like those .sorted() methods (which I imagine are part of the problem) to fix them in-place, or to move them to 'Sort' or retire them in hopes of getting away from compiling 'Sort' on every compile? Thanks!

(if it's not obvious, I'm a fan of having all sort-related stuff only available through the 'Sort' module… or some other package module)

bradcray avatar May 24 '24 18:05 bradcray

Our tentative plan is that the sort() function would be the main way to utilize this feature, but that the implementation for that function would rely on types providing a Sortable interface or something, to indicate what to do for each individual type in order to sort (this would avoid the need for the Sort module to import List to allow sort to be called on list, for instance). So there'd still be some contents in the locations with types we want to sort, but hopefully not nearly as much.

lydia-duncan avatar May 24 '24 19:05 lydia-duncan

That sounds intriguing, thanks! And when you say:

So there'd still be some contents in the locations with types we want to sort, but hopefully not nearly as much.

Is the expectation that those modules wouldn't need to use/import Sort anymore? Or, as a litmus test, will chpl hello.chpl compile 'Sort' due to its use of rectangular arrays for the Locales array and the like?

bradcray avatar May 24 '24 19:05 bradcray

I think it'd depend on where we put the Sortable interface, since the type would need to implement that and so would need to reference it. But I can imagine having a separate source file for "generally useful interfaces" that gets imported/used instead of the Sort module directly, or even a situation like what we have today with the Math and IO modules

lydia-duncan avatar May 24 '24 19:05 lydia-duncan

Notes and consensus from todays discussion

  • we affirmed that for the time being, we will continue to not support sorting multi dimensional or non-rectangular arrays
    • This should have a good error message for both proc sort and iter sorted
  • we want sort and sorted to be in "lock-step" with each other, they should have the same capabiklities
    • with the caveat that sort(myAssocDomain) will not work, while sorted(myAssocDomain) will. sorted is a superset of sort
    • this should be well documented, potentially with tables describing the overlaps
  • We are going to remove inPlaceAlgorithm entirely
    • Sort API: proc sort(ref x: [], comparator: ? = defaultComparator, param stable: bool = false)
    • There is still some discussion on if this should throw, @lydia-duncan is doing some investigation on this for consistency with other standard module design discussions
  • We will keep isSorted as proc isSorted(x: [], comparator: ? = defaultComparator): bool, with overloads for other types like list and associative domains

Up Next

  • Finalize whether proc sort should throw or not
  • Discuss iter sorted
  • Discuss comparators

jabraham17 avatar Jun 04 '24 23:06 jabraham17

Notes and consensus from todays discussion

  • we reached consensus on a proc sort, it will not be throwable specifically for OOM errors
    • this is based on consistency with other standard/internal modules, thanks @lydia-duncan
  • we reached consensus on iter sorted
    • it will remain a free function
    • it will work on a variety of types like 1D arrays, associative domains, and lists
    • it will have the signature iter sorted(x, comparator: ? = defaultComparator)
  • we started discussing comparators
    • there will be an optional method named key, the rest of the signature TBD
    • there will be an optional method named compare, the rest of the signature TBD
    • there will be an optional method named keyPart, it will return a tuple as (keyPartSection, integral), where keyPartSection is a new enum type. The exact name of the enum and the names of the enum members has not been finalized yet. The formal argument names/types are still TBD

jabraham17 avatar Jun 13 '24 18:06 jabraham17

Adding GPU sorting capabilities to the sort function

Currently there is proc gpuSort in the GPU module which allows for sorting of numeric data types. One of the goals when the GPU team was working on gpuSort has been to not need a special name for sorting on GPUs. Rather, just have sort do the right thing based on which locale it is called on.

Since we are stabilizing sort, it is important to take into consideration the following issues:

  • The current behavior of sort when called from inside a GPU locale on a GPU array or non-GPU array.
    • It just works somehow for numeric types. No errors, but causes a lot of element wise communication so it is really slow.
    • Segfaults for GPU allocated string arrays, works for non-GPU string arrays.
    • How do we want to change that today, before gpuSort is integrated.
  • gpuSort is incomplete in it's functionality: not all types are supported in some configurations.
  • gpuSort only takes arrays of numeric types, sorts them in ascending order, does not take comparators yet, and that may be a big lift.
  • gpuSort just calls sort in CPU-as-device mode

ShreyasKhandekar avatar Jun 18 '24 18:06 ShreyasKhandekar

Notes and discussion from 6/18

  • We want to consider allowing passing a FCP to the comparator argument. This would allow users to take a "shortcut" to pass a key function. And maybe we use reflection of some kind to distinguish between 1 argument or 2, so that users can pass key or compare as FCP. And if users want keyPart, they can use the full record. This is not something we are planning to stabilize for now, as its an additive change. this will be its own issue

  • we have stabilized the comparator functions

    • proc keyPart(elt, i: int): (keyPartStatus, integral)
    • proc key(elt)
    • proc compare(x, y: x.type)
    • the return type requirements for key (type must have <) and for compare (type must be signed integral) will be documented, and hopefully enforced
    • the formal names will not be enforced in the "interface", but the default/reverse comparators will use these
    • enum keyPartStatus has members returned, pre, post
  • comparator names

    • we are going to remove const defaultComparator and const reverseComparator in favor of using new defaultComparator() and new reverseDefaultComparator()
    • we are going to rename DefaultComparator to defaultComparator
    • we are going to rename ReverseComparator to reverseComparator
    • we are going to add type reverseDefaultComparator = reverseComparator(defaultComparator)
    • Rationale: there is no need for a global defaultComparator value, as most users should not be explicitly using it. We considered making the comparator argument to sort and sorted accept a type rather than a value, but there are cases where a stateful comparator is useful, which requires a value.
  • final two topics of discussion are gpuSort and interfaces

jabraham17 avatar Jun 20 '24 18:06 jabraham17

Notes and discussion from 6/25

  • there are no changes we want to make to the API to support gpuSort
    • sort should "just work", and whether it calls the existing implementation or gpuSort internally is an implementation detail/performance improvement.
    • as far as API stability is concerned, there is nothing to do for now
  • we want to make the comparators depend interfaces, rather than just records with special methods.
  • we will have three comparator interfaces
    • keyComparator requires the key method
    • keyPartComparator requires the keyPart method, and users can optionally provide a compare method as well
      • compare has a default implementation for this interface that can be overridden by a user
    • compareComparator (name still under discussion, see below) requires the compare method.
  • declaring a record that implements more than one of these interfaces is an error, they are mutually exclusive.
  • we are reserving the name sortComparator as the interface that represents the exclusive OR of these 3 interfaces
    • we don't actually have a way in the compiler to express this, but we want to be able to do this in the future

We have a number of ideas for a better name for compareComparator and did not yet come to a decision. Ideas:

  • binaryComparator
  • pairComparator
  • pairedComparator
  • relationalComparator
  • relativeComparator
  • relationComparator

jabraham17 avatar Jun 25 '24 23:06 jabraham17

Noting that we have come to decision on the name for compareComparator, it is going to be relativeComparator.

jabraham17 avatar Jul 03 '24 21:07 jabraham17

Summary of decisions made in this issue

  • Sort will be promoted from modules/packages to modules/standard
  • list.sort will be deprecated
  • .sorted on associative arrays and rectangular domains will be removed (the docs make it look like this works, but it doesn't today so we can remove these without breaking stability)
  • .sorted on associative domains will stay (see https://github.com/Cray/chapel-private/issues/5976#issuecomment-2226370139)

New Sort API

# for 1D arrays only
proc sort(ref x: [], comparator: ? = new defaultComparator(), param stable: bool = false)
proc isSorted(x: [], comparator: ? = new defaultComparator()): bool

proc sort(ref x: list, comparator: ? = new defaultComparator(), param stable: bool = false)
proc isSorted(x: list, comparator: ? = new defaultComparator()): bool

# works on 1D arrays, associative domains, list
iter sorted(x, comparator: ? = new defaultComparator())

Comparators

  • The global default comparators will be removed
  • DefaultComparator and ReverseComparator will be made camelCase
    • also adding type reverseDefaultComparator = reverseComparator(defaultComparator);
  • enum keyPartStatus will be added with the following enum members
    • returned
    • pre
    • post
  • Comparator methods will be enforced by interfaces, see https://github.com/chapel-lang/chapel/issues/25553
    • proc key(elt)
    • proc keyPart(elt, i: int): (keyPartStatus, integral)
    • proc compare(x, y: x.type)
  • In the future we will add https://github.com/chapel-lang/chapel/issues/25554

Followup issues:

  • https://github.com/chapel-lang/chapel/issues/25553
  • https://github.com/chapel-lang/chapel/issues/25552
  • https://github.com/chapel-lang/chapel/issues/25546
  • https://github.com/chapel-lang/chapel/issues/25545
  • https://github.com/chapel-lang/chapel/issues/25544

jabraham17 avatar Jul 12 '24 21:07 jabraham17

Closing, all discussion here is done and most things are implemented. Anything not yet implemented or that requires further discussion has its own issue and is linked above.

jabraham17 avatar Nov 05 '24 16:11 jabraham17