flux
flux copied to clipboard
Consistent comparisons
We have several adaptors and algorithms which take a comparator, and require that the comparator produces a strict weak order over the sequence elements:
find_min/find_max/find_minmaxmin/max/minmaxsortset_union/set_difference/set_symmetric_difference/set_intersectioncmp::min/cmp::max(for a sequence of two elements)
The comparator for all of these functions follows the classic STL model of requiring a binary predicate which returns true if the first argument is "less than" the second, defaulting to std::ranges::less. We use the standard library std::strict_weak_order concept, in most cases via our own strict_weak_order_for<Seq1, Seq2> concept which checks std::strict_weak_order for all combinations of the sequences' element_t, value_t& and common_element_t (the Flux equivalent of Ranges' std::indirect_strict_weak_order).
The outlier here is flux::compare(seq1, seq2, cmp) which requires the comparator to return a value of one of the three C++20 comparison categories (std::partial_ordering, std::weak_ordering or std::strong_ordering) and performs a lexicographical comparison of the sequence elements returning that category -- that is, the same as std::lexicographical_compare_three_way.
First of all, this is inconsistent. We shouldn't have one algorithm with a completely different interface to all the others.
A simple solution would be to change our compare to be the equivalent of std::lexicographical_compare -- that is, require just a "less than" predicate like everything else.
A potentially better solution would be to consistently use three way comparison everywhere -- specifically, requiring a comparator for all of the above algorithms (except compare) to return at least std::weak_ordering. After all, we're a C++20 library, we can do things the modern way!
As with everything though, there are pros and cons to doing this:
Pros:
- Correctness. At the moment, strict weak ordering is basically just a semantic requirement. Although a user could theoretically supply a custom comparator that returns
std::weak_orderingbut "lies" about it, this seems much less likely than accidentally passing an incorrect predicate. - Proper handling of NaNs: Right now,
flux::sort(vector<float>)compiles but does the wrong thing if the data contains NaN values. With the proposed change (assumingstd::compare_three_wayas the default comparator) then this will fail to compile. This forces the user to think about how they want to handle NaNs; either by passingstd::strong_orderto use the IEEE total ordering, or by using a custom comparator that ignores NaN values.
Cons:
- By default, only types with a spaceship operator could be used with the Flux algorithms. Of course, defaulting the operator is very easy in C++20, but there's probably a lot of code out there that only defines the classic relational operators
- Potentially, worse codegen. These algorithms only really care about less-than, but custom comparators (or
op<=>implementations) might need to do two compares rather than one. It's probably worth benchmarking this. In particular, usingstd::strong_orderto compare floats generates a lot more code thanstd::less. - Ergonomics: right now it's easy to sort in descending order by saying
sort(vec, std::greater{}). We'd probably want to provide a comparator that invertsordering::greaterandordering::lessto allow the same thing without the user needing to write a lambda to do it themselves. - Unfamiliarity: the STL has used less-than predicate comparators for 30 years, and now suddenly we're asking people to do something different. Likewise, "why can't I use
flux::sorton my vector of doubles,std::sortworks fine"
Overall, I think this is worth investigating -- at least, benchmarking what happens if we use our current sort with a comparator defined as (roughly):
bool cmp(T& lhs, T& rhs) { return std::is_lt(std::weak_order(lhs, rhs)); }
and seeing what happens.