libs-team icon indicating copy to clipboard operation
libs-team copied to clipboard

Introduce a `Compare` trait in the cmp mod

Open tisonkun opened this issue 11 months ago • 23 comments

Proposal

Problem statement

We have core::cmp::Reverse to wrap a type for changing the ordering logic of a type T. However, under certain generic context, it would introduce unnecessary burden over the input type T where T can be qualified instead.

Introducing a Compare trait can help in such a situation that qualified the type T as is, and accepting a new type C: Compare, acting as a comparator, to switch among different comparison functions.

Motivating examples or use cases

We have functions like:

fn do_register_extremum<T, C>(registry: &mut FuncRegistry, name: &'static str)
where
    T: ConcreteType,
    C: Compare<T::Datum, T::Datum> + Default,
{ ... }

and use the compare traits:

do_register_extremum::<T, Natural<_>>(registry, "min");
do_register_extremum::<T, Rev<Natural<_>>>(registry, "max");

If using core::cmp::Reverse, it would require complex method forwarding to let Reverse<T> implement everything T does.

Take a more complex case, we make a PartialCompare trait based on the Compare trait

/// A trait like [`Compare`] but only requires partial comparison ability.
pub trait PartialCompare<L: ?Sized, R: ?Sized = L> {
    fn compare(&self, l: &L, r: &R) -> Option<Ordering>;

    fn compares_lt(&self, l: &L, r: &R) -> bool {
        matches!(self.compare(l, r), Some(Ordering::Less))
    }

    fn compares_le(&self, l: &L, r: &R) -> bool {
        matches!(
            self.compare(l, r),
            Some(Ordering::Less) | Some(Ordering::Equal)
        )
    }

    fn compares_gt(&self, l: &L, r: &R) -> bool {
        matches!(self.compare(l, r), Some(Ordering::Greater))
    }

    fn compares_ge(&self, l: &L, r: &R) -> bool {
        matches!(
            self.compare(l, r),
            Some(Ordering::Greater) | Some(Ordering::Equal)
        )
    }

    fn compares_eq(&self, l: &L, r: &R) -> bool {
        matches!(self.compare(l, r), Some(Ordering::Equal))
    }

    fn compares_ne(&self, l: &L, r: &R) -> bool {
        !matches!(self.compare(l, r), Some(Ordering::Equal))
    }
}

impl<L, R, C: Compare<L, R>> PartialCompare<L, R> for C {
    fn compare(&self, l: &L, r: &R) -> Option<Ordering> {
        Some(self.compare(l, r))
    }
}

and use it as:

    register_common_comparison::<IntType, IntType, Natural<_>>(registry);
    register_common_comparison::<UIntType, UIntType, Natural<_>>(registry);
    register_common_comparison::<FloatType, FloatType, Natural<_>>(registry);
    register_common_comparison::<BinaryType, BinaryType, Borrowing<Natural<[u8]>, _, _>>(registry);
    register_common_comparison::<StringType, StringType, Borrowing<Natural<str>, _, _>>(registry);
    register_common_comparison::<BooleanType, BooleanType, Natural<_>>(registry);
    register_common_comparison::<TimestampType, TimestampType, Natural<_>>(registry);
    register_common_comparison::<IntervalType, IntervalType, Natural<_>>(registry);
    register_common_comparison::<VariantType, VariantType, VariantCompare>(registry);

    register_common_comparison::<IntType, UIntType, NumericCompare>(registry);
    register_common_comparison::<UIntType, IntType, NumericCompare>(registry);
    register_common_comparison::<IntType, FloatType, NumericCompare>(registry);
    register_common_comparison::<UIntType, FloatType, NumericCompare>(registry);
    register_common_comparison::<FloatType, IntType, NumericCompare>(registry);
    register_common_comparison::<FloatType, UIntType, NumericCompare>(registry);

fn register_common_comparison<T, U, C>(registry: &mut FuncRegistry)
where
    T: ConcreteType,
    U: ConcreteType,
    for<'a> C: PartialCompare<T::DatumRef<'a>, U::DatumRef<'a>> + Default + Copy,
{ .. }

You can imagine if we have to wrap all the types here with Reverse or similar approaches, it would be quite challenging to forward all the traits and make the type system happy.

Solution sketch

Bringing Compare and PartialCompare into the core lib would help in this case, where the compare crate is linked above, and the PartialCompare implementation is inlined above.

~~I'd first submit this issue to see if this is the way to go, further implement details can go as follow-ups.~~

Add four traits:

pub trait PartialEqual<L: ?Sized, R: ?Sized = L> {
  fn eq(&self, l: &L, r: &R) -> bool;
  fn ne(&self, l: &L, r: &R) -> bool { .. }
}

pub trait Equal<T: ?Sized>: PartialEqual<T, T> {}

pub trait PartialCompare<L: ?Sized, R: ?Sized = L>: PartialEqual<L, R> {
  fn partial_cmp(&self, l: &L, r: &R) -> Option<Ordering>;
  fn lt(&self, l: &L, r: &R) -> bool { .. }
  fn le(&self, l: &L, r: &R) -> bool { .. }
  fn gt(&self, l: &L, r: &R) -> bool { .. }
  fn ge(&self, l: &L, r: &R) -> bool { .. }
}

pub trait Compare<T: ?Sized>: Equal<T> {
  fn cmp(&self, l: &T, r: &T) -> Ordering;
  fn max(&self, l: T, r: T) -> bool { .. }
  fn min(&self, l: T, r: T) -> bool { .. }
  fn clamp(&self, v: T, l: T, r: T) -> bool { .. }
}

Alternatives

Just use the compare crate or make a new crates.io crate.

However, the compare crate isn't updates for more than nine years, and somehow the core lib has cmp mod that already provides some functionality to support this use case. I wonder if it's suitable as a supplement to the core lib.

Links and related work

  • https://crates.io/crates/compare
  • https://crates.io/crates/binary-heap-plus (showcases how compare can be used)

What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

  • We think this problem seems worth solving, and the standard library might be the right place to solve it.
  • We think that this probably doesn't belong in the standard library.

Second, if there's a concrete solution:

  • We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
  • We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.

tisonkun avatar Jan 26 '25 15:01 tisonkun

cc @Amanieu as a libs-team member and ever touch the compare crate.

tisonkun avatar Jan 26 '25 15:01 tisonkun

Would it fit into existing collections like BTreeMap?

the8472 avatar Jan 26 '25 15:01 the8472

@the8472 Using these Compare/PartialCompare trait in collections would introduce more topics to discuss.

I found a previous discussion in IRLO. It seems to be no blocker for adding a new generic parameter like the extension of allocator-api. But I'd break it down into a few steps:

  1. Determinate whether patching existing collections is possible (so far, it is)
  2. Initiate a proposal for the patch, for both implementation and compatibility.
  3. Implement it.

This is, however, not a blocker for introducing the Compare/PartialCompare trait as a core lib API for every Rust user, as they're proposed to be public APIs.

tisonkun avatar Jan 26 '25 16:01 tisonkun

This is, however, not a blocker for introducing the Compare/PartialCompare trait as a core lib API

Implementing is not a blocker, checking whether the design is compatible would be imo, since it would be quite unfortunate to add a mechanism to the standard library which doesn't integrate with other parts of the library.

the8472 avatar Jan 26 '25 16:01 the8472

@the8472 IIUC the integration is possible.

You can check binary-heap-plus linked above as an example. And the allocator-api show a way for evolution the generic of existing collections. HashMap also has an S type parameter that can be an analogy to BTreeMap's to-be-added Compare type parameter.

tisonkun avatar Jan 26 '25 17:01 tisonkun

finally a way to compare things in standard algorithms where you need extra state outside of the items you're comparing!

programmerjake avatar Jan 26 '25 18:01 programmerjake

I think also having a CompareEq trait would be useful where you don't have ordering, but just eq/ne, like PartialEq/Eq but with state.

programmerjake avatar Jan 26 '25 18:01 programmerjake

@programmerjake Yeah. Then if we follow the Hash/Hasher/BuildeHasher analogy, we may have:

  • Compare/BuildCompare
  • PartialCompare/BuildPartialCompare
  • CompareEq/BuildCompareEq
  • ~PartialEq~ (UPDATE: I noticed that Eq and PartialEq have the same method and signature)

I'd like to confirm that you're agree on the general direction, and ask if you have better names suggested.

tisonkun avatar Jan 28 '25 13:01 tisonkun

@programmerjake Yeah. Then if we follow the Hash/Hasher/BuildeHasher analogy, we may have:

  • Compare/BuildCompare
  • PartialCompare/BuildPartialCompare
  • CompareEq/BuildCompareEq
  • ~PartialEq~ (UPDATE: I noticed that Eq and PartialEq have the same method and signature)

they have significant semantic differences, so I think we need both ComparePartialEq and CompareEq.

I don't think we need the Build* traits -- hasher needs them because each hasher impl almost always doesn't know which types it's being called on so needs to accumulate state over multiple calls and finally return a hash, whereas Compare* doesn't have that problem since it's only called once for each comparison and it knows which types it's comparing. Default is sufficient for creating a new Compare* instance when you don't want to pass a custom-constructed one in.

programmerjake avatar Jan 28 '25 14:01 programmerjake

If we add CustomEq we'd also need externalized hashing because a Map that uses custom key comparison can also need custom key hashing.

the8472 avatar Jan 28 '25 14:01 the8472

they have significant semantic differences, so I think we need both ComparePartialEq and CompareEq.

Well. Then maybe for naming:

  • Compare
  • ComparePartial
  • CompareEq
  • ComparePartialEq

Default is sufficient for creating a new Compare* instance when you don't want to pass a custom-constructed one in.

Yes. This is the trait bound I currently use. I saw your comment mention "with state" so wonder whether we need a builder. And Default should be a optional bound if we'd support with_comparator(c) to pass a struct.

because a Map that uses custom key comparison can also need custom key hashing

I think this is dedicated to HashMap that requires K: Eq + Hash? What do you mean by "externalized hashing" in the context of comparator (this issue)?

tisonkun avatar Jan 28 '25 15:01 tisonkun

they have significant semantic differences, so I think we need both ComparePartialEq and CompareEq.

Well. Then maybe for naming:

  • Compare
  • ComparePartial
  • CompareEq
  • ComparePartialEq

looks good!

because a Map that uses custom key comparison can also need custom key hashing

I think this is dedicated to HashMap that requires K: Eq + Hash? What do you mean by "externalized hashing" in the context of comparator (this issue)?

Imagine you have a HashSet<String> except instead of using PartialEq, you supply CaseInsensitive: CompareEq, now you have a problem because "a" and "A" have different hashes but compare equal. having something like CompareEq but for hashing fixes that problem, so you'd have CaseInsensitive: BikeshedHash + CompareEq in HashSet<String, CaseInsensitive>

programmerjake avatar Jan 28 '25 15:01 programmerjake

since hashes don't easily combine and it's usually better to pass all input parts into a hashing algorithm and extract a hash at the end, maybe have an API like so:

pub trait BikeshedHash<T: ?Sized> {
    type Hasher<'a>: 'a
    where
        Self: 'a;
    fn build_hasher(&self) -> Self::Hasher<'_>;
    fn hash<'a>(&'a self, hasher: &mut Self::Hasher<'a>, value: &T);
    fn finish<'a>(&'a self, hasher: Self::Hasher<'a>) -> u64;
}

this allows recursive calls of hash, e.g.:

struct StringPair(String, String);
struct CaseInsensitive(DefaultHasher);
impl BikeshedHash<StringPair> for CaseInsensitive {
    type Hasher<'a> = RandomState;
    fn build_hasher(&self) -> Self::Hasher<'_> {
        self.0.build_hasher()
    }
    fn hash<'a>(&'a self, hasher: &mut Self::Hasher<'a>, value: &StringPair) {
        self.hash(hasher, &value.0);
        self.hash(hasher, &value.1);
    }
    fn finish<'a>(&'a self, hasher: Self::Hasher<'a>) -> u64 {
        hasher.finish()
    }
}

impl BikeshedHash<String> for CaseInsensitive {
    type Hasher<'a> = RandomState;
    fn build_hasher(&self) -> Self::Hasher<'_> {
        self.0.build_hasher()
    }
    fn hash<'a>(&'a self, hasher: &mut Self::Hasher<'a>, value: &String) {
        value.len().hash(hasher);
        for b in value.bytes() {
            let b = b.to_ascii_uppercase(); // I'm lazy, so skip unicode
            b.hash(hasher);
        }
    }
    fn finish<'a>(&'a self, hasher: Self::Hasher<'a>) -> u64 {
        hasher.finish()
    }
}

programmerjake avatar Jan 28 '25 15:01 programmerjake

An alternative approach would be a separate type along the lines of hashbrown's HashTable which gives you full control over hashing and equality comparisons. An advantage of this approach over a Compare-like trait is that the auxiliary data needed for comparisons doesn't need to be fully inside the collection but can instead be passed in by reference only when it is needed. This allows it to be shared between multiple collections.

Amanieu avatar Jan 28 '25 19:01 Amanieu

well, if only the completely external to the container approach were in std for all the other things that use comparison, e.g. BTreeMap/Set, BinaryHeap, etc.

programmerjake avatar Jan 28 '25 19:01 programmerjake

I've updated the description "Solution sketch" for adding four traits (analogy to PartialEq/Eq/PartialOrd/Ord).

For collections like BinaryHeap and BTreeMap, we can add a new state field and its type parameter, revoke the trait bound of key in insert/remove methods, and update the constructor with comparator (default to natural order + K: Ord).

An advantage of this approach over a Compare-like trait is that the auxiliary data needed for comparisons doesn't need to be fully inside the collection

I wonder how we pass the customized Compare logic if we don't introduce a Compare trait. Or it's an implementation method with the Compare trait.

a problem because "a" and "A" have different hashes but compare equal

For user-defined type, we already require well-defined implementation. That said, if we wrap str in a new type and implement hash unaligned with equal, it would behave wrongly.

tisonkun avatar Jan 31 '25 15:01 tisonkun

For HashMap, I may need some time to work a clear solution. But it seems the internal HashMap(HashTable) can have a comparator as state to wrap the equivalent implementation?

tisonkun avatar Jan 31 '25 15:01 tisonkun

An alternative approach would be a separate type along the lines of hashbrown's HashTable which gives you full control over hashing and equality comparisons. An advantage of this approach over a Compare-like trait is that the auxiliary data needed for comparisons doesn't need to be fully inside the collection but can instead be passed in by reference only when it is needed. This allows it to be shared between multiple collections.

@Amanieu I looked into this solution today and noticed that we don't expose HashTable in the std lib (https://github.com/apache/arrow-rs/pull/6035#issuecomment-2629743737). Is there a plan to do so?

Anyway, having such a solution with the current status to HashSet/HashMap may unblock the concern "whether the design is compatible".

tisonkun avatar Feb 03 '25 02:02 tisonkun

@Amanieu I looked into this solution today and noticed that we don't expose HashTable in the std lib (https://github.com/apache/arrow-rs/pull/6035#issuecomment-2629743737). Is there a plan to do so?

Not at the moment: the recommendation is for people who need HashTable to use it directly from the hashbrown crate. This is fine since HashMap/HashSet are sufficient for most use cases.

Amanieu avatar Feb 04 '25 19:02 Amanieu

We discussed this during today's libs-API meeting. We're sympathetic to making custom comparisons more ergonomic, but to make a decision we need sketches of the potential broader API designs (including alternatives and how ergonomic they are) and how this would fit into collections and other APIs that use comparisons.

A lot of the discussion in the meeting echoed comments from this thread. If we introduce a new trait in std it should also be usable by and with other std types and APIs. And a major potential user of that would be collections, sorting and iterators or collection contents. For collections then there is a tradeoff is whether it's important for ergonomics to include the Compare in collections themselves or whether lower-level collection interfaces that take an externally supplied Compare on each operation to reduce the memory footprint when there are many instances and Compare isn't zero-sized.

Compare would likely also need need convenience APIs similar to the various *_by_key methods we have and which binary-heap-plus uses too. Or combinators like then from the compare crate. Such things also should be part of an API outline.


Now writing this comment I notice we didn't actually spend much time on discussing your motivation.

You say

You can imagine if we have to wrap all the types here with Reverse or similar approaches, it would be quite challenging to forward all the traits and make the type system happy.

Bringing Compare and PartialCompare into the core lib would help in this case, where the compare crate is linked above, and the PartialCompare implementation is inlined above.

Can you explain how just having the traits in core without further integration into the rest of the standard library would improve the problems you're facing?

the8472 avatar Feb 04 '25 23:02 the8472

Can you explain how just having the traits in core without further integration into the rest of the standard library would improve the problems you're facing?

You can check the do_register_extremum example above and see that a standard Compare trait is good enough to improve my code without integration with other collections.

My consideration is that, if a Comapre trait is suitable in the core lib while other collection would benefit from it, we can first introduce the trait so that the following improvement can happen, instead of pushing all the design in the very beginning. While, I agree that it's important to ensure we don't go into a one-way door that introduce a bad abstration.

I notice equivalent and the discussion about externalized hashing above. So perhaps we'd start from a third-party crate? Or what is a minimal API outline should look like?

tisonkun avatar Mar 22 '25 13:03 tisonkun

My consideration is that, if a Comapre trait is suitable in the core lib while other collection would benefit from it, we can first introduce the trait so that the following improvement can happen, instead of pushing all the design in the very beginning. While, I agree that it's important to ensure we don't go into a one-way door that introduce a bad abstration.

Backlink https://github.com/rust-lang/rust/issues/64295#issuecomment-3571325356 as a possible enhance target.

tisonkun avatar Nov 24 '25 15:11 tisonkun