libs-team icon indicating copy to clipboard operation
libs-team copied to clipboard

ACP: `TwoSidedRange` trait

Open pitaj opened this issue 1 year ago • 44 comments

Proposal

Problem statement

RangeBounds exists as a useful generic abstraction that lets users write functions that take any range type. However, is not uncommon that users want to only accept ranges with both start and end bounds (Range and RangeInclusive). If they choose to use RangeBounds for this, they have to resort to runtime panics. Or they have to write their own trait only implemented for the given ranges.

Motivating examples or use cases

The rand crate has a SampleRange trait implemented only for Range and RangeInclusive.

If TwoSidedRange existed, they could use that instead.

TODO: Other examples

Solution sketch

pub trait TwoSidedRange<T>: RangeBounds<T> {
    // Should this unconditionally clone to match `end_*`?
    /// Returns the incisive starting bound of the range
    fn start_inclusive(&self) -> &T;

    /// `None` if range is empty, otherwise returns the inclusive end bound 
    fn end_inclusive(&self) -> Option<T>;

    /// `None` if end bound would overflow `T`, otherwise returns the exclusive end bound
    // Maybe return `Result` instead?
    fn end_exclusive(&self) -> Option<T>;

    /// Returns the number of steps covered by the range
    fn width(&self) -> Option<usize>;
}

impl<T: Step> TwoSidedRange<T> for Range<T> { ... }
impl<T: Step> TwoSidedRange<T> for RangeInclusive<T> { ... }

Alternatives

The design of width (airways returning usize) is not ideal, we could instead return the next larger unsigned integer or something instead, but that would require a special trait.

The functions could live on rangebounds instead for better visibility.

Could instead have two traits: StartBoundRange and EndBoundRange and then TwoSidedRange would be expressed with StartBoundRange + EndBoundRange but providing fn width would be tough.

width is similar to ExactSizeIterator::len but implemented fallibly, whereas .len() can silently return incorrect results on platforms where usize is 16 bits.

Links and related work

  • Brought up on RFC 3550: https://github.com/rust-lang/rfcs/pull/3550#issuecomment-2032641282
  • Discussed on Zulip: https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/Abstracting.20over.20ranges.2C.20.60RangeBounds.60.20alternatives

What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

  • We think this problem seems worth solving, and the standard library might be the right place to solve it.
  • We think that this probably doesn't belong in the standard library.

Second, if there's a concrete solution:

  • We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
  • We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.

pitaj avatar Apr 05 '24 16:04 pitaj

sorry gotta -1 this one if this is the only motivation:

The rand crate has a SampleRange trait implemented only for Range and RangeInclusive.

I don't find this motivating enough. While the SampleRange trait is implemented over Range and RangeInclusive only, the definition of the trait cannot benefit from the proposed TwoSidedRange at all.

pub trait SampleRange<T> {
    fn sample_single<R: RngCore + ?Sized>(self, rng: &mut R) -> T;  // <- certainly out of scope for a std TwoSidedRange
    fn is_empty(&self) -> bool;
}

furthermore, since T is not required to be Copy, it can't even be properly implemented relying on the proposed solution sketch

impl<B, T> SampleRange<T> for B
where
    T: SampleUniform + PartialOrd,
    B: TwoSidedRange<T>,
{
    fn sample_single<R: RngCore +?Sized>(self, rng: &mut R) -> T {
        let start = self.start_inclusive();
        if let Some(end) = self.end_inclusive() {
            T::Sampler::sample_single_inclusive(*start, *end, rng) // actually can't move out start and end
        } else if let Some(end) = self.end_exclusive() {
            T::Sampler::sample_single(*start, *end, rng) // ditto
        } else {
            panic!("the range has no inclusive nor exclusive end why oh why");
        }
    }
    fn is_empty(&self) -> bool {
        self.width() <= Some(0)
    }
}

The rand they should just implement SampleRange over all 4 new and old + inclusive and exclusive range types and done (rand can't use any sort of TwoSidedRange without breaking MSRV anyway)

kennytm avatar Apr 05 '24 17:04 kennytm

For completeness sake, I'm posting the original situation I found RangeBounds lacking for:

/// Returns a new `Vec` with `len` elements, where the elements are uniformly random in the `range`.
fn random_uniform<R: RangeBounds<i32>>(len: usize, range: R) -> Vec<i32>

Given the existing trait, implementing this function felt so cumbersome I decided to go a different route in the past. I think abstracting over ranges is a reasonable thing to do. So something like width seems like a good idea to me.

Another situation where I found the RangeBounds trait lacking, was implementing an interpreter for a DSL. The DSL has ranges, and initially I thought that mapping them to Rust ranges in the AST would be a good fit. That didn't work out in the end for other reasons, but even if they would map semantically, once you have a R: RangeBounds<T> how would you implement a for loop for it with the current trait? Yes one can write the appropriate match statements, but that seems needlessly complicated. I'd argue that one of the fundamental properties of ranges are the width they define, so there ought to be an easy-to-use way to get that value.

Voultapher avatar Apr 05 '24 18:04 Voultapher

I will add more examples when I get home (submitted this on my phone).

If you read the comments, I called out explicitly whether start_inclusive should return by value. If it does, your code works practically without change:

impl<B, T> SampleRange<T> for B
where
    T: SampleUniform + PartialOrd,
    B: TwoEndedRange<T>,
{
    fn sample_single<R: RngCore +?Sized>(self, rng: &mut R) -> T {
        let start = self.start_inclusive();
        // Exclusive end bound can always convert
        // to inclusive, unless the range is empty
        // But we explicitly don't support empty
        let end = self.end_inclusive().expect("range was empty");
        T::Sampler::sample_single_inclusive(start, end, rng)
    }
    fn is_empty(&self) -> bool {
        RangeBounds::is_empty(self)
    }
}

pitaj avatar Apr 05 '24 18:04 pitaj

The new sample code doesn't work because if start_inclusive() returned by value it would need to consume self, making self.end_inclusive() failing to compile. For a return-by-value method you'll need to return both bounds simultaneously.

trait TwoSidedRange<T> {
    fn try_into_range(self) -> Result<Range<T>, Self>;
    fn try_into_range_inclusive(self) -> Result<RangeInclusive<T>, Self>;
}

kennytm avatar Apr 05 '24 18:04 kennytm

The new sample code doesn't work because if start_inclusive() returned by value it would need to consume self, making self.end_inclusive() failing to compile. For a return-by-value method you'll need to return both bounds simultaneously.

No, you can just clone the bound. Step implies Clone, so the impl for TwoSidedRange can look like this:

impl<T: Step> TwoSidedRange<T> for Range<T> {
    // Just a copy for every `Step` type at the moment
    fn start_inclusive(&self) { self.start.clone() }
    ...
}

pitaj avatar Apr 05 '24 19:04 pitaj

The original implementation of sample_single does not require a clone, only moving. Consider that T can be a BigInt. This is an unnecessary performance degradation only because of limitation of the trait.

Edit: Also, rand::SampleRange did not require the T: Step bound, only T: SampleUniform + PartialOrd, changing the requirement to T: Step is totally unacceptable.

(P.S. I think RangeBounds should introduce an fn into_bounds(self) -> (Bound<T>, Bound<T>) as well, but this is out-of-scope.)

kennytm avatar Apr 05 '24 19:04 kennytm

@Voultapher

For completeness sake, I'm posting the original situation I found RangeBounds lacking for:

/// Returns a new `Vec` with `len` elements, where the elements are uniformly random in the `range`.
fn random_uniform<R: RangeBounds<i32>>(len: usize, range: R) -> Vec<i32>

Given the existing trait, implementing this function felt so cumbersome I decided to go a different route in the past. I think abstracting over ranges is a reasonable thing to do. So something like width seems like a good idea to me.

Yeah a .width() would be useful here because you're constraining to ranges of i32.

But rand's SampleUniform is also implemented for SIMD types (u32x2 etc) and floating point (f64) which a fn width(&self) -> Option<usize> does not seem helpful.

kennytm avatar Apr 05 '24 19:04 kennytm

@kennytm unless I'm missing something, even if the proposed API extension doesn't help with rand's SampleUniform that wouldn't exclude it from being helpful in other use-cases as outlined, right?

Voultapher avatar Apr 05 '24 19:04 Voultapher

Indeed, the sketch solution doesn't resolve the rand case because those other types don't implement Step.

A more general solution would be something like

struct BoundIE<T> {
    Included(T),
    Excluded(T),
}
fn into_bounds(self) -> (T, BoundIE<T>);

// Or with pattern types

fn into_bounds(self) -> (T, Bound<T> is Bound::Included(_) | Bound::Excluded(_));

Which wouldn't require Step.

pitaj avatar Apr 05 '24 19:04 pitaj

@Voultapher That's why the first sentence of my comment is "gotta -1 this one (rand::SampleRange) if this is the only motivation"

So far the other motivation is your DSL use case in https://github.com/rust-lang/libs-team/issues/365#issuecomment-2040436159.

kennytm avatar Apr 05 '24 19:04 kennytm

@kennytm I provided two motivating examples, random_uniform and the DSL.

Voultapher avatar Apr 06 '24 06:04 Voultapher

@Voultapher why do you want to implement your own random_uniform, that is valid only for i32, when rand::distributions::Uniform existed?

kennytm avatar Apr 06 '24 08:04 kennytm

@kennytm while I could explain why that signature makes sense in my use-case, I don't see how that's relevant here. And looking at rand::distributions::Uniform it implements From for Range and RangeInclusive individually, it could instead with this proposal implement it for TwoSidedRange and save on duplication. I took 5 minutes and searched for .start_bound() via grep.app and found a variety of places that would benefit from the proposed API:

  • Rust stdlib https://github.com/rust-lang/rust/blob/8d490e33ad7bbcdeab7975be787653b1e46e48e4/library/core/src/slice/index.rs#L702
  • luminal https://github.com/jafioti/luminal/blob/239af0df26ef973df0fa0df28092670709d328e1/src/shape/slice.rs#L4
  • clap https://github.com/clap-rs/clap/blob/9d14f394ba22f65f8957310a03ae5fd613f89d76/clap_builder/src/builder/value_parser.rs#L1365
  • Xline https://github.com/xline-kv/Xline/blob/e35b35abc4b1b8316041bc71dbe7fe44785fdae0/crates/xlineapi/src/command.rs#L76

And the list goes on. If most users of the API end up doing the same match, why not provide it as part of the stdlib?

Voultapher avatar Apr 06 '24 10:04 Voultapher

@Voultapher

  1. As I said above in https://github.com/rust-lang/libs-team/issues/365#issuecomment-2040318248 and https://github.com/rust-lang/libs-team/issues/365#issuecomment-2040456553 you can't implement Into<Uniform> using TwoSidedRange with the current sketch, please follow through the discussion first.
  2. for the given examples
    • Rust stdlib: It clips a impl RangeBounds<usize> by a RangeTo<usize> to produce a Range<usize>, you don't need TwoSidedRange here at all as RangeBounds is more general, with API allowing us to write range(.., ..6).
    • luminal: ditto
    • clap: ditto
    • Xline: this used Vec<u8> as bounds because it is a KV storage, it only rejected start_bound() with Excluded, but an end_bound() with Unbounded i.e. a range of the form b"key".. is accepted. Plus it is operating on a special KeyRange type not any of the std::ops types.

Have you actually looked into what your linked examples actually does? All of them showed counter-examples of TwoSidedRange, that the trait is not sufficient for their use case.

kennytm avatar Apr 06 '24 11:04 kennytm

Also, for the width function,

  1. As shown above we see multiple examples where Range is used with non-Step types, such as rand using it for floating point (f64 etc) and SIMD types (u32x2 etc), and @Voultapher's Xline example for key ranges (Vec<u8>) in KV storage. Width of these types are meaningless.

  2. Even within Step types, the width() is not able to represent the edge case 0 ..= usize::MAX, it can only return None. Basically, the width() cannot simultaneously distinguish the following 3 edge cases:

    • 0 .. 0, a valid empty range, should be Some(0)
    • 1 .. 0, an invalid range, maybe either None or Some(0)
    • 0 ..= usize::MAX, a range with width overflowed usize, currently must be None.

    If (0..0).width() == Some(0) && (1..0).width() == Some(0) && (0..=usize::MAX).width() == None, I don't see why you need to introduce width() at all since it is equivalent to .into_iter().size_hint().1.

    assert_eq!((0..0).into_iter().size_hint().1, Some(0));
    assert_eq!((1..0).into_iter().size_hint().1, Some(0));
    assert_eq!((0..=usize::MAX).into_iter().size_hint().1, None);
    
  3. Even if we relax width()'s return type to an associated type to support 128-bit integers:

    trait TwoSidedRange<T>: RangeBounds<T> {
        type Width;
        fn width(&self) -> Option<Self::Width>;
    }
    

    you cannot use any built-in type to properly get the width of 0 ..= u128::MAX.

kennytm avatar Apr 06 '24 11:04 kennytm

@kennytm maybe we are talking about different things here. I'm observing an ergonomic hole in the stdlib, that forces users to re-implement very similar logic again and again. You seem focused on showing that the current API sketch has its issues. Can we agree that the API could be more ergonomic? If yes, then let's think about the various use-cases and figure out how they could be improved.

Voultapher avatar Apr 06 '24 11:04 Voultapher

I'd also like to add that the type size limit for 0 ..=T::MAX affects the existing code, luminal seems to not handle it, clap saturates, which is likely never hit but still somewhat wrong and Xline doesn't care about the values. So whatever way we find to make that limitation more obvious or non-existant would probably benefit many users.

Voultapher avatar Apr 06 '24 11:04 Voultapher

@Voultapher

I don't see width() being able to solve any ergonomic hole, nor having a trait implemented for only Range and RangeInclusive. If these are not what you are proposing I think this ACP should better be moved back to the drawing board.

In your examples (let's ignore Xline):

  • rand - the hole is the ability to try_convert any bounded range into a RangeInclusive<T>. Won't remove any of the runtime check though.
  • std - no hole. https://doc.rust-lang.org/std/slice/fn.range.html, the function you linked, is one that exactly hides away the duplications to fill the ergonomic hole
  • luminal - is doing exactly what std::slice::range is doing but on Expression not usize.
  • clap - although written strangely it is also trying to constrain a range by another.

If there is a significant hole I'd say to expand std::slice::range to accept any T: Step instead of only usize.

kennytm avatar Apr 06 '24 12:04 kennytm

Specifically for finite integer ranges there's actually an exact bound one can use today: RangeBounds<_> + ExactSizeIterator. Not the most intuitive solution, of course, and with future non-iterator ranges the bound would get more complicated.

jdahlstrom avatar Apr 28 '24 18:04 jdahlstrom

ExactSizeIterator is not implemented for several integer ranges, including Range<u64>.

pitaj avatar Apr 28 '24 19:04 pitaj

I think many cases that would be helped by something like this probably just take an argument of Range (or RangeInclusive) directly, rather than accepting both.

Some examples:

The trait originally described would work for those cases, but something like this might be better. !Step cases (like SampleRange) can just use into_bounds:

pub enum BoundIE<T> {
    Included(T),
    Excluded(T),
}

pub trait TwoSidedRange<T>: RangeBounds<T> {
    // Should `into_bounds` just return `(T, BoundIE<T>)`
    // since the start bound is always inclusive?

    /// Get the bounds directly.
    fn into_bounds(self) -> (BoundIE<T>, BoundIE<T>);

    // `Step` implies `Clone` so technically `inclusive_bounds`
    // could take `&self` instead.

    /// Convert both bounds to inclusive,
    /// returns `None` for an empty range.
    fn inclusive_bounds(self) -> Option<(T, T)>
    where
        T: Step
    {
        match (self.start_bound(), self.end_bound()) {
            (Bound::Included(start), Bound::Included(end)) => Some((start, end)),
            (Bound::Included(start), Bound::Excluded(end)) => {
                if start >= end {
                    // Empty
                    None
                } else {
                    let end_inclusive = Step::backward(end, 1);
                    Some((start, end_inclusive))
                }
            }
            _ => unreachable!(),
        }
    }
}

impl<T> TwoSidedRange<T> for Range<T> { ... }
impl<T> TwoSidedRange<T> for RangeInclusive<T> { ... }

pitaj avatar Apr 30 '24 15:04 pitaj

  • wgpu::util::RenderEncoder::draw*Range<u32>
    • Changing the Range argument to any generic type will cause RenderEncoder no longer being object-safe.
    • If we disregard object-safety, since the two ranges are referring to the indices of the vertex and index buffers, it seems more natural to mirror the slice API and use impl RangeBounds<u32> or even SliceIndex.
  • object::read::Read*::read_bytes_at_untilRange<u64>
    • Yes the documentation does require a bounded range. So it should be somewhat beneficial.
  • wasmer::MemoryView::copy_range_to_vecRange<u64>
    • MemoryView is like an &[u8] living in WASM so again IMO it's more natural to generalize to impl RangeBounds<u64>
  • similar::capture_diffRange<usize>
    • No comment, not sure why it needs to take a range at all.
    • I don't know if it specifically need a Range<usize>, or can accept both Range<usize> | RangeInclusive<usize>, or can accept impl RangeBounds<usize>, or can accept impl SliceIndex<Old/New>.

Generally,

  • If the API is mainly used internally, there is no benefit making it accept both Range and RangeInclusive, just stick to whatever type the code is already using.

  • If the API is public and operating on a slice-like object (i.e. has a length and min index is 0), I think a better course would be accepting all 6 kinds of ranges

    1. std should add a default method into RangeBounds (similar to contains())

      trait RangeBounds<T> {
          fn into_bounds(self) -> (Bounds<T>, Bounds<T>) where Self: Sized;
          fn clamp_index_range(self, bounds: Range<T>) -> Range<T> where Self: Sized, T: Step { .. }
      }
      
    2. the user API generalize to accept RangeBounds<uXX>

    3. the user code calls let input = input.clamp_index_range(0..self.len()) and continue using their old code.

    Think of it like generalizing the API surface from taking X to impl Into<X>.

  • I don't think other cases benefit much from just accepting RangeInclusive in addition to Range.

kennytm avatar Apr 30 '24 19:04 kennytm

std should add a default method into RangeBounds

Unfortunately, this may not be possible because of existing impls like this one:

impl<T> RangeBounds<T> for Range<&T> { ... }

(I was thinking about exactly that same thing earlier today)

At the very least I think you would need where T: Clone which kinda defeats the purpose.

pitaj avatar Apr 30 '24 20:04 pitaj

the user API generalize to accept RangeBounds<uXX>

People find RangeBounds annoying to use. This ACP grew out of discussion about how to improve RangeBounds or otherwise make dealing with ranges generically easier.

I will also just note that the unstable OneSidedRange trait already exists, which arguably has even more dubious applications.

pitaj avatar Apr 30 '24 21:04 pitaj

Going back to my original use-case of random_uniform there I don't want a two-sided range, I want all possible ranges to work. Why shouldn't they? And slice-like interfaces can make sense, imagine abstracting over a Vec and VecDeque.

Voultapher avatar May 01 '24 07:05 Voultapher

Unfortunately, this may not be possible because of existing impls like this one:

impl<T> RangeBounds<T> for Range<&T> { ... }

At the very least I think you would need where T: Clone which kinda defeats the purpose.

This could be hacked by constraining on the &T. https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=121a9cc62b14f6a0578a105261b1ad38

trait RangeBounds<T: ?Sized> {
    // Hack(?) to make `.into_bounds()` work without requiring `T: Clone`.
    type Interior
    where
        Self: Sized; // can omit this `where` clause if we don't care about object-safety.

    fn into_bounds(self) -> (Bound<T>, Bound<T>)
    where
        Self: Sized,
        T: Sized,
        Self::Interior: Into<T>;
}

(Because RangeBounds's contain() method already prevented it to be object-safe, adding the where Self: Sized; to the associated type is not really necessary.)

People find RangeBounds annoying to use. This ACP grew out of discussion about how to improve RangeBounds or otherwise make dealing with ranges generically easier.

In that case we should be improving RangeBounds itself rather than creating a new trait?

I will also just note that the unstable OneSidedRange trait already exists, which arguably has even more dubious applications.

OneSidedRange was introduced in rust-lang/rust#69780 which was introduced to support rust-lang/rust#62280 i.e. slice::take. Unlike TwoSidedRange introduced here, the OneSidedRange contains no methods other than those inherited from RangeBounds.

As explained in the tracking issue of slice::take,

  1. the API can easily be generalized to ZeroSidedRange (i.e. RangeFull ..) so the name OneSidedRange is already too restricting for its primary purpose
  2. there is indeed objection of introducing this OneSidedRange trait is unnecessary.

I think .take(..a) does look better than .take_before(a), but it looks like a specialized-for-slice::take trait std::slice::TakeRange (similar to std::slice::SliceIndex) is more suitable than a dubiously general OneSidedRange trait.

kennytm avatar May 01 '24 10:05 kennytm

Going back to my original use-case of random_uniform there I don't want a two-sided range, I want all possible ranges to work. Why shouldn't they?

Conventionally a.. means "a to infinity", not "a to typeof(A)::MAX". There's a reason that SampleRange is only implemented for two-sided ranges.

Imagine that type inference were improved so we could use any integer type to index into slices. We would still expect slice[5_u8..] to capture every element position 5 and after. We wouldn't expect it to stop after 255.

pitaj avatar May 01 '24 14:05 pitaj

Another solution for the into_bounds problem that doesn't require an associated type, uses a new trait instead and guarantees move-only

pub trait RangeIntoBounds<T> {
    fn into_bounds(self) -> (Bound<T>, Bound<T>);
}

// impl for each range type where T: Sized

pub trait RangeBounds<T: ?Sized> {
    fn into_bounds(self) -> (Bound<T>, Bound<T>)
    where 
        Self: RangeIntoBounds,
    {
        RangeIntoBounds::into_bounds(self)
    }
}

Though it's hard to argue in that case that it should be on RangeBounds instead of users just importing RangeIntoBounds as well.

pitaj avatar May 01 '24 15:05 pitaj

Actually I think the associated type solution you propose would be a breaking change - RangeBounds is not a sealed trait. Even just adding a non-default method would be breaking.

pitaj avatar May 01 '24 15:05 pitaj

Actually I think the associated type solution you propose would be a breaking change - RangeBounds is not a sealed trait.

Yes indeed it is :smiling_face_with_tear:. And there are quite many third-party crates implementing RangeBounds so the impact is not ignorable either.

Technically the clamp_index_range() function does not require into_bounds(), because we already want T: Step and Step: Clone so we could just .clone() the two bounds :shrug: Works but ugly.

kennytm avatar May 01 '24 15:05 kennytm