rfcs
rfcs copied to clipboard
RFC: New range types for Edition 2024
Change the range operators a..b, a.., and a..=b to resolve to new types ops::range::Range, ops::range::RangeFrom, and ops::range::RangeInclusive in Edition 2024. These new types will not implement Iterator, instead implementing Copy and IntoIterator.
Closes #2848
I personally wouldn't call the old range types "legacy" ranges because, if the new ranges impl IntoIterator, then it makes sense for their IntoIter to be the old range types.
I'm huge fan of backward compatibility. As alternative I've propose to add additional types of ranges and suffixes to ranges.
Now: Range, RangeFrom, RangeTo, RangeFull, RangeInclusive, RangeToInclusive (all with Iterator).
We add also: IntoRange, IntoRangeFrom, IntoRangeTo, IntoRangeFull, IntoRangeInclusive, IntoRangeToInclusive (all with IntoIterator)
let range1 = 0..5;
// range1 : Range<usize>
let range2 = 0..#rit 5;
// range2 : Range<usize>
let range3 = 0..#rii 5;
// range3 : IntoRange<usize>
let range4 : Range<usize> = 0..5;
let range5 : IntoRange<usize> = 0..5;
UPD: where #rit and #rii are "type" suffixes - Range with ITerator (#rit) and Range with IntoIterator (#rii). And default type is #rit
@vitww what do #rit and #rii stand for?
I personally wouldn't call the old range types "legacy" ranges because, if the new ranges impl
IntoIterator, then it makes sense for theirIntoIterto be the old range types.
This is addressed in the RFC under Use legacy range types as the iterators for the new range types.
I like this. I see one problem, however. Any crate which provides a container that can be sliced which is not updated will require &container[(a..b).into()].
This could be solved with impl<T: Index<LegacyRange>> Index for T, or by adding an implicit conversion. I think such is necessary in this case because, even if the user knows what's going on, .into() on indices is really ugly.
Maybe it's also worth a mention that--without some sort of blanket impl--all of stdlib is going to need to implement Index/IndexMut for both.
I like this. I see one problem, however. Any crate which provides a container that can be sliced which is not updated will require
&container[(a..b).into()].This could be solved with
impl<T: Index<LegacyRange>> Index for T, or by adding an implicit conversion.
I think it would have to be an implicit conversion since impl<T: ?Sized + Index<LegacyRange>> Index<Range> for T overlaps with existing impl<T: Index<I>, I> Index<I> for Wrapper<T>
What is Wrapper<T> in this context? Is that a reference to all containers, a specific internal type, or a public type I somehow don't know about?
What is
Wrapper<T>in this context? Is that a reference to all containers, a specific internal type, or a public type I somehow don't know about?
just some user-defined type with a type parameter T, e.g.:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=3b971b6ac64759fe8c48d874d5442579
pub struct Wrapper<T>(T);
impl<T: Index<I>, I> Index<I> for Wrapper<T> {
type Output = T::Output;
fn index(&self, idx: I) -> &Self::Output {
self.0.index(idx)
}
}
@programmerjake Lol yeah. I think the best part of this discussion is that I wrote one of those this week. You're right though.
Could the compiler use the fact that it is the compiler to use specialization here? I dunno what the state of that is these days but it might be better than an implicit conversion. I hate blanket impls in stdlib, but it's a close race with how much I hate implicit conversions.
@ahicks92 thanks for bringing this up, I hadn't considered the indexing case.
I think many common cases will be covered by SliceIndex, but there will be collections that will need updates to support the new range types.
I will update the migration and drawbacks sections to cover this.
I would not support adding a blanket impl, specialization, or implicit conversion even if they were possible. I think we should take a different approach:
- stabilize the new types as soon as possible
- allowing crates to add Index impls before the syntax switch
Well, I do a lot of mathematical code as it is, and even as Rust is today Rust is very verbose a lot of the time for such cases. That's not really a complaint about Rust as such, But I don't want to deal with:
c[(a..b).into()].copy_from_slice(c2[(a..b).into()])
Where c is some collection. I can't find a crate right now but ironically the example that came to mind is ndarray (which sadly uses a different type for slicing and thus isn't applicable) which made me realize that this is actually more general for the case wherein one might be trying to perform operations with ranges where the thing going on is a method call.
The worst practical case of slicing with this proposal would be &c[(a as usize..b as usize).into()], because often one must cast the endpoints of the range.
crates.io is pretty big and pretty full of unmaintained stuff. I think "let crates start migrating ASAP" is a good point but it doesn't answer what to do about anything older that's not maintained or anything that doesn't notice this is coming in time. I don't know how much of a problem that would be, and I'm not sure how one would quantify it either. But that might be a good thing to try to do if it can be done.
I will just mention that (a..b).into() probably won't work unless the only index type is Range - otherwise it will probably fail with a type inference error. We may want to consider adding an inherent method (to_legacy or something) to avoid that issue.
I can't find a crate right now but ironically the example that came to mind is ndarray (which sadly uses a different type for slicing and thus isn't applicable)
In std, the only affected impls would be:
impl ops::Index<ops::Range<usize>> for String(and the respectiveIndexMut)impl ops::Index<ops::RangeFrom<usize>> for String(and the respectiveIndexMut)impl ops::Index<ops::RangeInclusive<usize>> for String(and the respectiveIndexMut)impl ops::Index<ops::RangeFrom<usize>> for CStr
The Index impls for String should probably transition to using SliceIndex like str does. BTreeMap, HashMap, and VecDeque don't support indexing by range at all, and CString and OsString only support RangeFull. BTreeMap::range uses RangeBounds, so no issue there.
So I'd say the only serious one there is CStr, which doesn't seem too bad. I am curious to see how many 3rd-party collections will be affected. I'm not sure how to go about researching that.
I am thinking about it like this:
- This RFC wants to replace range types in large part because the new ones can be Copy, thus avoiding
.clone(). - But to do so, it instead exchanges
.clone()for.into().
I understand that the other motivation is about having ranges inside other types, but leaving that aside the unstated premise here seems to me to be that we are claiming that working around lack of Copy today is more ubiquitous than needing to convert to legacy types in Rust 2024. I could believe that is true, my use cases aren't even close to all possible use cases, but not without proof.
I don't consider stdlib to be interesting. From a language perspective solutions up to and including hacking the compiler for stdlib backward compatibility are feasible. Solving it is important and interesting but the solution space is large enough there that there is certainly a solution and the question isn't "if", it's "which".
My thoughts:
- We need actual data about this. Right now we don't know if this will be an actual issue in practice. The evaluation of
stdwas only an example. - If the issue is widespread, then I agree we should consider doing something more involved.
- On the other hand, updating to the new edition is entirely optional. You and others can choose to stay on 2021 if you feel conversion is overly burdensome.
I don't think we can. One either picks 2024 and converts for 2021 crates, or picks 2021 and converts for 2024 crates. Right? So if this is an issue you lose out either way. Also stay on the old edition doesn't work because eventually the new edition gets something you need.
But as you've already stated data is a good idea. I'm willing to argue contingent on this being widespread all day long but the entire thing depends on if it's widespread or not.
If we don't have one someone should really figure out how to run a query over all of crates.io, probably via docs.rs, to be like select * where a_range_type in signature.
I don't think we can. One either picks 2024 and converts for 2021 crates, or picks 2021 and converts for 2024 crates. Right?
Not really. For the indexing issue, they already support legacy range types and they can just add impls for the new types. Functions that take a specific range type can use impl Into<Range<_>> to support all editions, with no need for conversion on usage.
The only case where conversion is unavoidable is if you're using the new range types on code expecting old range types. All new code can and should be written to accept both.
Essentially, the issue is forwards compatibility, not backwards compatibility.
I doubt anyone is going to cover both. That's not a big ask but you then have to teach newbies to the language that due to legacy reasons they'd better.
As far as I know, every other edition has not required supporting the older edition in order to depend on crates supporting the newer one. Perhaps it has and I've missed it. But if not, this is establishing the precedent.
I doubt anyone is going to cover both. That's not a big ask but you then have to teach newbies to the language that due to legacy reasons they'd better.
We're not talking about newbies here but library authors who have decided to upgrade their crate to a new edition. Why wouldn't they choose to support both?
Either these are (1) existing crates being updated to support edition 2024 or (2) new crates written after edition 2024. This is less likely for (1) as they only need to add new impls for the new types, and would need to actively remove support for the old types. (2) are more likely to lack support for the older types, just by accident.
For (1) you can always choose not to update, but for (2) you have fewer options. But in both cases, it's highly likely that they will accept contributions or make updates on request to support the legacy types, given they are being actively developed / maintained.
As far as I know, every other edition has not required supporting the older edition in order to depend on crates supporting the newer one.
This doesn't change that. We're not talking about these crates being unable to interact. We're just talking about some explicit conversions being necessary, or preferably that the library explicitly supports both legacy and new ranges.
All that said, it would be a good idea to add a clippy lint to warn when public APIs are restricted to only new or only legacy range types.
Awfully disruptive change here. We do indexing & iterating all the time, so even mild hiccups create chaos. It's conversely relatively rare range types appear inside a Copy type.
We'll have similar "Copy vs Iterator" problems elsewhere outside std. We'd avoid some chaos if we address the two roles of Copy directly, for which multiple options exist:
First, we could change Copy itself to make implicit copies optional, while still promissing bitwize copies.
pub trait Copy {
const IMPLICIT_COPY: bool = true;
}
#[derive(Clone,Copy)]
pub struct Cpy<T: Copy>(pub T);
At this point all range types become Copy<IMPLICIT_COPY=false>, so no implicit copies, but Cpy(range_expr) provides implicit copies.
Second, we could provide another unsafe impl Copy mechanism to promise soundness, with which we provide some wrapper type that permits copies.
pub unsafe trait CopySafe: Clone {}
// unsafe impl CopySafe for each range type
#[derive(Clone,...)]
pub struct Cpy<T: CopySafe>(pub T);
unsafe impl<T: CopySafe> Copy for Cpy<T> {}
Again Cpy(range_expr) provides implicit copies, but you'd now need it in structs, not required by the associated bool.
The "add iterator methods to range" thing here seems like it solves the iterating churn. In my experience you either iterate a range in a for loop (handled by IntoIterator) or use the iterator methods (handled by making them inherent). This RFC seems like it mostly leaves that sort of code alone.
IntoIterator is often either not used or unusable.
fn foo<I: Iterator<Item=usize>>(i: &mut I)
fn bar<I>(i: &mut I)
where for<'a> &'a mut I: Iterator<Item=usize>
It'll require a major version change for rand, so then incompatible traits mess. etc.
We should permit T: Copy where T contains a non-Copy type whose clone works like memcpy, but we'll have better results if we do this directly, rather than by altering .. syntax.
IntoIteratoris often either not used or unusable.fn foo<I: Iterator<Item=usize>>(i: &mut I) fn bar<I>(i: &mut I) where for<'a> &'a mut I: Iterator<Item=usize>
For functions like this, you can just call .into_iter() at the usage site:
foo(&mut (0..5).into_iter());
bar(&mut (0..5).into_iter());
playground
It should be possible to implement this transformation as part of cargo fix --edition.
It'll require a major version change for rand, so then incompatible traits mess. etc.
Can you specify exactly where in rand you're referring to?
We need actual data about this. Right now we don't know if this will be an actual issue in practice. The evaluation of
stdwas only an example.
As a start:
Github search for Index<Range<_>> and friends yields 688 files, almost all of which appear to be true matches (at least on the first 5 pages). No idea how many of these are published library crates.
isn't it quite uncommon to have functions that take &mut Iter though? Typically you'd call .by_ref() at call site instead.
But to do so, it instead exchanges .clone() for .into().
A big difference to me is that ranges not implementing Copy also affects types that contain a range, making it viral. For a good number of libraries, ranges are a fundamental type and the current design pushes them to implement their own versions.
It occurs to me that a "this can secretly be copy" marker type which wraps an inner type to make it Copy could be done today without further help from the language. I believe UnsafeCell would allow for it, but if not then there are other tricks like aligned u8 arrays and pointer casts. If ranges guaranteed that they can be Copy, but aren't for legacy reasons, this could be even delegated to a library. By not implementing Iterator on such a wrapper, and instead implementing IntoIterator, wouldn't that solve this without much help from the language other than a trivial layout guarantee that's upheld anyway?
I guess "use a library" isn't ideal but at the same time neither is this, and implementing Deref and DerefMut would handle a lot of the code churn for those wishing to adopt. If nothing else the churn is limited to internals of crates, not the public API boundary.
Copy is a good motivation. Fixing for i in x..=y performance is I would argue an even better one.
The performance of the current RangeInclusive when used as an external iterator is bonkers. This leads to inquiries such as:
- A Call for Proposals for 2024
- Why is iterator so much faster?
- Why isn't the for loop optimized better (in this one example)?
- Why does iteration over an inclusive range generate longer assembly in Rust than in C++?
Where the only response so far is: "Oh yeah, sorry, RangeInclusive doesn't optimize well, better switch to Range" which is really NOT a response I like to give in Rust.
And a bunch of issues that led to nowhere, such as:
- https://github.com/rust-lang/rust/issues/45222
- https://github.com/rust-lang/rust/issues/75035
A lot of people tried their hands at fixing this performance issue and by now I think the only reasonable conclusion is that it's very unlikely that after years of effort we'll finally figure out how to tame it. Other than by separating RangeInclusive and its iterator type, of course.
The cause of the slow-down is the mutation of the flag within the loop: a well-known optimization issue with chain-like iterators. The iterator of RangeInclusive should attempt to avoid such a mutation as much as possible, which can be done at least for all non-full ranges.
It's notable that as long as end can be incremented, or start can be decremented, it's relatively trivial to switch to a semi-exclusive range, instead of an inclusive range, when it comes to iterating:
enum RangeInclusiveIter<T> {
// Contains a..(b+1) from a..=b.
Exclusive { start: T, end: T },
// Contains (a-1)..b from a..=b, requires +1 at each step.
Shifted { start: T, end: T },
// Contains (a, b) from a..=b.
Full { start: T, end: T, exhausted: bool },
}
Of course, code-generation tests should be performed -- possibly reusing the examples provided in the above links -- to see that code generation actually benefits from those shenanigans.
@matthieu-m I mention that enum-based approach in Future possibilities, and even provide a playground demo.
Another option for fixing the performance is to use the next larger integer type (on just one of the fields so same size). But that would require adding an associated type to Step (or introducing a new trait for that) so that each Implementor can specify a separate iterator type.
There isn't always a next (native) integer type. While u128 compiles on all platforms, it does not necessarily compile efficiently, and that still would leave open the question of what one does if the user wants to iterate over u128 ranges. In order to support that, it'd be necessary to carve out at least one implementation from an otherwise generic scenario. RPITIT may solve this to at least some extent but I think that requires named existential types.
Enum-based approaches likely do not optimize well in many cases because they rely on having at least tons of inlining to see through the branches. I would presume other methods such as map and such could be optimized today without major type changes.
A "fix" here would be simply adding a method to RangeInclusive which tries to convert it to a normal boring Range, e.g. fn to_exclusive(&self)->Option<Range<T>> etc. Again, I don't think such a solution would need a new type.