libs-team
libs-team copied to clipboard
Add `iter::Peekable::next_unpeek`
Proposal
Problem statement
Working with iter::Peekable is often annoying because of two reasons:
- The peeked value is only a reference, which means one can’t make use of ownership, and sometimes ends up with a double reference.
- If the peeked value is considered good, one has to call
.next()and discard the value, which looks a little strange – worse, one will need tomatchon it to take ownership of the contents within, with the other branches being forced to panic.
Some of these problems are alleviated by the addition of .next_if – however, this isn’t applicable in all situations.
Motivating examples or use cases
One example is the following code, which consumes all Text elements from a pulldown_cmark::Parser:
loop {
match parser.peek() {
Some(pulldown_cmark::Event::Text(_)) => {
let pulldown_cmark::Event::Text(s) = parser.next().unwrap() else { unreachable!() };
do_something(s);
}
_ => break,
}
}
Solution sketch
impl<I: Iterator> Peekable<I> {
pub fn next_unpeek(&mut self) -> (Option<I::Item>, Unpeeker<'a, I>) {
(self.next(), Unpeeker(&mut self.peeked))
}
}
pub struct Unpeeker<'a, I: Iterator>(&'a mut Option<Option<I::Item>>);
impl<'a, I: Iterator> Unpeeker<'a, I> {
pub fn unpeek(self, item: Option<I::Item>) {
*self.0 = Some(item);
}
}
With this solution, the above code is made simpler:
loop {
match parser.next_unpeek() {
(Some(pulldown_cmark::Event::Text(s)), _) => do_something(s),
(other, unpeeker) => {
unpeeker.unpeek(other);
break;
}
}
}
Alternatives
- Instead of using a dedicated type
Unpeeker<'a, I>, use animpl FnOnce. The disadvantage ofimpl FnOnceis that borrowck can’t see that it has noDropimplementation, meaning that less code compiles (see this comment). - Provide direct access to the underlying
peekedfield, orfn unpeek(&mut self, peeked: Option<I::Item>). The disadvantage of this is that one is able to replace an existingpeekedvalue that is notNone, which is a footgun. Theunpeekmethod could panic in this case, but that’s also not great.
Links and related work
None.
What happens now?
This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.
Possible responses
The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):
- We think this problem seems worth solving, and the standard library might be the right place to solve it.
- We think that this probably doesn't belong in the standard library.
Second, if there's a concrete solution:
- We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
- We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.
I have discovered that the Unpeeker API is probably better, because it plays better with borrowck. In particular, see this example (playground):
fn main() {
let mut iter = Peekable::new([1, 2].into_iter());
// Fails!
// while let (Some(_), _) = iter.next_unpeek_v1() {
// iter.next_unpeek_v1();
// }
// Compiles:
while let (Some(_), _) = iter.next_unpeek_v2() {
iter.next_unpeek_v2();
}
}
struct Peekable<I: Iterator> {
peeked: Option<Option<I::Item>>,
iter: I,
}
impl<I: Iterator> Peekable<I> {
fn new(iter: I) -> Self {
Self { peeked: None, iter }
}
fn next_unpeek_v1(&mut self) -> (Option<I::Item>, impl FnOnce(Option<I::Item>) + use<'_, I>) {
let item = self.peeked.take().unwrap_or_else(|| self.iter.next());
(item, |item| {
self.peeked = Some(item);
})
}
fn next_unpeek_v2(&mut self) -> (Option<I::Item>, Unpeeker<'_, I>) {
let item = self.peeked.take().unwrap_or_else(|| self.iter.next());
(item, Unpeeker(&mut self.peeked))
}
}
struct Unpeeker<'a, I: Iterator>(&'a mut Option<Option<I::Item>>);
impl<'a, I: Iterator> Unpeeker<'a, I> {
fn unpeek(self, item: Option<I::Item>) {
*self.0 = Some(item);
}
}
It seems that borrowck gets very conservative around RPIT types, since it doesn’t know whether its Drop implementation is significant. Whereas by using an Unpeeker type, we can inform borrowck that the Drop is insignificant, allowing more code to compile.
The disadvantage of this is that one is able to replace an existing peeked value that is not
None, which is a footgun. Theunpeekmethod could panic in this case, but that’s also not great.
By panic you mean if self.peeked is not None, self.unpeek(x) will panic, or something else?
Because I don't see there is any different in capability between self.unpeek(x) and self.next_unpeek().1(x)
By panic you mean if
self.peekedis notNone,self.unpeek(x)will panic, or something else?
Yes, I mean that.
Because I don't see there is any different in capability between
self.unpeek(x)andself.next_unpeek().1(x)
If self.unpeek was to panic, then the difference would be that the former would not advance the iterator and could panic, while the latter would always advance the iterator and would never panic.
If self.unpeek was to silently overwrite the existing peeked value, then the difference would be that the former would sometimes silently advance the iterator before unpeeking, while the latter would always advance the iterator before unpeeking.
We discussed this proposal in today's standard library API team meeting. The signature with (Option<I::Item>, Unpeeker<I>) was not that appealing. We see that the Unpeeker is there to enforce that the peek slot is held empty, in contrast with a signature like fn unpeek(self: &mut Peekable<I>, I::Item) which leaves ambiguity about what happens when the peek slot is already occupied when you unpeek. But as a counterproposal, we would like to suggest the following API that subsumes this use case while also providing for other useful idioms.
impl<I: Iterator> Peekable<I> {
fn peek_slot(&mut self) -> &mut Option<I::Item> {
self.peeked.insert(self.next())
}
}
let peek_slot = peekable.peek_slot();
if let Some(value) = peek_slot.take() {
if bad(&value) {
*peek_slot = Some(value); // unpeek
}
} else {
// end of iterator
}
Yes, that looks good.
Peekable<I> has implemented FusedIterator + ExactSizeIterator + TrustedLen (if I also implements them), would allowing unpeeking (thus changing the result of the following iter.next()) cause these guarantees to be violated?
With the suggested impl:
let mut p = [1, 2].into_iter().peekable();
assert_eq!(p.by_ref().last(), Some(2));
assert_eq!(p.next(), None);
*p.peek_slot() = Some(3);
assert_eq!(p.next(), Some(3)); // so not actually fused?
For FusedIterator it should be fine, as it is only relevant for iter.fused(), which would capture iter so you can't use iter.peek_slot() and the assumption of FusedIterator is still upheld thanks to borrowck. Not sure about ESI and TrustedLen though.
would allowing unpeeking (thus changing the result of the following
iter.next()) cause these guarantees to be violated?
I don't think so, because doing *iter.peek_slot() = Some(...) could be considered the same as *iter = a_whole_new_iter().
I think you're right and it violates ESI, as things that can make an iterator longer almost always do.
If you start with repeat_n(0, usize::MAX) then add something to the peek_slot, now you have a usize::MAX + 1-length iterator which violates ESI.
Basically, it's the same reason that it.chain([1]) isn't ESI even when it is.
EDIT: Oh, I'd missed that the peek_slot calls next -- ~~well, it does it unconditionally which seems wrong, and probably meant .get_or_insert_with(|| self.inner.next())~~ -- so in that case it's fine because it doesn't let you put anything in that slot unless something already got pulled off. (Or there wasn't anything to pull off and you're going to a length of 1 which is fine for ESI.)
well, it does it unconditionally which seems wrong, and probably meant
.get_or_insert_with(|| self.inner.next())
It calls self.next(), not self.inner.next(), so if the slot is already filled, it gets returned by next and re-inserted into itself, so it's already equivalent to .get_or_insert_with(|| self.inner.next()).
Ah, right. Thanks, @T0mstone -- my mistake.
@rust-lang/libs-api chosen to accept the peek_slot proposal from @dtolnay.
Feel free to open a tracking issue and open a PR to rust-lang/rust to add it as an unstable feature. The ACP can be closed once the tracking issue is created.
It just occurred to me during implementation that the proposed API does not actually solve the problem. In particular, if one calls *iter.peek_slot() = None; (or, more realistically, iter.peek_slot().take() for a Some value), then under the current implementation this will cause the subsequent call to .next() to return None (as the peeked value has been replaced with None), which is obviously not what is desired – for this API to be useful, such an operation would have to simply remove the peeked value from the iterator.
On the other hand, if the &mut Option returned by .peek_slot() were to refer to the existence of a peeked value (note that this will require refactors elsewhere in Peekable since the options are now swapped around), then it is unclear what to do if there is no next value. In particular:
- One could return
&mut None. However, since this does not store a peeked value, it means the subsequent call to.next()would be calling.next()on a completed iterator – therefore, this method would likely requireI: FusedIterator. - One could change the signature to return
Option<&mut Option<I::Item>>. However, the guarantee that the innerOptionisSomeis one not afforded by the type system, meaning usages will have to use.take().unwrap(), which is rather inconvenient.
Fundamentally, the issue with the API is that it can’t easily differentiate between the-iterator-is-over None and discard-the-peeked-value None. The original Unpeeker-based API does differentiate (.unpeek(None) versus dropping the Unpeeker), and thus does not have this problem.
Peekable will automatically read from the inner iterator if the peek slot is None: https://doc.rust-lang.org/nightly/src/core/iter/adapters/peekable.rs.html#38. So setting the slot to None effectively just removes the value from the iterator, and the iterator will continue with the next value in the inner iterator.
But not if it is Some(None), which would be the state resulting from having done .peeker_slot().take(), IIUC?
Yes – Some(None) means “a None was peeked”, not “there is no peeked value”.
The alternatives mentioned involve refactoring to make None mean “a None was peeked” and Some(None), what would be result of calling .peek_slot(), mean “there is no peeked value”. Which of course has the drawbacks I mentioned, i.e. if we return &mut None on exhaustion then that effectively throws away any peeked None (which is only valid for FusedIterators), and if we return some other value on exhaustion then we can’t guarantee our &mut Option is actually always &mut Some.
It sounds like we are short by 1 state. Peekable's peeked field is currently Option<Option<I::Item>> with the following meaning:
None—not yet peekedSome(None)—end of inner iterator, not supposed to touch again unless is FusedIteratorSome(Some(value))—peeked value
Would it be prohibitive to change it to the following?
PeekSlot(None)—not yet peeked, or someone called peek_slot and took the valuePeekSlot(Some(value))—someone called peek_slot and then either did not take, or unpeeked, the valueExhausted—end of inner iterator, potentially can specialize away for FusedIteratorNext(value)—peeked value
Correction: in that formulation Next(value) and PeekSlot(Some(value)) are actually equivalent, and instead the missing state is that PeekSlot needs to remember whether the underlying iterator is known to be exhausted.
Next(None)—not yet peeked, or someone called peek_slot and took the valueNext(Some(value))—3 possibilities: ordinary peek, or someone called peek_slot and did not take, or unpeekExhausted(None)—inner iterator is known to be exhaustedExhausted(Some(value))—peek_slot + unpeek on an exhausted iterator
where both of the last two can potentially be specialized away for FusedIterator.
Exhausted(None)—inner iterator is known to be exhausted
Even this is a temporary state, isn't it? e.g. Peekable<mpsc::TryIter> may peek a None and be expected to propagate next() -> None for it, but it can still be called again later to get Some. In other words, Peekable shouldn't add fused behavior.
Hypothetically, we could also allow unpeeking None on non-exhausted iterators, thus making Exhausted(None) serve another purpose. The peek_slot API does not support this, but the underlying state machine does.
It seems that Exhausted(Some(value)) only exists to support unpeek on exhausted iterators. The original problem being solved here does not actually need that functionality – while it might have its uses, is it worth significantly changing the internals of Peekable for?