libs-team icon indicating copy to clipboard operation
libs-team copied to clipboard

Add `iter::Peekable::next_unpeek`

Open SabrinaJewson opened this issue 9 months ago • 3 comments

Proposal

Problem statement

Working with iter::Peekable is often annoying because of two reasons:

  1. The peeked value is only a reference, which means one can’t make use of ownership, and sometimes ends up with a double reference.
  2. If the peeked value is considered good, one has to call .next() and discard the value, which looks a little strange – worse, one will need to match on it to take ownership of the contents within, with the other branches being forced to panic.

Some of these problems are alleviated by the addition of .next_if – however, this isn’t applicable in all situations.

Motivating examples or use cases

One example is the following code, which consumes all Text elements from a pulldown_cmark::Parser:

loop {
    match parser.peek() {
        Some(pulldown_cmark::Event::Text(_)) => {
            let pulldown_cmark::Event::Text(s) = parser.next().unwrap() else { unreachable!() };
            do_something(s);
        }
        _ => break,
    }
}

Solution sketch

impl<I: Iterator> Peekable<I> {
    pub fn next_unpeek(&mut self) -> (Option<I::Item>, Unpeeker<'a, I>) {
        (self.next(), Unpeeker(&mut self.peeked))
    }
}

pub struct Unpeeker<'a, I: Iterator>(&'a mut Option<Option<I::Item>>);

impl<'a, I: Iterator> Unpeeker<'a, I> {
    pub fn unpeek(self, item: Option<I::Item>) {
        *self.0 = Some(item);
    }
}

With this solution, the above code is made simpler:

loop {
    match parser.next_unpeek() {
        (Some(pulldown_cmark::Event::Text(s)), _) => do_something(s),
        (other, unpeeker) => {
            unpeeker.unpeek(other);
            break;
        }
    }
}

Alternatives

  • Instead of using a dedicated type Unpeeker<'a, I>, use an impl FnOnce. The disadvantage of impl FnOnce is that borrowck can’t see that it has no Drop implementation, meaning that less code compiles (see this comment).
  • Provide direct access to the underlying peeked field, or fn unpeek(&mut self, peeked: Option<I::Item>). The disadvantage of this is that one is able to replace an existing peeked value that is not None, which is a footgun. The unpeek method could panic in this case, but that’s also not great.

Links and related work

None.

What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

  • We think this problem seems worth solving, and the standard library might be the right place to solve it.
  • We think that this probably doesn't belong in the standard library.

Second, if there's a concrete solution:

  • We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
  • We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.

SabrinaJewson avatar Mar 08 '25 14:03 SabrinaJewson

I have discovered that the Unpeeker API is probably better, because it plays better with borrowck. In particular, see this example (playground):

fn main() {
    let mut iter = Peekable::new([1, 2].into_iter());
    // Fails!
    // while let (Some(_), _) = iter.next_unpeek_v1() {
    //     iter.next_unpeek_v1();
    // }
    // Compiles:
    while let (Some(_), _) = iter.next_unpeek_v2() {
        iter.next_unpeek_v2();
    }
}

struct Peekable<I: Iterator> {
    peeked: Option<Option<I::Item>>,
    iter: I,
}

impl<I: Iterator> Peekable<I> {
    fn new(iter: I) -> Self {
        Self { peeked: None, iter }
    }
    fn next_unpeek_v1(&mut self) -> (Option<I::Item>, impl FnOnce(Option<I::Item>) + use<'_, I>) {
        let item = self.peeked.take().unwrap_or_else(|| self.iter.next());
        (item, |item| {
            self.peeked = Some(item);
        })
    }
    fn next_unpeek_v2(&mut self) -> (Option<I::Item>, Unpeeker<'_, I>) {
        let item = self.peeked.take().unwrap_or_else(|| self.iter.next());
        (item, Unpeeker(&mut self.peeked))
    }
}

struct Unpeeker<'a, I: Iterator>(&'a mut Option<Option<I::Item>>);

impl<'a, I: Iterator> Unpeeker<'a, I> {
    fn unpeek(self, item: Option<I::Item>) {
        *self.0 = Some(item);
    }
}

It seems that borrowck gets very conservative around RPIT types, since it doesn’t know whether its Drop implementation is significant. Whereas by using an Unpeeker type, we can inform borrowck that the Drop is insignificant, allowing more code to compile.

SabrinaJewson avatar Mar 08 '25 15:03 SabrinaJewson

The disadvantage of this is that one is able to replace an existing peeked value that is not None, which is a footgun. The unpeek method could panic in this case, but that’s also not great.

By panic you mean if self.peeked is not None, self.unpeek(x) will panic, or something else?

Because I don't see there is any different in capability between self.unpeek(x) and self.next_unpeek().1(x)

kennytm avatar Mar 08 '25 15:03 kennytm

By panic you mean if self.peeked is not None, self.unpeek(x) will panic, or something else?

Yes, I mean that.

Because I don't see there is any different in capability between self.unpeek(x) and self.next_unpeek().1(x)

If self.unpeek was to panic, then the difference would be that the former would not advance the iterator and could panic, while the latter would always advance the iterator and would never panic.

If self.unpeek was to silently overwrite the existing peeked value, then the difference would be that the former would sometimes silently advance the iterator before unpeeking, while the latter would always advance the iterator before unpeeking.

SabrinaJewson avatar Mar 08 '25 16:03 SabrinaJewson

We discussed this proposal in today's standard library API team meeting. The signature with (Option<I::Item>, Unpeeker<I>) was not that appealing. We see that the Unpeeker is there to enforce that the peek slot is held empty, in contrast with a signature like fn unpeek(self: &mut Peekable<I>, I::Item) which leaves ambiguity about what happens when the peek slot is already occupied when you unpeek. But as a counterproposal, we would like to suggest the following API that subsumes this use case while also providing for other useful idioms.

impl<I: Iterator> Peekable<I> {
    fn peek_slot(&mut self) -> &mut Option<I::Item> {
        self.peeked.insert(self.next())
    }
}

let peek_slot = peekable.peek_slot();
if let Some(value) = peek_slot.take() {
    if bad(&value) {
        *peek_slot = Some(value); // unpeek
    }
} else {
    // end of iterator
}

dtolnay avatar Apr 08 '25 17:04 dtolnay

Yes, that looks good.

SabrinaJewson avatar Apr 08 '25 20:04 SabrinaJewson

Peekable<I> has implemented FusedIterator + ExactSizeIterator + TrustedLen (if I also implements them), would allowing unpeeking (thus changing the result of the following iter.next()) cause these guarantees to be violated?

With the suggested impl:

let mut p = [1, 2].into_iter().peekable();
assert_eq!(p.by_ref().last(), Some(2));
assert_eq!(p.next(), None);

*p.peek_slot() = Some(3);
assert_eq!(p.next(), Some(3)); // so not actually fused?

For FusedIterator it should be fine, as it is only relevant for iter.fused(), which would capture iter so you can't use iter.peek_slot() and the assumption of FusedIterator is still upheld thanks to borrowck. Not sure about ESI and TrustedLen though.

kennytm avatar Apr 09 '25 06:04 kennytm

would allowing unpeeking (thus changing the result of the following iter.next()) cause these guarantees to be violated?

I don't think so, because doing *iter.peek_slot() = Some(...) could be considered the same as *iter = a_whole_new_iter().

programmerjake avatar Apr 09 '25 07:04 programmerjake

I think you're right and it violates ESI, as things that can make an iterator longer almost always do.

If you start with repeat_n(0, usize::MAX) then add something to the peek_slot, now you have a usize::MAX + 1-length iterator which violates ESI.

Basically, it's the same reason that it.chain([1]) isn't ESI even when it is.

EDIT: Oh, I'd missed that the peek_slot calls next -- ~~well, it does it unconditionally which seems wrong, and probably meant .get_or_insert_with(|| self.inner.next())~~ -- so in that case it's fine because it doesn't let you put anything in that slot unless something already got pulled off. (Or there wasn't anything to pull off and you're going to a length of 1 which is fine for ESI.)

scottmcm avatar Apr 09 '25 07:04 scottmcm

well, it does it unconditionally which seems wrong, and probably meant .get_or_insert_with(|| self.inner.next())

It calls self.next(), not self.inner.next(), so if the slot is already filled, it gets returned by next and re-inserted into itself, so it's already equivalent to .get_or_insert_with(|| self.inner.next()).

T0mstone avatar Apr 09 '25 13:04 T0mstone

Ah, right. Thanks, @T0mstone -- my mistake.

scottmcm avatar Apr 09 '25 20:04 scottmcm

@rust-lang/libs-api chosen to accept the peek_slot proposal from @dtolnay.

Feel free to open a tracking issue and open a PR to rust-lang/rust to add it as an unstable feature. The ACP can be closed once the tracking issue is created.

Amanieu avatar Apr 15 '25 17:04 Amanieu

It just occurred to me during implementation that the proposed API does not actually solve the problem. In particular, if one calls *iter.peek_slot() = None; (or, more realistically, iter.peek_slot().take() for a Some value), then under the current implementation this will cause the subsequent call to .next() to return None (as the peeked value has been replaced with None), which is obviously not what is desired – for this API to be useful, such an operation would have to simply remove the peeked value from the iterator.

On the other hand, if the &mut Option returned by .peek_slot() were to refer to the existence of a peeked value (note that this will require refactors elsewhere in Peekable since the options are now swapped around), then it is unclear what to do if there is no next value. In particular:

  • One could return &mut None. However, since this does not store a peeked value, it means the subsequent call to .next() would be calling .next() on a completed iterator – therefore, this method would likely require I: FusedIterator.
  • One could change the signature to return Option<&mut Option<I::Item>>. However, the guarantee that the inner Option is Some is one not afforded by the type system, meaning usages will have to use .take().unwrap(), which is rather inconvenient.

Fundamentally, the issue with the API is that it can’t easily differentiate between the-iterator-is-over None and discard-the-peeked-value None. The original Unpeeker-based API does differentiate (.unpeek(None) versus dropping the Unpeeker), and thus does not have this problem.

SabrinaJewson avatar Apr 15 '25 20:04 SabrinaJewson

Peekable will automatically read from the inner iterator if the peek slot is None: https://doc.rust-lang.org/nightly/src/core/iter/adapters/peekable.rs.html#38. So setting the slot to None effectively just removes the value from the iterator, and the iterator will continue with the next value in the inner iterator.

Amanieu avatar Apr 15 '25 23:04 Amanieu

But not if it is Some(None), which would be the state resulting from having done .peeker_slot().take(), IIUC?

danielhenrymantilla avatar Apr 16 '25 01:04 danielhenrymantilla

Yes – Some(None) means “a None was peeked”, not “there is no peeked value”.

The alternatives mentioned involve refactoring to make None mean “a None was peeked” and Some(None), what would be result of calling .peek_slot(), mean “there is no peeked value”. Which of course has the drawbacks I mentioned, i.e. if we return &mut None on exhaustion then that effectively throws away any peeked None (which is only valid for FusedIterators), and if we return some other value on exhaustion then we can’t guarantee our &mut Option is actually always &mut Some.

SabrinaJewson avatar Apr 16 '25 12:04 SabrinaJewson

It sounds like we are short by 1 state. Peekable's peeked field is currently Option<Option<I::Item>> with the following meaning:

  • None—not yet peeked
  • Some(None)—end of inner iterator, not supposed to touch again unless is FusedIterator
  • Some(Some(value))—peeked value

Would it be prohibitive to change it to the following?

  • PeekSlot(None)—not yet peeked, or someone called peek_slot and took the value
  • PeekSlot(Some(value))—someone called peek_slot and then either did not take, or unpeeked, the value
  • Exhausted—end of inner iterator, potentially can specialize away for FusedIterator
  • Next(value)—peeked value

dtolnay avatar Apr 16 '25 16:04 dtolnay

Correction: in that formulation Next(value) and PeekSlot(Some(value)) are actually equivalent, and instead the missing state is that PeekSlot needs to remember whether the underlying iterator is known to be exhausted.

  • Next(None)—not yet peeked, or someone called peek_slot and took the value
  • Next(Some(value))—3 possibilities: ordinary peek, or someone called peek_slot and did not take, or unpeek
  • Exhausted(None)—inner iterator is known to be exhausted
  • Exhausted(Some(value))—peek_slot + unpeek on an exhausted iterator

where both of the last two can potentially be specialized away for FusedIterator.

dtolnay avatar Apr 16 '25 16:04 dtolnay

  • Exhausted(None)—inner iterator is known to be exhausted

Even this is a temporary state, isn't it? e.g. Peekable<mpsc::TryIter> may peek a None and be expected to propagate next() -> None for it, but it can still be called again later to get Some. In other words, Peekable shouldn't add fused behavior.

cuviper avatar Apr 16 '25 17:04 cuviper

Hypothetically, we could also allow unpeeking None on non-exhausted iterators, thus making Exhausted(None) serve another purpose. The peek_slot API does not support this, but the underlying state machine does.

It seems that Exhausted(Some(value)) only exists to support unpeek on exhausted iterators. The original problem being solved here does not actually need that functionality – while it might have its uses, is it worth significantly changing the internals of Peekable for?

SabrinaJewson avatar Apr 16 '25 21:04 SabrinaJewson