rfcs icon indicating copy to clipboard operation
rfcs copied to clipboard

Reserve `gen` keyword in 2024 edition for `Iterator` generators

Open oli-obk opened this issue 2 years ago • 68 comments

Rendered

Tracking issue: https://github.com/rust-lang/rust/issues/117078

Allow implementing Iterators similarly to how we can use async blocks to implement Futures.

// `Iterator` methods
fn odd_dup(values: impl Iterator<Item = u32>) -> impl Iterator<Item = u32> {
    values.filter(|value| value.is_odd()).map(|value| value * 2)
}

// `struct` and manual `impl`
fn odd_dup(values: impl Iterator<Item = u32>) -> impl Iterator<Item = u32> {
    struct Foo<T>(T);
    impl<T: Iterator<Item = u32>> Iterator<Item = u32> for Foo<T> {
        type Item = u32;
        fn next(&mut self) -> Option<u32> {
            loop {
                let value = self.0.next()?;
                if value.is_odd() {
                    return Some(x * 2)
                }
            }
        }
    }
    Foo(values)
}

// `gen block`
fn odd_dup(values: impl Iterator<Item = u32>) -> impl Iterator<Item = u32> {
    #[rustc_gen] {
        for value in values {
            if value.is_odd() {
                yield value * 2;
            }
        }
    }
}

// `gen fn`
#[rustc_gen]
fn odd_dup(values: impl Iterator<Item = u32>) -> u32 {
    for value in values {
        if value.is_odd() {
            yield value * 2;
        }
    }
}

PR reserving the keyword

I would like to avoid discussing the exact details of the experimental implementation proposal in this RFC and instead discuss that during the implementation. Similarly I would like to avoid any syntax discussion beyond what is needed to determine "which keyword?" or "do we need a keyword at all?".

If this RFC is accepted I expect to be able to work on an initial implementation in January

oli-obk avatar Oct 13 '23 16:10 oli-obk

One thing worth mentioning that I mentioned in the Generator tracking issue: unlike for iterators, generators naturally have a way of indicating that they're infinite by returning !, and I wonder if we could somehow retain this for gen syntax.

I personally think it would be a good idea to not limit this syntax to just iterators, instead allowing arbitrary generators, but special-casing to iterators when the return value is () or !.

clarfonthey avatar Oct 13 '23 16:10 clarfonthey

(NOT A CONTRIBUTION)

I'm very enthusiastic about adding something like this to the language, and there's nothing in this RFC I don't like. I don't have a strong opinion about what keyword is chosen for this feature.

In my own head, I think of the general purpose mechanism (called generators in rustc) as coroutines and use the term "generator" just for functions that return iterators. In other words generator is to Iterator as async is to Future.

I also want to highlight a quote from the original proposal to use external Iterators as the definition of Iterators in Rust, way back in 2013:

In the future, Rust can have generators using a yield statement like C#, compiling down to a fast state machine without requiring context switches, virtual functions or even closures. This would eliminate the difficulty of coding recursive traversals by-hand with external iterators.

(https://web.archive.org/web/20140716172928/https://mail.mozilla.org/pipermail/rust-dev/2013-June/004599.html)

The way it turned out, Rust adopted this sort of thing for futures long before it adopted it for iterators. But an RFC like this was always the plan!

withoutboats avatar Oct 13 '23 21:10 withoutboats

@rustbot labels +I-lang-nominated

Nominating this for T-lang as it's something we need to decide ahead of the 2024 edition.

traviscross avatar Oct 18 '23 06:10 traviscross

We've had a lot of evolutions of the lang-team process. After the long discussion in yesterday's triage meeting, I wanted to leave a comment to clarify how I see it now:

First: Experimental RFCs are dead. Pushing up the daisies. Exterminated. Or however the rest of that Monty Python thing goes. But long live experimental feature gates What is an experimental feature gate? In short, any well known Rust contributor can add a feature-gate and nominate it for the lang-team. As long as somebody on the team is willing to liaison, that's fine, there is no need for a full FCP.

Second: Deferring syntax until later is an anti-pattern. The point of the RFC isn't to work out every detail of the feature, but it is to define the overall user-experience we want, and so syntax can be an important part of that. My goal personally is that, if you've read the RFC and you remember the "general idea", then you should not be surprised once it is stabilized. This doesn't mean there won't have been any changes, but they're hopefully around the edges such that you don't really remember what the RFC said anyway. We've been surprised enough times (cough async-await cough) by deferring an important piece of syntax and then getting stuck with a huge bikeshed at the last minute. Personally I say we pick a keyword -- gen might be it -- and run with it now. We're unlikely to learn anything new, and if we do, we can always submit an RFC and change it. Choice of keyword does feel like something you'd remember from the RFC text.

Put it all together and you get...

  • We can and should be building this on nightly now, if someone from compiler-team is actively driving it (e.g., @oli-obk). No RFC is needed for that, just a responsible owner and a lang-team second.
  • We should land the RFC once we can reasonably describe the user experience. Keyword is definitely part of that.

Less clear to me is the interaction with pinning. That could be a big deal. I personally feel that we want generators to work exactly like async blocks and support self-borrows, which implies they must be pinned. I do not think we want a new trait, rather,I think we want to implement Iterator for Pin<GeneratorType> or some such thing. But I don't remember exactly how this works and don't want to get into the details here -- I think instead we can sidestep the issue by describing "here are the patterns that we know must work" (e.g., for x in generator() { ... }) and here are the patterns that may or may not require pinning (e.g., let i = pin!(generator()); i.next() might be required) and then leave as an unresolved question exactly how we resolve the latter. If it turns out that we have to do something significantly different to support pinning than we thought, and that has a big impact on the user experience, we'll open another RFC.

nikomatsakis avatar Oct 19 '23 14:10 nikomatsakis

Also, to be clear, we should absolutely reserve the keyword in Rust 2024. We know this is an important feature and, while it may not be available by Rust 2024, I very much hope it will be available before Rust 2027.

nikomatsakis avatar Oct 19 '23 14:10 nikomatsakis

I would like to raise a minor concern:

support for full Generators should be discussed and implemented separately, as there are many more open questions around them beyond a simpler way to write Iterators.

This is completely fair and makes sense. However, I'm concerned about accidentally cutting off syntax space for this possible future discussion: if gen already exists and makes generators, should a near-identical keyword coro make the strictly more powerful (for better or worse) coroutines? Would people be willing to accept this quasi-duplication? If not, should the syntax discussion on this RFC take that into account?

T-Dark0 avatar Oct 19 '23 16:10 T-Dark0

I think instead we can sidestep the issue by describing "here are the patterns that we know must work" (e.g., for x in generator() { ... }) and here are the patterns that may or may not require pinning (e.g., let i = pin!(generator()); i.next() might be required) and then leave as an unresolved question exactly how we resolve the latter.

Lets also keep in mind that supporting something like for await x in stream() { ... } should be considered but not explored/developed under this current proposal. I feel like we need a global view of what we want the experience to be, but we shouldn't try to boil the ocean by taking on the whole scope at once. Lets focus on the blockers. Personally I feel like AsyncIterator/Stream is the blocker to determine anything around async iteration, but we know for sure how we want sync iteration to work and feel.

estebank avatar Oct 19 '23 17:10 estebank

@nikomatsakis:

Thanks for writing down these bits of the meeting outcome. After the meeting, and on the basis of it, we made some changes to the RFC as drafted:

We've had a lot of evolutions of the lang-team process. After the long discussion in yesterday's triage meeting, I wanted to leave a comment to clarify how I see it now:

First: Experimental RFCs are dead.

This RFC has now been updated to remove all language about it being experimental. It's now framed as a normal RFC in every respect.

Second: Deferring syntax until later is an anti-pattern.

The change to use a placeholder syntax in this RFC has been reverted. It now commits to a syntax proposal.

traviscross avatar Oct 19 '23 19:10 traviscross

@rustbot labels -I-lang-nominated

We're making some further changes to this RFC. I'm going to unnominate for the moment while we finalize the proposal.

traviscross avatar Oct 19 '23 19:10 traviscross

I'm excited for the RFC process becoming streamlined! This feels like it should make it easier to begin validating promising designs, while maintaining the same high standards we've always had to formally ratify features.

If I'm understanding the new process correctly, it means there are no blockers in order to move forward with what's being proposed by this experimental RFC. Meaning the next steps for this proposal ought to be:

  1. Close this eRFC since experimental RFCs are no longer part of any process (meaning also: this eRFC cannot be merged)
  2. Introduce an experimental feature gate in the compiler + create a tracking issue to begin work on an implementation and reserve the keyword
  3. Once the the design feels like it has settled, open a new (complete) RFC detailing the full design

Did I get that right?

yoshuawuyts avatar Oct 20 '23 00:10 yoshuawuyts

@yoshuawuyts:

Did I get that right?

This is no longer an experimental RFC, as we changed the language after the meeting. Assuming this has a T-lang second (and there are no objections from a T-lang member), landing an implementation could proceed immediately. We still need an RFC with the details nikomatsakis mentioned to reserve the keyword in advance of the 2024 edition.

traviscross avatar Oct 20 '23 02:10 traviscross

(NOT A CONTRIBUTION)

Less clear to me is the interaction with pinning. That could be a big deal. I personally feel that we want generators to work exactly like async blocks and support self-borrows, which implies they must be pinned. I do not think we want a new trait, rather,I think we want to implement Iterator for Pin<GeneratorType> or some such thing. But I don't remember exactly how this works and don't want to get into the details here -- I think instead we can sidestep the issue by describing "here are the patterns that we know must work" (e.g., for x in generator() { ... }) and here are the patterns that may or may not require pinning (e.g., let i = pin!(generator()); i.next() might be required) and then leave as an unresolved question exactly how we resolve the latter. If it turns out that we have to do something significantly different to support pinning than we thought, and that has a big impact on the user experience, we'll open another RFC.

I agree that how to resolve this is the big open question with (non-async) generators.

I've been trying to explore the space of options on my blog. There's no easy answer, though:

  1. Changing for loops to use a pinned interface presents several challenges, requiring a coherence exception and a technically-breaking soundness change to the Pin APIs.
  2. An even more drastic change would be dropping Pin in favor of Move, which would be the biggest change Rust has ever seen.
  3. The most backward compatible but undoubtedly the "worst" solution would be to require you to always pin generators before you can do anything with them. The harder part is what you don't mention: combinators and interfaces that take impl Iterator. Like you could maybe make for loops work and nothing else without a very disruptive change, but if you want generators to implement the iterator trait that's going to be disruptive.

There's been ideation around supporting "moveable" coroutines with some kind of surface syntax. One option would be to ship that syntax, only support moveable generators in the 2024 edition with a plan to ship some disruptive change in 2027.

Another option would be to ship generators in 2024 edition, require them to be moveable, and then make a disruptive change in 2027 without supporting moveable generators anymore. Not supporting moveable generators would mean requiring a heap allocation if you want to return a generator that you've already started iterating.

withoutboats avatar Oct 20 '23 10:10 withoutboats

~~I've been thinking about this. From a procedural point of view, for reserving the gen keyword in RUst 2024, I think we should just open the PR to do that now and FCP the change separately from the RFC. I imagine that'd be quick-and-easy to get lang-team sign-off on.~~

EDIT: Gave this some more thought and I don't like it. I don't think that a major user-visible change like reserving a keyword should be done via FCP.

nikomatsakis avatar Oct 20 '23 11:10 nikomatsakis

@withoutboats I remember you were blogging about that -- let me go read up. I agree it's a hard technical problem, but I'm not ready to give up yet.

One question: what part of this applies to async gen? I guess it depends on the details of how we setup an AsyncIterator trait.

nikomatsakis avatar Oct 20 '23 12:10 nikomatsakis

@yoshuawuyts

If I'm understanding the new process correctly, it means there are no blockers in order to move forward with what's being proposed by this experimental RFC.

Correct.

Once the the design feels like it has settled, open a new (complete) RFC detailing the full design

Yes. To my view, the RFC is the end of the design process. If there's too many open questions for us to feel like we can confidently describe the user experience (and the questions around pin do seem severe), I do think we could just narrow this RFC to say "reserve the gen keyword" and explain why we are going to do this even though we don't have a full design (i.e., we all agree it's high priority and we anticipate a design to be done and stabilized early in the Rust 2024 edition lifecycle).

We can then continue the experimentation -- I think we should create a zulip stream and a repo that holds a draft of the final RFC, and move this conversation to take place there -- and only open the RFC when we're pretty confident.

It seems to me that there are at least two major questions to be settled:

  • interactions with pin -- which merits some detailed design write-ups
  • syntax around gen fn return type

nikomatsakis avatar Oct 20 '23 12:10 nikomatsakis

(NOT A CONTRIBUTION)

@withoutboats I remember you were blogging about that -- let me go read up. I agree it's a hard technical problem, but I'm not ready to give up yet.

I also don't think you should give up!

I think one path that could reduce resistance would be to ship generators without self-references in some way with one or more plans in place for how you could later allow self-references. Then you will find out how much of a problem it really is. If it's not a problem, you don't need to disrupt your users - hooray. In the more likely scenario that users start running into the self-reference limitation, when you disrupt users to fix it you're solving a problem they actually feel, instead of disrupting them to solve a problem you think they would feel in the future. It'll be easier to get buy in that way.

The downside with that approach is that your plan for allowing self-references would have to account that non-self-referencing generators exist, and being compatible with that. But you already have to be compatible with non-self-referencing iterators, so I'm not sure it creates a huge additional burden. Regardless, you would need to have a fully explored set of options before stabilizing.

One question: what part of this applies to async gen? I guess it depends on the details of how we setup an AsyncIterator trait.

If AsyncIterator were stabilized with its current interface, none of this would have to apply to it and async generators could be stabilized with self-references. If instead you have ?async Iterator, without changing Iterator you would be able to allow self-references over await points, but not yield points.

withoutboats avatar Oct 20 '23 13:10 withoutboats

I was thinking about the coherence issue again today, the one presented in boats blog post was that this introduces a diamond incoherence:

impl<T: Iterator> IntoIterator for T { }
impl<T: Iterator> Generator for T {}
impl<T: IntoIterator> IntoGenerator for T {}
impl<T: Generator> IntoGenerator for T {}

But, I don't think the second impl there is needed, impl IntoGenerator for impl IntoIterator can wrap the return type. That reduces the incoherence down to "you're not allowed multiple blanket impls, even when you own every trait and can force downstream to never implement incoherent sets" which I've never understood as a restriction.

(Wrapping the return type of that impl also avoids the Pin issue, the wrapper can just choose not to pin-project to the wrapped iterator so it can safely go Pin<&mut Self> -> &mut T).

Nemo157 avatar Oct 20 '23 20:10 Nemo157

Any chance that the 'Prior Art' section could add an example from a language with algebraic effects such as Koka? I think it's a really important part of the design space and has sufficient semantic differences with most langs (including this proposal) that I think many folks would consider it important. There's an example of generators/yielding on the front page, as it happens.

zesterer avatar Oct 23 '23 21:10 zesterer

IMO this function syntax, while analogous to how async functions work, looks very surprising and non-obvious:

gen fn odd_dup(values: impl Iterator<Item = u32>) -> u32

Without inspecting the opposite end of the line, the function looks like it returns a single u32. I know async fn works the same way, but at least when you see an async fn foo() -> u32 you know it will eventually return a single u32.

Not to mention there's good arguments why async fn might have also been a mistake.

Because the "simple" case of an owned iterator doesn't require complicated type signatures unlike some Futures generated by async fn, wouldn't this be a good time to adopt this function expression syntax suggested many times in the past (including the blog post above)?

fn foo() -> impl Iterator<Item = u32> = gen { ... }

Kray avatar Oct 25 '23 20:10 Kray

This RFC is on hiatus until we have an implementation and will then mirror the implementation. For now out of various concerns, I have no plans on implementing gen fn myself

oli-obk avatar Oct 25 '23 21:10 oli-obk

(NOT A CONTRIBUTION)

I think there are several good arguments for replacing -> with something like yield or yields. I'm pretty much just summarizing here:

  • Educational: it helps users understand that this yields many times, instead of returning once. It ties the declaration syntax to the different return keyword inside the body.
  • Forward compatibility: I personally don't think general purpose coroutines should be declared with the same keyword as generators-as-Iterators, but this does leave space for that possibility in the syntax by allowing the syntax to support a separate -> type declaration.

The only possible difficulty you need to consider is that you want to be forward compatibility with the ability to declare the type yielded by a gen { } block. This may lean toward yields; not sure if gen yield i32 could be supported by the parser or if its ambiguous.

withoutboats avatar Oct 26 '23 08:10 withoutboats

I was thinking potentially fn f() gen i32 for 'generates'

emilyyyylime avatar Oct 26 '23 09:10 emilyyyylime

Regarding the self-referential generator problem, there is a middle ground: implement Unpin and Iterator on generators that don't borrow across yields. The two aren't mutually exclusive.

Then, you could both implement Iterator for Pin<&mut G> for all types and implement Iterator for just the Unpin generators.

Unfortunately, you can't do type guards like where Pin<&mut G>: Iterator, or it'd be a lot easier. You could still save the compiler code gen some work with this blanket impl instead:

impl<I: Iterator + Unpin> Iterator for Pin<&mut I> {
    type Item = I::Item;
    fn next(&mut self) {
        (**self).next()
    }
}

dead-claudia avatar Nov 22 '23 07:11 dead-claudia

(NOT A CONTRIBUTION)

@dead-claudia The problem with that is that it's considered undesirable to have the signature based on the body of the function (other auto traits like Send are already an exception to this, but the language team was historically reticent to add more cases like this). Inferring immovability from the behavior of the generator is usually dismissed for this reason.

withoutboats avatar Nov 22 '23 13:11 withoutboats

Edit: typo Edit 2: misremembered async futures (h/t @Nemo157)

(NOT A CONTRIBUTION)

@dead-claudia The problem with that is that it's considered undesirable to have the signature based on the body of the function (other auto traits like Send are already an exception to this, but the language team was historically reticent to add more cases like this). Inferring immovability from the behavior of the generator is usually dismissed for this reason.

@withoutboats

What about this simplified version: only implementing Unpin for generators that don't borrow across yields? Then, you could just define the two separate Iterator impls, one for the nice case and one for the ugly case:

// And same for `Box<G>`
impl<G: Generator<()>> Iterator for Pin<&mut G> {
    type Item = G::Yielded;
    fn next(&mut self) {
        match self.resume(()) {
            GeneratorState::Yield(v) => Some(v),
            GeneratorState::Return(_) => None,
        }
    }
}

impl<G: Generator<()> + Unpin> Iterator for G {
    type Item = G::Yielded;
    fn next(&mut self) {
        match self.resume(()) {
            GeneratorState::Yield(v) => Some(v),
            GeneratorState::Return(_) => None,
        }
    }
}

dead-claudia avatar Nov 22 '23 19:11 dead-claudia

only implementing Unpin for generators that don't borrow across yields, like what async already does with await

async does not do that, async-fn and async-blocks both create generated future types that are unconditionally !Unpin.

Nemo157 avatar Nov 22 '23 19:11 Nemo157

only implementing Unpin for generators that don't borrow across yields, like what async already does with await

async does not do that, async-fn and async-blocks both create generated future types that are unconditionally !Unpin.

I may be wrong, but I thought simple stuff like async { 1 } were Unpin? Unpin is an auto trait, so async { 1 } and async move { fut.await } (for fut: F and F: Unpin + Future) is IMHO a bit surprising.

Worth mentioning sendability is already something you have to monitor, since Tokio's multi-threaded runtime requires task futures to implement Send.

dead-claudia avatar Nov 22 '23 21:11 dead-claudia

Yes you are wrong, just try it on the playground. :)

RalfJung avatar Nov 22 '23 21:11 RalfJung

I've noticed that the possibility of using a longer keyword like gener is not discussed as an alternative in the RFC. gener in particular is very rare in the wild so would not need to be contextual.

pitaj avatar Dec 01 '23 19:12 pitaj

I’m not up on the details, so I’m probably missing something, but the direction of async fn, generator functions implementing iterators, generator functions implementing Generator, and so on seems a bit strange to me. It seems like there’s really just one fundamental underlying core mechanic: make a function which can return to its caller while remembering where it got to, and can be told to carry on running from that point later. async fn is just that, where the return values are a bunch of Pending followed by a Ready(something). Iterators implemented as functions are just that, where the return values are a bunch of Some(thing) followed by a None. Async iterators implemented as functions are just that, where the return values are a bunch of mixed-up Pending and Ready(Some(thing)), followed by a Ready(None). Fully generalized coroutines are just that, but with a way to send values in as well as out.

So why do we need new language syntax for each of these things? Why can’t we have just one language syntax for “suspendable function”, and then futures, iterators, async iterators, coroutines, and so on are just uses of that one syntax and mechanic to implement different traits, ideally uses that could be wired into place for custom traits by the user as well as stdlib traits?

Hawk777 avatar Dec 06 '23 04:12 Hawk777