rfcs icon indicating copy to clipboard operation
rfcs copied to clipboard

Arbitrary self types v2

Open adetaylor opened this issue 2 years ago • 54 comments

This PR suggests small changes to the existing unstable "aribtrary self types" feature to make it more flexible. In particular, it suggests driving this feature from a new (ish) Receiver trait instead of from Deref, but to maintain compatibility by having a blanket implementation for all Deref types.

This is a squashed commit of many edits by various folks including @Urhengulas, @Veykril , @madsmtm and myself. Thanks also to @davidhewitt, @Manishearth and many folks over on Zulip for feedback.

Rendered

adetaylor avatar Nov 01 '23 18:11 adetaylor

another good use case for arbitrary self types that may be worth mentioning in the rfc:

#[derive(Default)]
pub struct GCArena {
    objects: Vec<Rc<dyn Any>>,
}

impl GCArena {
    pub fn alloc<T: 'static>(&mut self, value: T) -> Gc<T> {
        let rc: Rc<T> = Rc::new(value);
        let weak = Rc::downgrade(&rc);
        self.objects.push(rc);
        Gc(weak)
    }
}

#[derive(Clone)]
pub struct Gc<T: 'static + ?Sized>(Weak<T>);

impl PartialEq for Gc<T: 'static + ?Sized> {
    fn eq(&self, other: &Gc<T>) -> bool {
        self.0.ptr_eq(&other.0)
    }
}

impl Eq for Gc<T: 'static + ?Sized> {}

impl Hash for Gc<T: 'static + ?Sized> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        (self.0.as_ptr() as *const ()).hash(state);
    }
}

impl<T: 'static + ?Sized> Receiver for Gc<T> {
    type Target = T;
}

impl<T: 'static + ?Sized> Receiver for Gc<T> {
    pub fn get(&self) -> Rc<T> {
        self.0.upgrade().expect("GCArena has been dropped")
    }
}

// demo of using Gc:

pub struct Node<T> {
    pub edges: Vec<Gc<Node<T>>>,
    pub data: T,
}

impl<T> Node<T> {
    pub fn walk(self: &Gc<Self>, seen: &mut HashSet<Gc<Self>>, f: &mut impl FnMut(&Gc<Self>)) {
        if seen.insert(self.clone()) {
            f(self);
            for i in &self.get().edges {
                i.walk(seen, f);
            }
        }
    }
}

programmerjake avatar Nov 02 '23 18:11 programmerjake

@programmerjake I think this is now covered by the paragraph I wrote that is now part of the RFC?

For example, taking &Arc<T> allows me to both clone the smart pointer (noting that the underlying T might not implement Clone) in addition to access the data inside the type, which is useful for some methods. Also, being able to change a method from accepting &self to self: &Arc<Self> can be done in a mostly frictionless way, whereas changing from &self to a static method accepting &Arc<Self> will always require some amount of refactoring.

This feels like the same case you present: you want the ability to both access the inner type and clone the smart pointer, and simply accessing the inner type isn't enough.

clarfonthey avatar Nov 02 '23 19:11 clarfonthey

This feels like the same case you present: you want the ability to both access the inner type and clone the smart pointer, and simply accessing the inner type isn't enough.

well, i had intended to demonstrate a type where you can't just impl Deref, since the inner value may have already been dropped -- I didn't do a very good job of that.

another use case: a type where the inner value doesn't exist locally, e.g. for RPC with promise pipelining:

pub struct Id(u32);
pub struct Promise<T>(Id, PhantomData<T>);

impl<T> Receiver for Promise<T> {
    type Target = T;
}

pub struct MyRemoteData;
pub struct SomeIntermediate;

impl MyRemoteData {
    pub fn get() -> Promise<Self> {
        ...
    }
    pub fn foo(self: Promise<Self>, a: i32) -> Promise<SomeIntermediate> {
        ...
    }
}

impl SomeIntermediate {
    pub fn bar(self: Promise<Self>) -> Promise<()> {
        ...
    }
}

pub async fn demo() {
    MyRemoteData::get().foo(5).bar().await;
}

programmerjake avatar Nov 03 '23 09:11 programmerjake

All, I just wanted to say thanks for all the discussion so far! @Urhengulas and I are working through the various things one at a time - sorry if some of your comment threads are awaiting an RFC update from us, we just haven't got to them all yet. (I'm currently poking at lifetime elision.)

adetaylor avatar Nov 15 '23 21:11 adetaylor

@rustbot labels +I-lang-nominated

I'm nominated to take the temperature of the @rust-lang/lang team on this RFC -- I'd like to see it move forward! I'm going to give it a good read before our meeting

nikomatsakis avatar Nov 16 '23 02:11 nikomatsakis

Gave this a quick read. One thing I would note is that I think that the RFC should highlight more that it also enables *const self methods -- this is a big win for unsafe code, in my opinion, for all the same reasons. Right now code that wants to take a raw pointer and doesn't want to guarantee reference validity is pretty stuck.

nikomatsakis avatar Nov 16 '23 02:11 nikomatsakis

One small point regarding stabilization: the purpose of the current Receiver trait is to tell the compiler which types can be used as a method receiver from code that doesn't have the arbitrary_self_types feature enabled. This way, types like Rc, Arc, and Pin can be defined in std and marked as vetted receiver types that can be used from stable user code, but arbitrary types that implement Deref<Target=Self> cannot be used as method receivers without arbitrary_self_types enabled.

This RFC proposes adding a blanket impl of the new Receiver trait for anything that implements Deref. With that blanket impl, the new Receiver trait will not be useful as a way to mark those vetted receiver types, because every type that implements Deref will implement Receiver. If we want to avoid insta-stabilizing the ability to use arbitrary types that implement Deref as method receivers, we will need some way to tell which ones are the vetted receiver types and which ones aren't, likely with another marker trait that serves the same purpose as the existing Receiver trait. (This marker trait can be removed once arbitrary_self_types is stabilized.)

mikeyhew avatar Nov 16 '23 08:11 mikeyhew

Another note: I'd like to see some more details of how method resolution and deref coercion would work with this change. Currently, it says that the algorithm would remain the same, but instead of using the Deref trait, it would use the new Receiver trait. I'm not sure this is quite right, because some method calls require deref coercions, and this new trait does not imply Deref.

In the following example, the Deref trait would still need to be involved to check if the calls to frobnicate_ref are allowed.

struct Foo;

impl Foo{
    fn frobnicate_ref(&self) {}
    fn frobnicate_cpp_ref(self: CppRef<Self>) {}
}

fn main() {
    let foo_rc: Rc<Foo> = Rc::new(Foo);

    // this compiles fine, and desugars to `Foo::frobnicate_ref(&*foo_rc)`
    foo_rc.frobnicate_ref();

    let foo_cpp_ref: CppRef<Foo> = CppRef::new(Foo);

    // should not compile because it would desugar to `Foo::frobnicate_ref(&*foo_cpp_ref)`
    // and you can't deref a CppRef
    foo_cpp_ref.frobnicate_ref();

    // should compile, because we're not dereffing the CppRef
    // desugars to `Foo::frobnicate_cpp_ref(foo_cpp_ref)`
    foo_cpp_ref.frobnicate_cpp_ref();

}

Aside: thanks @adetaylor and everyone that's helped for taking on the challenge of this RFC. I can tell there's been a lot of effort put in already and a lot of details have been thought out, and it's pretty nice to be able to walk in now, read a well put-together RFC and give a little bit of feedback.

mikeyhew avatar Nov 16 '23 09:11 mikeyhew

With that blanket impl, the new Receiver trait will not be useful as a way to mark those vetted receiver types

Assuming we gate the new behavior behind arbitrary_self_types, we can:

  • Have a blanket impl Receiver for T: Deref gated behind arbitrary_self_types
  • Retain impl Receiver for Box<T> etc. gated behind receiver_trait I'm finding we need to do something like that to maintain compatibility with the bootstrap compiler anyway.

adetaylor avatar Nov 16 '23 12:11 adetaylor

@adetaylor it's been a while since I was involved in compiler development, but I don't remember it being possible to gate an impl behind a feature gate. If it is, then maybe that could work.

mikeyhew avatar Nov 17 '23 04:11 mikeyhew

In the following example, the Deref trait would still need to be involved to check if the calls to frobnicate_ref are allowed.

Using my current playing around branch (which is roughly equivalent to what's proposed in this RFC, though the feature gates are different):

#![feature(receiver_trait)]

use std::rc::Rc;
struct Foo;

struct CppRef<T: ?Sized>(T);

impl<T> CppRef<T> {
    fn new(t: T) -> Self {
        Self(t)
    }
}

impl<T: ?Sized> std::ops::Receiver for CppRef<T> {
    type Target = T;
}
//impl<T: ?Sized> std::ops::Deref for CppRef<T> {
    //type Target = T;
    //fn deref(&self) -> &Self::Target {
        //&self.0
    //}
//}

impl Foo{
    fn frobnicate_ref(&self) {}
    fn frobnicate_cpp_ref(self: CppRef<Self>) {}
}

fn main() {
    let foo_rc: Rc<Foo> = Rc::new(Foo);

    // this compiles fine, and desugars to `Foo::frobnicate_ref(&*foo_rc)`
    foo_rc.frobnicate_ref();

    let foo_cpp_ref: CppRef<Foo> = CppRef::new(Foo);

    // should not compile because it would desugar to `Foo::frobnicate_ref(&*foo_cpp_ref)`
    // and you can't deref a CppRef
    // Indeed does not compile. Currently, it causes a compiler panic, which is non-desirable.
    // This suggests to me a mismatch between the method candidate identification code (which we've updated)
    // and the actual dispatch code.
    // This line DOES compile if CppRef impls Deref above instead of Receiver, as we'd hope.
    // foo_cpp_ref.frobnicate_ref();

    // should compile, because we're not dereffing the CppRef
    // desugars to `Foo::frobnicate_cpp_ref(foo_cpp_ref)`
    // does indeed compile fine
    foo_cpp_ref.frobnicate_cpp_ref();

}

So playing around, I think it's behaving as you propose it should? And yes, this implies that Deref is involved at some level, and you're right, we should try to describe how exactly. But meanwhile if you can confirm that my comments in the snippet above are what you're expecting (apart from the compiler panic!) that would be great.

adetaylor avatar Nov 17 '23 16:11 adetaylor

@adetaylor based on your comments, it sounds like you understand what I was getting at with this example. In the case of foo_cpp_ref.frobnicate_ref(), there should be an error during type checking. If the compiler panic is happening at a later stage, then it should go away once the error is caught earlier during type checking.

mikeyhew avatar Nov 17 '23 22:11 mikeyhew

Another note: I'd like to see some more details of how method resolution and deref coercion would work with this change. Currently, it says that the algorithm would remain the same, but instead of using the Deref trait, it would use the new Receiver trait. I'm not sure this is quite right, because some method calls require deref coercions, and this new trait does not imply Deref.

Yeah, I caught that too.

I think the RFC is mixing two sets of semantics at times. As far as I understand, the reference as it's currently written doesn't explicitly address Box/Arc/etc as receivers, but it does implicitly address them in this paragraph:

Then, for each candidate type T, search for a visible method with a receiver of that type in the following places:

  • T's inherent methods (methods implemented directly on T).
  • Any of the methods provided by a visible trait implemented by T. If T is a type parameter, methods provided by trait bounds on T are looked up first. Then all remaining methods in scope are looked up.

That first bullet point is the part that must be amended to clarify that "T's inherent methods" need to take the Receiver trait into account; the paragraph above about dereferencing chains must be kept untouched.

It's therefore a major compatibility break if existing Deref implementors cease to be usable as self parameters. Just in the standard library, we'd have to add Receiver implementations for Cow, Ref, ManuallyDrop and possibly many other existing implementors of Deref: third party libraries would have to do the same. Without that, method calls on these types would not be possible:

Here too, I think the RFC is confusing "Box<Self> as a receiver" with "Box<Self> as a way to get a &self which is a receiver".

Right now the only custom receivers (outside nightly crates) are the ones defined in the standard library; changing how they're defined isn't a breaking change.

PoignardAzur avatar Nov 21 '23 11:11 PoignardAzur

I'd like to give some context to the "smart pointers in GUI" use-case.

In Masonry, I'm trying to create a way to access a widget hierarchy imperatively, that still automatically updates the render tree whenever something is changed. My ideal API looks something like this:

struct Button {
  label: WidgetPod<Label>,
  size: Size,
  color: Color,
  callback: impl Fn(Stuff),
}

impl Button {
    pub fn new(text: impl Into<String>) -> Button {
        Button::from_label(Label::new(text))
    }

    pub fn from_label(label: Label) -> Button {
        Button {
            label: WidgetPod::new(label),
            // ...
        }
    }

    pub fn set_size(self: &mut WidgetMut<'_, Self>, size: Size) {
        self.widget.size = size;
        self.ctx.request_layout();
    }

    pub fn set_color(self: &mut WidgetMut<'_, Self>, color: Color) {
        self.widget.color = color;
        self.ctx.request_paint();
    }

    pub fn set_callback(self: &mut WidgetMut<'_, Self>, callback: impl Fn(Stuff)) {
        self.widget.callback = callback;
    }
}

That is, every widget would be defined in a way that it's not possible for an external user to directly modify the fields of the widget. Instead users would have to use a WidgetMut<MyWidget> receiver. Widgets might have internal methods taking mutable references, but every public method mutating a widget would take a WidgetMut<MyWidget> receiver; that WidgetMut would include methods to notify the system when repaint or relayout is needed, and people writing new widgets would make sure that setters use those methods.

Some widgets would also have children. These widgets would have method to go from WidgetMut<Parent> to WidgetMut<Child>:

impl Button {
    pub fn label_mut(self: &mut WidgetMut<'_, Self>) -> WidgetMut<'_, Label> {
        self.ctx.get_child_mut(&mut self.widget.label)
    }

    pub fn set_text(self: &mut WidgetMut<'_, Self>, new_text: impl Into<String>) {
        self.label_mut().set_text(new_text.into());
    }
}

So far this RFC looks like it would help me satisfy that use case, so I'm hoping it makes progress! :heart:

EDIT: Note that I've tried to implement an API that looks like the above in Masonry, with workarounds for the lack of arbitrary receivers. Those workarounds are pretty ugly, so I would love for arbitrary receivers to get stabilized, or at least make enough progress that I feel confident going nightly-only just to get them.

PoignardAzur avatar Nov 21 '23 12:11 PoignardAzur

I'd like to give some context to the "smart pointers in GUI" use-case.

@PoignardAzur Thanks for adding the additional context - very happy to accept edits if you think any of this should belong in the Motivation section!

adetaylor avatar Nov 21 '23 17:11 adetaylor

I'm not sure it belongs in the motivation section. ^^

I just wanted to have the full explanation in an easy-to-find place.

PoignardAzur avatar Nov 22 '23 15:11 PoignardAzur

We talked about this at length in today's @rust-lang/lang meeting, and had several thoughts. We were all generally in favor of arbitrary self types, but also had a number of questions and potential improvements. Several of us are going to post our thoughts, in separate comments.

joshtriplett avatar Nov 22 '23 16:11 joshtriplett

In general this is looking great, and I'd definitely like this to become something possible.

I just have one thing about pointers to discuss.


I'm having trouble understanding how both of these make sense at the same time:

[Receiver] is also implemented for &T, &mut T, *const T and *mut T.

Weak<T> and NonNull<T> were not supported by the prior unstable arbitrary self types support, but they share the property that it may be desirable to implement method calls to T using them as self types. Unfortunately they also share the property that these types have many Rust methods using self, &self or &mut self.

We also have, if I counted correctly, 42 methods on *const _ right now. If we go back in time there were far fewer, but today it seems like there's enough that we can't assume it's a full set -- the strict provenance stuff added a whole bunch relatively recently, for example.


More generally, what about all the other possible types in the standard library? I could imagine someone reasonably wanting to have self: MaybeUninit<Self>, for example. But since that doesn't Deref, it also has loads of methods on it, and thus it probably shouldn't be Receiver either? Same for Mutex or Vec or ...

Thus my instinct is that we should require that they make something like the CppRef type you describe in the Motivation, rather than having a pointer be a receiver directly. That doesn't seem that unreasonable, either, since most of the methods on pointers aren't things that one should be doing on a C++ reference anyway.

Then we stick with the "you shouldn't do this if you have methods" rule, which means no impl<T> Receiver for Vec<T>, but it could mean impl<T> Receiver for ManuallyDrop<T>, or similar. (Well, I guess that one has Deref so it'd be caught by the proposed blanket impl.)

Just as a SHOULD, not as a MUST, in the language rule though. If you really want inherent methods -- that always win in resolution -- on something impl Receiver, I think that's fine, just not for standard library things. An external crate could always do a major version bump to add more inherent methods like that if it really wanted to.


Unless there's some strong reason that methods need pointer receivers directly that I'm missing? My instinct is that if you can implement Receiver for your own type, that's fine, and it's not needed on raw pointers directly.

scottmcm avatar Nov 22 '23 18:11 scottmcm

The restriction on NonNull and Weak seems unfortunate, for multiple reasons.

  • These are types that make a lot of sense to use as receivers.
  • It seems like raw pointers also have lots of methods, and we may well add more in the future, so it's not obvious why it's OK to have raw-pointer receivers but not NonNull/Weak receivers. Is there a reason why the same considerations don't apply?

I'd be enthusiastically in favor of this, but I'd like to find a way that we can allow it for NonNull and Weak as well, and in general for receivers that do have methods people might want to call.

When you add a new method to a trait, that's not a breaking change, because inherent methods get priority.

However, with Deref, we already have the rule that methods directly on the type have priority over methods on the Deref::Target. Which makes it a breaking change to add methods directly on the type.

For Receiver, the other way around seems more robust against breakage, since it seems acceptable for a method impl SomeType { fn method(self: FooPtr<Self>); } to shadow FooPtr::method. However, that would be incompatible with existing usage with Deref, and I don't see any way we could get there even if we wanted to.

That said, I think we should consider (in order from most to least preferable):

  • Adding some way to make it not a breaking change to add methods on a type that implements Receiver, or to make it reasonable to put it in the category of "acceptable breakage" like we do inference failure. Then, implement Receiver for NonNull and Weak.
  • Implement Receiver for NonNull and Weak anyway, if we're going to have it for raw pointers. In particular, I don't think it makes sense to have it for raw pointers and not for NonNull.
  • Making raw pointers not a Receiver. Same argument: it doesn't make sense to have it for raw pointer and not NonNull.
  • Add a way for us to transition over an edition to moving the various methods on raw pointers and NonNull to associated functions. That seems like a massive amount of churn, though. I'm only mentioning this possibility for completeness, in case it inspires some better idea.

joshtriplett avatar Nov 22 '23 18:11 joshtriplett

it seems acceptable for a method impl SomeType { fn method(self: FooPtr<Self>); } to shadow FooPtr::method. However, that would be incompatible with existing usage with Deref

Why is it incompatible? From what I understand we could treat Deref and Receiver differently when deciding shadowing precedence. The only types where this could break are types that are already Receiver today, i.e. Box, Rc and friends, but they already don't have inherent methods so that's fine.

Nadrieril avatar Nov 22 '23 20:11 Nadrieril

Implement Receiver for NonNull and Weak

Once we do those, one might ask: why don't we also allow self: Cell<Self>, self: [Self], self: HashSet<Self> and so on?

So yeah, definitely a good point, I think the motivation for allowing raw pointers was also because it lined up nicely with the proposed changes to the method resolution, but we're changing those anyway, so it might not make so much sense any more?

Will have to consider this further!

madsmtm avatar Nov 22 '23 21:11 madsmtm

Thanks for all the discussion around NotNull, Weak etc. I thought I'd take the liberty of summarizing the different perspectives to see if it helps us reach a consensus:

  • The existing unstable arbitrary_self_types feature allows use of raw pointers as self types (but not NotNull, Weak etc. because they don't implement Deref)
  • @scottmcm feels we shouldn't have raw pointer self types, but require people to wrap those pointers in a newtype wrapper (this is my instinct too but just because I'm automatically nervous of larger changes!)
  • @nikomatsakis is of the view that raw pointer self types are important for better quality unsafe code and I've heard related points from @Manishearth who says:

    raw pointer receivers are quite important for the future of safe Rust, because stacked borrows makes it illegal to materialize references in many positions, and there are a lot of operations (like going from a raw pointer to a raw pointer to a field) where you don't need to or want to do that.

  • @joshtriplett says we should implement Receiver for NonNull and Weak if we implement it for raw pointers, but hopes we can find a way to avoid it being a compatibility break

I think: it seems like there's enough desire for raw pointers, NotNull, Weak etc. as self types that we should try pretty hard to find a way to avoid compatibility breaks, even if it means we do something a bit unexpected from the user's perspective. There are some proposals around that in this comment thread about the RFC which I haven't had a chance to properly get my head around, let alone work into RFC revisions! But further comments on that thread are very much appreciated.

(Edit: that linked comment thread now contains an actual proposal and a trial implementation, please take a look)

adetaylor avatar Nov 23 '23 13:11 adetaylor

it seems acceptable for a method impl SomeType { fn method(self: FooPtr<Self>); } to shadow FooPtr::method. However, that would be incompatible with existing usage with Deref

Why is it incompatible? From what I understand we could treat Deref and Receiver differently when deciding shadowing precedence.

I mean that it'd be confusing if some type SmartPtr that implements Deref (and thus gets the blanket impl of Receiver) has methods that take precedence over methods of the type it Derefs to if they take a T, but don't take precedence if they take a SmartPtr<T>.

joshtriplett avatar Nov 23 '23 21:11 joshtriplett

I mean that it'd be confusing if some type SmartPtr that implements Deref (and thus gets the blanket impl of Receiver) has methods that take precedence over methods of the type it Derefs to if they take a T, but don't take precedence if they take a SmartPtr<T>.

I think of it as "most specific impl first" (with caveat and details here). It makes sense in my head, it's kind of like specialization: a foo(Box<Self>) takes priority over a generic foo(self) on Box<T>. I think we can teach this in a way that's not too confusing.

Nadrieril avatar Nov 23 '23 22:11 Nadrieril

Periodic update on what's happening here:

  • @madsmtm is kindly fixing up the current method call documentation
  • I have a PR to try to fix pre-existing bugs in the lifetime elision rules, although current consensus is that we should try to understand and document the current rules before fixing them :) Help and assistance appreciated!
  • I've been trying to implement this proposal for searching impl blocks (via Receiver) differently from actually probing methods (via Deref) - it's proving fiddly I'm in the middle of a very busy two weeks so that's why @madsmtm is doing the only actual effective stuff right now :)

adetaylor avatar Dec 04 '23 14:12 adetaylor

@rustbot labels -I-lang-nominated

This was discussed in the T-lang meeting on 2023-11-27. Some members have left comments above deriving from this.

The general feeling seemed to be that we wanted to find some way to enable this, including for raw pointers, NonNull, etc., but we were feeling unsure about the path to get there.

In light of this, probably further reflection and discussion on this thread or in Zulip is warranted, and a revision to the RFC that addresses this and proposed a plan here would be in order.

Once that is done, please renominate for T-lang so that the team can discuss the changes.

traviscross avatar Dec 13 '23 08:12 traviscross

@traviscross Thank you! Yes.

Another periodic status update:

  • I've figured out (I think) how to solve the problem with non-dereffable types - I intend to write it up. I've even made some (non-fancy) diagrams of how it all works right now and how it will change.
  • I think we can make it work for NonNull and friends using the proposals in this thread... but it's not quite working yet, TBD
  • Together, the two things above will require a significant rewrite of the Reference section of this RFC (maybe even with diagrams, I don't know if that's normal in Rust RFCs)
  • In parallel, the lifetime elision anomalies are being explored here including a crater run

adetaylor avatar Dec 14 '23 16:12 adetaylor

Here's the new problem with NonNull and friends which I allude to in the previous comment.

#![feature(receiver_trait)]

struct MyNonNull<T>(*const T);

#[allow(dead_code)]
impl<T> MyNonNull<T> {
    fn foo<U>(&self) -> *const U {
        self.cast().bar()
    }
    fn cast<U>(&self) -> MyNonNull<U> {
        MyNonNull(self.0 as *const U)
    }
    fn bar(&self) -> *const T {
        self.0
    }
}

This compiles. But if I implement Receiver<Target=T> for MyNonNull<T>, we get:

error[E0282]: type annotations needed
  --> /Users/adetaylor/dev/rust/tests/ui/self/recursive-receiver.rs:13:14
   |
LL |         self.cast().bar()
   |              ^^^^   --- type must be known at this point
   |              |
   |              cannot infer type of the type parameter `U` declared on the method `cast`
   |
help: consider specifying the generic argument
   |
LL |         self.cast::<U>().bar()
   |                  +++++

This is with my working branch here. The only explanation is that the probing must be coing up with more possible method candidates, and this is how the ambiguity is presented. I don't quite understand this yet, but as there's a determination to implement Receiver for NonNull I do think we need to get to the bottom of it before making progress.

adetaylor avatar Dec 15 '23 10:12 adetaylor

Here's how I think method resolution should work after this RFC. This approach allows us to use NonNull method receivers, using something approximating the approach in this thread, and solves the problem highlighted around non-dereffable types here. It also solves the problem in the last comment.

First, this assumes that there is a blanket impl of Receiver for T: Deref. To abuse UML a little bit, things look like this:

classDiagram
  Receiver <|-- Deref
  Receiver <|-- CppRef
  note for CppRef "or other types that\nimpl Receiver but not Deref\ne.g.NonNull"
  Deref <|-- Box
  note for Box "or other types that\nimpl Deref\ne.g.Rc"
  class usize
  note for usize "or other types that\nimpl neither"

So, for any given type, say A<B<C<D<E>>>>, there's a chain of Derefs which can be followed, and a chain of Receivers. The Receiver chain is always just an extension of the Deref chain.

graph LR
  A --Receiver--> B
  A --Deref--> B
  B --Receiver--> C
  B --Deref--> C
  C --Receiver--> D
  D --Receiver-->E

How method probing works now

This is roughly a diagrammatic version of this excellent page though it includes a few different details which will be important for this RFC. This is all, of course, quite simplified.

Everything is driven from autoderef which knows how to iterate through the chain of Deref links shown above. This is used to check valid receiver types (in wfcheck.rs) and twice in method probing, in probe.rs. At present, the same chain of Derefs is used in all three places.

flowchart TB
  subgraph autoderef
     direction TB
     A --Deref--> B
     B --Deref--> C
  end
  subgraph wfcheck[wfcheck.rs]
     wfchecker[Confirm valid self types in each method]
  end
  autoderef-->wfchecker
  subgraph probe[probe.rs]
    method_autoderef_steps
    candidate_steps[(list of candidate steps with\nself_ty\nnumber of derefs)]
    method_autoderef_steps-->candidate_steps
    probe_op[probe_op & various\n'functions calling push_candidate]
    candidate_steps-->probe_op
    candidates[(list of candidate methods with\nassociated item &\nxformed self ty)]
    probe_op-->candidates
    candidates-->pick_all_method
    pick_all_method[pick which candidate\nby iterating types and\nlooking for matching methods.\nAt each step]
    candidate_steps--work out types to which\nreceiver can be converted-->pick_all_method
  end
  autoderef-->method_autoderef_steps
  ambiguity[Report method ambiguity error]
  confirm[confirm.rs]
  pick_all_method--exactly 1 method-->confirm
  pick_all_method--multiple methods-->ambiguity
  e0282[error 0282: type annotation needed]
  method_autoderef_steps--final type was a type variable-->e0282

Note in particular that the candidate_steps are effectively used twice.

How this changes with arbitrary self types

With this RFC, the following changes occur:

  • Sometimes we follow the chain of Receiver instead of Deref. Specifically,
    • We do this in wfcheck.rs when checking for well-formed self types
    • We also do this in probe.rs when assembling the list of candidate functions, but not when we are working out the types to which the actual method call receiver can be converted. For that we continue to use Deref.
  • Sometimes, if there are multiple candidates, we pick one but show a warning about the ambiguity. (This is what allows us to use impl Receiver for NonNull despite the fact that we might want to add methods to NonNullin future).

So the picture changes like this:

flowchart TB
  subgraph autoderef[autoderef using Deref]
     direction TB
     A --Deref--> B
     B --Deref--> C
  end
  subgraph autoderef2[autoderef using Receiver]
     direction TB
     A2[A] --Receiver--> B2[C]
     B2[B] --Receiver--> C2[C]
     C2[C] --Receiver--> D2[D]
     D2[D] --Receiver--> E2[E]
  end
  subgraph wfcheck[wfcheck.rs]
     wfchecker[Confirm valid self types in each method]
  end
  autoderef2-->wfchecker
  subgraph probe[probe.rs]
    method_autoderef_steps[method_autoderef_steps using Deref]
    method_autoderef_steps2[method_autoderef_steps using Receiver]
    candidate_steps[(list of candidate steps with\nself_ty\nnumber of derefs)]
    candidate_steps2[(list of candidate steps with\nself_ty\nnumber of receiver hops)]
    method_autoderef_steps-->candidate_steps
    method_autoderef_steps2-->candidate_steps2
    probe_op[probe_op & various\n'functions calling push_candidate]
    candidate_steps2-->probe_op
    candidates[(list of candidate methods with\nassociated item &\nxformed self ty)]
    probe_op-->candidates
    candidates-->pick_all_method
    pick_all_method[pick which candidate\nby iterating types and\nlooking for matching methods.\nAt each step]
    candidate_steps--work out types to which\nreceiver can be converted-->pick_all_method
  end
  autoderef-->method_autoderef_steps
  autoderef2-->method_autoderef_steps2
  ambiguity[Report method ambiguity error]
  confirm[confirm.rs]
  confirm_warn[confirm but warn of ambiguity]
  pick_all_method--exactly 1 method-->confirm
  pick_all_method--exactly 1 'innermost' method-->confirm_warn
  pick_all_method--multiple methods-->ambiguity
  e0282[error 0282: type annotation needed]
  method_autoderef_steps--final type was a type variable-->e0282

'Innermost' in this context means a viable candidate method on the 'innermost' type, that is, the type we got via the largest number of steps along the Receiver chain.

In practice, the code isn't this complex - we don't have to store two different lists of candidate_steps - but logically this is what we're doing.

More detail on how we support the NotNull<T> case

pick_all_method iterates through the chain of types built by the Deref trait, to look for matching methods.

For example, given a receiver type like this:

graph LR
  A --Receiver--> B
  A --Deref--> B
  B --Receiver--> C
  B --Deref--> C
  C --Receiver--> D
  D --Receiver-->E

it will iterate first through A<B<C<D<E>>>> and then B<C<D<E>>> and then C<D<E>>. At each of those steps, it will look for candidate functions found during the probe_op stage.

Within a given step of iteration, for instance, just considering a receiver type of A<B<C<D<E>>>> with no derefs applied, we might find candidates functions within the impl blocks for A<B<C<D<E>>>>, B<C<D<E>>>, C<D<E>>, D<E> or E. Any of those might have fn something(self: A<B<C<D<E>>>>>).

If we do find multiple candidates in each level of iteration, currently, we show an error about the ambiguity.

The proposed change is:

  • switch the error to a warning
  • only if this ambiguity results from differently-nested candidates
  • pick the most-nested candidate

This all seems counterintuitive, since the overall iteration iterates from the "outer" type via the deref chain. So we always iterate from the outside inwards in terms of dereferencing the receiver. But within each individual iteration step, if there are candidates found in impl blocks of differing types, we'll now pick the innermost.

Although slightly counterintuitive, the warning should cause everyone to resolve the ambiguity by being explicit, so this should not result in an ongoing cognitive load for Rustaceans. This is a compatible change because previously, errors would always have been shown in this circumstance.

Code example: (pseudo-code)

// Assume NonNull<T> implements Receiver

struct Dog;

impl Dog {
  fn bark(self: NonNull<Self>) { }
}

fn main() {
   let d: NonNull<Dog> = visit_kennels();
   d.bark(); // obviously calls Dog::bark
}

// Now imagine NonNull adds a bark(self) method in the standard library
// Proposal:
// Compiler CONTINUES to choose Dog::bark but shows a warning along the lines of:
//   Call to Dog::bark shadows NonNull::bark. Be explicit about which method you're aiming to call,
//   e.g. Dog::bark(dog)

Overall it appears to make it possible for us to support using NonNull<T> and friends as receivers. There are some details to figure out:

  • Currently, all by-value methods are searched before autoreffed methods. So, if we had Dog::bark(self: &NonNull<Self>) but then added NonNull::bark(self) then this would, I think, still cause undesirable shadowing. We may need to rearrange the way self_pick_all_method works so it can spot this scenario.
  • There are bits of the algorithm around collapsing trait picks, and unstable candidates, which I haven't figured out and we need to see if it fits this proposal.

Takeaways

The main bits of cunning here are:

  • We follow the chain of Receiver while identifying candidate methods, but we follow the shorter chain of Deref when working out the types to which the actual method receiver can be converted.
  • In pick_all_method, when we're considering a particular self type A<B>, if there's ambiguity about whether to choose a method on A or B in A<B>, we'll choose the method on B but show a warning. This allows us to add new methods on types like NonNull<T>.
  • The error 0282 discussed in the last comment is solved by considering bad types at the end of the Deref chain not at the end of the Receiver chain.

Thoughts? Does this all make any sense? Can you see any pitfalls?

This is all implemented hackily here.

(@madsmtm @Urhengulas!)

adetaylor avatar Dec 15 '23 17:12 adetaylor