
how to make a non-virtual call to a virtual function?

Open zygoloid opened this issue 1 year ago • 14 comments

Summary of issue:

In C++, there is often a desire to make a non-virtual call to a virtual function -- for example when implementing a function in a derived class, it's sometimes desirable to call the base class implementation of the function.

class Base {
public:
  virtual void f() { ... }
};
class Derived : public Base {
public:
  virtual void f() override {
    // ... Derived things
    Base::f();
    // ... Derived things
  }
};

How do we express this kind of pattern in Carbon?

Details:

In Carbon, given

base class Base {
  virtual fn F[self: Self]() { ... }
}

... and p: Base*, we know:

  • p->F() performs virtual dispatch
  • p->F() means the same thing as p->(Base.F)()

Therefore, in general, qualified and unqualified calls to virtual functions always perform virtual dispatch -- they're calls to the same callable object Base.F, and its implementation of BindToRef is (presumably) where virtual dispatch happens.

So it seems that it is not straightforward for us to follow C++ and say that qualified calls don't do virtual dispatch, unless we make some kind of hole in our model for member binding operators or qualified name lookup that allows us to distinguish these cases. And it's not even clear that we'd want to: allowing virtual function calls to be forwarded and invoked indirectly is necessary for us to have a story compatible with C++'s story for pointers to members.
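To illustrate the C++ behavior being referred to, here is a minimal sketch. The class and function names are hypothetical and not part of any proposal. It shows that a call through a C++ pointer to member still dispatches virtually, while a qualified call does not.

```cpp
#include <string>

struct Base {
  virtual ~Base() = default;
  virtual std::string f() const { return "Base::f"; }
};

struct Derived : Base {
  std::string f() const override { return "Derived::f"; }
};

// Call through a pointer to member: the pointer names Base::f, but the
// call still goes through the vtable of the dynamic type.
std::string CallThroughMemberPointer(const Base& b) {
  std::string (Base::*pmf)() const = &Base::f;
  return (b.*pmf)();
}

// Qualified call: virtual dispatch is suppressed and Base::f runs.
std::string CallQualified(const Base& b) { return b.Base::f(); }
```

Given a Derived object, the first helper reaches Derived::f and the second reaches Base::f, which is the distinction a Carbon design would need to either reproduce or deliberately drop.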

Some options:

  • Do nothing. Provide no mechanism to make a non-virtual call to a virtual function. This means that a virtual function can never be called on the "wrong" dynamic type, and base classes that wish to expose an implementation for a derived class to reuse would need to give that implementation a different name.
    • This is probably the cleanest and most principled design, and provides the strongest guarantees to class authors.
    • Major downsides: interoperability with C++ classes may require non-virtual calls to C++ virtual functions, Carbon virtual functions would still be callable with the "wrong" dynamic type from C++, and migration of such calls is more challenging due to requiring a larger-scale rewrite.
  • Add some mechanism to form a method object that performs non-virtual dispatch. Presumably this would be an operation that takes Base.F and a class derived from Base as input, and returns a callable. There are many options here; we could do this by overloading BindTo* (for example p->(Base.F.(Base))()) or adding a member function to the type of virtual functions (p->(Base.F.NonVirtual(Base))()) or adding a member function to the type of bound virtual functions (p->F.NonVirtual(Base)() -- though this would presumably first perform a vtable lookup then throw away the result, which seems suboptimal) or adding a free function (p->(NonVirtual(Base.F, Base))()) or ...
    • This option seems like it might be a reasonable approach, with the right syntax.
  • Use different (non-method) call notation for a non-virtual call. For example, we could permit Base.F(p) as a direct non-virtual call, and p->(Base.F)() as a virtual call.
  • Distinguish qualified and unqualified call syntax, perhaps with additional interfaces around BindToRef that distinguish direct and indirect binding in some way. For example, p->F could mean something slightly different from p->(Base.F) or p->(f), with the former performing virtual dispatch and the latter not doing so.
    • This would probably be the most similar to C++, but harms our story for providing a feature similar to pointers-to-members.
  • Distinguish specifically the case of p->(Class.F)(), and say that performs a non-virtual call, whereas let f: auto = Class.F and p->(f)() would perform a virtual call.
    • This would be a non-uniformity in the language behavior.
    • Does not provide a direct way to form a non-virtual function value that can be passed to another function. This can perhaps be emulated with a lambda. Note that C++ pointers-to-members don't support this either.
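As a rough C++ analogue of the lambda emulation mentioned in the last option, the sketch below wraps a qualified call in a lambda so that a non-virtual call can be passed around as a value. All names here (BaseFn, MakeNonVirtualF, MakeVirtualF, Apply) are made up for illustration.

```cpp
#include <functional>
#include <string>

struct Base {
  virtual ~Base() = default;
  virtual std::string f() const { return "Base::f"; }
};

struct Derived : Base {
  std::string f() const override { return "Derived::f"; }
};

using BaseFn = std::function<std::string(const Base&)>;

// A function value that always runs Base's implementation.
BaseFn MakeNonVirtualF() {
  return [](const Base& b) { return b.Base::f(); };
}

// A function value that dispatches on the dynamic type.
BaseFn MakeVirtualF() {
  return [](const Base& b) { return b.f(); };
}

// Some other function that receives the callable and invokes it.
std::string Apply(const BaseFn& fn, const Base& b) { return fn(b); }
```

The receiving function cannot tell the two callables apart by type; only the lambda body decides whether dispatch happens.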

A related question is whether Derived.F is a different value from Base.F, or just finds F in the extended base class. The impl fn F() doesn't need to introduce a new, shadowing F, just to implement the existing F, if there is no situation where Derived.F behaves differently from Base.F.

Any other information that you want to share?

A related concern is the behavior of derived_p->base.F(). Intuitively it seems like this might invoke the base class's version of F, but presumably won't: derived_p->base is a reference expression naming a Base object, so derived_p->base should behave like *base_p, and (*base_p).F() should perform virtual dispatch, and so derived_p->base.F() seems like it must also perform virtual dispatch, unless we add some kind of special case.
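C++ has the same property, which may help set expectations. In this sketch (hypothetical names), naming the base subobject through a reference does not change which implementation runs; only the qualified name does.

```cpp
#include <string>

struct Base {
  virtual ~Base() = default;
  virtual std::string F() const { return "Base::F"; }
};

struct Derived : Base {
  std::string F() const override { return "Derived::F"; }
};

// Roughly analogous to derived_p->base.F(): `base` refers to the Base
// subobject, but the call still dispatches on the dynamic type.
std::string ViaBaseSubobject(const Derived& d) {
  const Base& base = d;
  return base.F();
}

// Only the qualified name suppresses dispatch.
std::string ViaQualifiedName(const Derived& d) { return d.Base::F(); }
```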

zygoloid avatar Jul 03 '24 19:07 zygoloid

The repetitiveness of p->(Base.F.NonVirtual(Base))() is a bit of a concern for me. There are two different Bases here with two different meanings -- one is where we look up F and the other is the derived type whose implementation we're using (and there's the secret Base we get from the type of p). Reducing this a bit would be nice if we take this route -- perhaps something like p->(NonVirtual(Base).F)() could work?

zygoloid avatar Jul 03 '24 19:07 zygoloid

A related question is whether Derived.F is a different value from Base.F, or just finds F in the extended base class. The impl fn F() doesn't need to introduce a new, shadowing F, just to implement the existing F, if there is no situation where Derived.F behaves differently from Base.F.

Note that Derived.F needs to exist at least for redeclaration lookup. Given:

base class Base {
  virtual fn F[self: Self]();
}
class Derived {
  extend base: Base;
  impl fn F[self: Self]();
}

we want

fn Derived.F[self: Self]() {}

to define Derived.F, not Base.F! This also means that the name F ought to be reserved in Derived for that function, and it shouldn't be possible to declare an unrelated name F in Derived, because that name would collide with a redeclaration of the impl fn F.

The above means that Derived.F, in at least the declaration name context, refers to the F that is specifically declared in class Derived, and not some overrider nor the virtual function it overrides. So perhaps treating d.(Derived.F)() as calling that exact function is reasonable.

Another connected issue is the behavior of a virtual call to a covariant overrider:

base class Base {
  virtual fn F[self: Self](param: A);
}
base class Intermed {
  extend base: Base;
  impl fn F[self: Self](param: B);
}
class Derived {
  extend base: Intermed;
  impl fn F[self: Self](param: C);
}

Here we require an implicit conversion from A to B and an implicit conversion from A to C to exist when building the two thunks. But we don't require a conversion from B to C to exist, nor a conversion from B to A to exist. So given an i: Intermed that actually refers to a Derived, and b: B, what happens when we call i.F(b)?

  • We can't convert b to A then call the virtual function.
  • We can't convert b to C then call the final overrider: there's no such conversion and in any case we don't have a suitable thunk / vtable entry.

So it seems like the only option is that we must reject the call. In effect, a call to F on Intermed must be type-checked as if it called Base.F directly.
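The thunk arrangement can be approximated by hand in C++, which does not allow overriders to change parameter types. This is only a sketch under that assumption: the single virtual F(A) stands in for the vtable slot, each override plays the role of a thunk, and FImplB / FImplC are hypothetical names for the real implementations.

```cpp
#include <string>

struct A { int value; };
struct B {
  B(const A& a) : value(a.value) {}  // Implicit A -> B conversion.
  int value;
};
struct C {
  C(const A& a) : value(a.value) {}  // Implicit A -> C conversion.
  int value;
};
// Deliberately no B -> C and no B -> A conversion.

struct Base {
  virtual ~Base() = default;
  // The one vtable slot, with Base's signature.
  virtual std::string F(A a) const {
    return "Base::F(A=" + std::to_string(a.value) + ")";
  }
};

struct Intermed : Base {
  // Thunk: convert A -> B, then call Intermed's implementation.
  std::string F(A a) const override { return FImplB(a); }
  std::string FImplB(B b) const {
    return "Intermed::F(B=" + std::to_string(b.value) + ")";
  }
};

struct Derived : Intermed {
  // Thunk: convert A -> C, then call Derived's implementation.
  std::string F(A a) const override { return FImplC(a); }
  std::string FImplC(C c) const {
    return "Derived::F(C=" + std::to_string(c.value) + ")";
  }
};

// The only virtual entry point takes an A.
std::string CallVirtually(const Base& obj, A a) { return obj.F(a); }
```

With a B in hand and an Intermed reference to a Derived object, there is no way to reach the slot: the slot wants an A, and Derived's implementation wants a C. Calling FImplB directly would compile, but it is a non-virtual call to Intermed's implementation.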

Proposed rules

A virtual function is declared with at least one of up to four modifiers:

[[abstract] [virtual] | final] [impl] fn F();
  • abstract means "derived class must override this virtual function"
  • final means "derived class must not override this virtual function"
  • virtual means "introduce a new vtable slot with this signature"
  • impl means "this class overrides this virtual function"

Combining abstract or virtual with final is disallowed. If abstract is not explicitly specified, impl is implied. If abstract is explicitly specified, virtual is implied if there is no matching virtual function in the base class.

When a virtual function is declared, the overridden vtable slots with the same name from the transitive base class(es) are discovered. Unless the virtual modifier is present, there must be at least one such overridden function. Thunks are created from each of the overridden functions to the new function.

impl means that this class provides a definition of the function with the provided signature. In the absence of impl, this class does not have its own definition of the function.

When a function is named by Class.Function, that names the exact function, and can be used as normal. Likewise when the function name is found by unqualified lookup within the class scope: self.(Function)() does the same thing as self.(Self.Function)().

When a function is named directly in simple member access on an object of the class type, such as object.Function():

  • If the named function is not virtual or either the function or the class is final, the exact function is called. It is an error if that function is not implicitly or explicitly impl.
  • Otherwise, the derived-most function introduced with virtual is used for type-checking the call and the thunk from the vtable is called.

Some examples:

abstract class A {
  // New vtable slot, no definition in A, must be overridden.
  abstract virtual fn F1();
  // Shorthand for previous.
  abstract fn F2();
  // New vtable slot, A has a definition that can be called by qualified name.
  // Must nonetheless be overridden.
  abstract virtual impl fn F3();
  // Shorthand for previous.
  abstract impl fn F4();
  // New vtable slot, definition in A, need not be overridden.
  virtual impl fn F5();
  // Shorthand for previous.
  virtual fn F6();
}

abstract class B {
  extend base: A;

  // No new vtable slot (overrides A.F1).
  impl fn F1();
  // New vtable slot. Also overrides A.F2.
  virtual impl fn F2();
  // Shorthand for previous.
  virtual fn F3();
  // No-op, function F4 remains abstract.
  abstract fn F4();
  // No new vtable slot, no definition in B, must be overridden.
  abstract fn F5();
  // No new vtable slot, definition in B, must be overridden.
  abstract impl fn F6();
}

base class C {
  extend base: B;

  // Overrides A.F4.
  impl fn F4();
  // No new vtable slot, defined in C. Can't be overridden.
  final impl fn F5();
  // Shorthand for previous.
  final fn F6();

  fn G[self: Self]() {
    // Virtual call to A.F1
    self.F1();
    // Direct call to B.F1
    self.(C.F1)();
    // Direct call to B.F1
    self.(Self.F1)();
    // Direct call to B.F1
    self.(F1)();
  }
}

class D {
  extend base: C;

  impl fn F4();
}

fn F(d: D) {
  // Virtual call type-checked against A.F1
  d.F1();
  // Virtual call type-checked against B.F2
  d.F2();
  // Virtual call type-checked against B.F3
  d.F3();
  // Direct call to D.F4, class is final.
  d.F4();
  // Direct call to final function D.F5.
  d.F5();
  // Direct call to final function D.F6.
  d.F6();

  // Error, A does not implement F2.
  d.(A.F2)();
  // Direct call to A.F4.
  d.(A.F4)();
  // Direct call to A.F6.
  d.(A.F6)();

  // Error, B does not implement F5.
  d.(B.F5)();
  // Direct call to B.F6.
  d.(B.F6)();

  // Direct call to B.F1, not overridden in C.
  d.(C.F1)();
}

zygoloid avatar Apr 04 '25 01:04 zygoloid

But we don't require a conversion from B to C to exist

It seems like we absolutely should require a conversion from B to C to exist, if for no other reason than the substitutability principle: you should be able to use Derived in place of Intermed. And we could very well have an additional thunk in the vtable for calls to Intermed if we decided to -- or is there a problem with C++ compatibility?

(Edit: I have thought more and I agree with you and don't think we should do this, see my next message.)

josh11b avatar Apr 06 '25 02:04 josh11b

While we could have an additional thunk in the vtable introduced in Intermed, upon reflection I realize this is a cost that should not be introduced silently or by default. In some sense Intermed has two functions: a non-virtual function taking a parameter of type B, and a virtual function taking a parameter of type A that performs an implicit conversion and then calls the non-virtual function. You then provide reasonable rules for deciding which function to use in various circumstances, but we should also consider using the optional names that were included in the proposed overloading syntax; see the discussion on 2025-03-28.

josh11b avatar Apr 06 '25 17:04 josh11b

One regression from this would be breaking the currently very simple relationship between abstract on a class and abstract on a method. Right now:

Only abstract classes may have abstract methods.

The proposed change would make this:

All virtual methods must be impl unless the class is abstract (abstract methods are about something else).

josh11b avatar Apr 07 '25 15:04 josh11b

One regression from this would be breaking the currently very simple relationship between abstract on a class and abstract on a method. Right now:

Only abstract classes may have abstract methods.

The proposed change would make this:

All virtual methods must be impl unless the class is abstract (abstract methods are about something else).

That's not part of the proposed change. That rule would remain unchanged.

zygoloid avatar Apr 07 '25 16:04 zygoloid

So, abstract impl fn seems proposed for consistency with C++.

That said, it seems there are two use-cases in C++:

  • Forcing a class to be abstract through an abstract destructor, without any other abstract functions.
    • This is the main use-case.
  • For arbitrary abstract functions, giving a base implementation that can be called, similar to any virtual function.
    • But it's still abstract and must be overridden, so the entry in the virtual table isn't ever "used".
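For reference, the destructor use-case looks roughly like this in C++. This is a minimal sketch with hypothetical class names; the out-of-line definition is required because derived destructors call the base destructor.

```cpp
#include <type_traits>

class Base {
 public:
  // Pure virtual destructor: makes Base abstract even though it has no
  // other pure virtual functions.
  virtual ~Base() = 0;
};

// The definition must still exist, since ~Child() calls ~Base().
Base::~Base() {}

// Child's implicitly declared destructor overrides ~Base(), so Child is
// concrete without declaring anything.
class Child : public Base {};

static_assert(std::is_abstract<Base>::value, "Base is abstract");
static_assert(!std::is_abstract<Child>::value, "Child is concrete");
```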

For example with arbitrary functions:

class Base {
 public:
  virtual auto F() -> void = 0;
};

auto Base::F() -> void {
  // Implementation here.
}

class Child : public Base {
 public:
  auto F() -> void override {
    // Can reuse parent implementation.
    Base::F();
    // Maybe more implementation.
  }
};

In Carbon, given abstract impl fn as suggested above, it would become:

abstract class Base {
  abstract impl fn F[self: Self]() -> void;
}

abstract impl fn Base.F[self: Self]() -> void {
  // Implementation here.
}

class Child {
  extend base: Base;
  impl fn F[self: Self]() -> void {
    // Can reuse parent implementation.
    self.(Base.F)();
    // Maybe more implementation.
  }
} 

Per discussion, my impression is that the rationale for this would look a bit like:

  • Consistency with C++: Offers a direct migration of a feature.
  • Consistency with reusing a parent's virtual function: if F in the above were virtual instead of abstract impl, the same call structure would work.
    • This may tail into code maintenance/evolution benefits.

But, does Carbon need this feature?

The main use-case (forcing a class to be abstract) is already addressed in Carbon by abstract class. For the arbitrary function use-case, there are a couple of alternatives available: either making F a virtual fn instead of abstract fn, or shifting the implementation to a different function name; for example, with FInBase:

abstract class Base {
  abstract fn F[self: Self]() -> void;
  protected fn FInBase[self: Self]() -> void;
}

fn Base.FInBase[self: Self]() -> void {
  // Implementation here.
}

class Child {
  extend base: Base;
  impl fn F[self: Self]() -> void {
    // Can reuse parent implementation.
    self.FInBase();
    // Maybe more implementation.
  }
} 

Note, I'm not sure how to check how common a pattern this is: typical code search mechanisms don't seem like they'd work well to find the C++ pattern. @zygoloid had suggested building a quick compiler diagnostic to find uses as the most feasible approach (if someone wants to try).

Anyway, abstract impl fn has a trade-off of consistency versus language complexity... Is the use-case common enough that it should be supported? Could the addition of abstract impl be delayed until there's evidence of friction from alternatives?

jonmeow avatar Apr 07 '25 18:04 jonmeow

The C++ analogue of abstract impl fn (excluding the abstract destructor case, for which in Carbon we'd just declare the class itself as abstract) is rare. Perhaps we can avoid supporting it.

zygoloid avatar Apr 07 '25 19:04 zygoloid

  // New vtable slot. Also overrides A.F2.
  virtual impl fn F2();
  // Shorthand for previous.
  virtual fn F3();

FWIW, I find this a bit undesirable - that the overriding/implementing behavior of F3 depends on the absence/presence of F3 in the base without any checking seems unfortunate. I think it'd be good if there wasn't any syntax that, through a misspelling, could go from override to not override.
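The hazard is familiar from C++ code written without the override keyword. A small sketch, with hypothetical names, of how a misspelling silently turns an intended override into a new virtual function:

```cpp
#include <string>

struct Base {
  virtual ~Base() = default;
  virtual std::string Process() const { return "Base::Process"; }
};

struct Typo : Base {
  // Misspelled, and declared with `virtual` only: this compiles and
  // quietly introduces a new virtual function instead of overriding.
  virtual std::string Proces() const { return "Typo::Proces"; }
};

struct Checked : Base {
  // `override` makes the compiler verify that a base function matches.
  std::string Process() const override { return "Checked::Process"; }
  // std::string Proces() const override;  // Would be rejected.
};

std::string Run(const Base& b) { return b.Process(); }
```

Calling Run on a Typo object reaches Base::Process, with no diagnostic pointing at the misspelling.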

dwblaikie avatar Apr 14 '25 20:04 dwblaikie

FWIW, I find this a bit undesirable - that the overriding/implementing behavior of F3 depends on the absence/presence of F3 in the base without any checking seems unfortunate. I think it'd be good if there wasn't any syntax that, through a misspelling, could go from override to not override.

I believe the intent is that virtual still means F3 is not present in bases of the class, as is the case with the current design.

The nuance of zygoloid's proposal is that if abstract impl is how you say "this is abstract, must be defined by the child, and (per impl) has a definition here" then virtual impl is the equivalent "this is virtual, may be defined by a child, and (per impl) has a definition here". But with virtual impl, the impl is kind of redundant so virtual is treated as shorthand.

[ed: also note impl fn (no virtual) remains valid per current design]

jonmeow avatar Apr 14 '25 22:04 jonmeow

One regression from this would be breaking the currently very simple relationship between abstract on a class and abstract on a method. Right now:

Only abstract classes may have abstract methods.

The proposed change would make this:

All virtual methods must be impl unless the class is abstract (abstract methods are about something else).

That's not part of the proposed change. That rule would remain unchanged.

That was not clear to me based on what was written. Perhaps you could clarify the meaning of these keyword modifiers?

josh11b avatar Apr 14 '25 22:04 josh11b

This syntax makes it a bit too easy to call a specific class's definition of a function. I think this should be limited to something that can only be done from inside the class when it could be overridden by a derived class.

josh11b avatar Apr 14 '25 22:04 josh11b

The proposed change would make this:

All virtual methods must be impl unless the class is abstract (abstract methods are about something else).

That's not part of the proposed change. That rule would remain unchanged.

That was not clear to me based on what was written. Perhaps you could clarify the meaning of these keyword modifiers?

I think I'd prefer to drop support for abstract impl entirely; as noted in my earlier comment we don't have a lot of motivation to support this, given that we can already make a class abstract without it having any abstract virtual functions.

  // New vtable slot. Also overrides A.F2.
  virtual impl fn F2();
  // Shorthand for previous.
  virtual fn F3();

FWIW, I find this a bit undesirable - that the overriding/implementing behavior of F3 depends on the absence/presence of F3 in the base without any checking seems unfortunate. I think it'd be good if there wasn't any syntax that, through a misspelling, could go from override to not override.

I think the whole mechanism of having multiple vtable slots with the same name but different signatures is proving a bit troublesome. If we remove that (at least until we have overloading in place) then we can perhaps have just virtual fn or abstract fn to introduce a new vtable slot and impl fn or final fn to fill in an existing one.
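As a loose analogy only, the simplified four-keyword scheme lines up with C++ roughly as follows. The mapping in the comments is an assumption made for illustration, not part of the proposal, and the class names are hypothetical.

```cpp
#include <string>

struct Shape {
  virtual ~Shape() = default;
  // Like `virtual fn`: introduces a slot and provides a definition.
  virtual std::string Name() const { return "shape"; }
  // Like `abstract fn`: introduces a slot with no definition.
  virtual int Sides() const = 0;
};

struct Polygon : Shape {
  // Like `impl fn`: fills an existing slot; `override` checks it exists.
  std::string Name() const override { return "polygon"; }
  // Sides() is not filled in here, so Polygon remains abstract.
};

struct Triangle : Polygon {
  // Like `final fn`: fills an existing slot and forbids further overrides.
  int Sides() const final { return 3; }
};

std::string Describe(const Shape& s) {
  return s.Name() + ":" + std::to_string(s.Sides());
}
```

In this arrangement every declaration either introduces a slot or fills one, so a misspelled `impl fn` or `final fn` has nothing to fill and can be diagnosed.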

zygoloid avatar Apr 14 '25 23:04 zygoloid

FWIW, I find this a bit undesirable - that the overriding/implementing behavior of F3 depends on the absence/presence of F3 in the base without any checking seems unfortunate. I think it'd be good if there wasn't any syntax that, through a misspelling, could go from override to not override.

I think the whole mechanism of having multiple vtable slots with the same name but different signatures is proving a bit troublesome. If we remove that (at least until we have overloading in place) then we can perhaps have just virtual fn or abstract fn to introduce a new vtable slot and impl fn or final fn to fill in an existing one.

SGTM in that regard

dwblaikie avatar Apr 15 '25 20:04 dwblaikie