subspace icon indicating copy to clipboard operation
subspace copied to clipboard

Downcast

Open danakj opened this issue 10 months ago • 9 comments

Box::downcast is built on an Any trait that requires compiler support and can't be reproduced directly in C++.

Downcasting correctly is a big part of memory safety/unsafety in C++ though.

Blink uses Downcast traits that you can specialize for your type to say if a particular object can be downcasted to a type.

https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/platform/wtf/casting.h;l=14-72;drc=34cecd5338a50388b013ed505f5d7508c40c95e5

// Helpers for downcasting in a class hierarchy.
//
//   IsA<T>(x): returns true if |x| can be safely downcast to T*. Usage of this
//       should not be common; if it is paired with a call to To<T>, consider
//       using DynamicTo<T> instead (see below). Note that this also returns
//       false if |x| is nullptr.
//
//   To<T>(x): unconditionally downcasts and returns |x| as a T*. CHECKs if the
//       downcast is unsafe. Use when IsA<T>(x) is known to be true due to
//       external invariants and not on a performance sensitive path.
//       If |x| is nullptr, returns nullptr.
//
//   DynamicTo<T>(x): downcasts and returns |x| as a T* iff IsA<T>(x) is true,
//       and nullptr otherwise. This is useful for combining a conditional
//       branch on IsA<T>(x) and an invocation of To<T>(x), e.g.:
//           if (IsA<DerivedClass>(x))
//             To<DerivedClass>(x)->...
//       can be written:
//           if (auto* derived = DynamicTo<DerivedClass>(x))
//             derived->...;
//
//   UnsafeTo<T>(x): unconditionally downcasts and returns |x| as a T*. DCHECKs
//       if the downcast is unsafe. Use when IsA<T>(x) is known to be true due
//       to external invariants. Prefer To<T> over this method, but this is ok
//       to use in performance sensitive code. If |x| is nullptr, returns
//       nullptr.
//
// Marking downcasts as safe is done by specializing the DowncastTraits
// template:
//
// template <>
// struct DowncastTraits<DerivedClass> {
//   static bool AllowFrom(const BaseClass& b) {
//     return b.IsDerivedClass();
//   }
//   static bool AllowFrom(const AnotherBaseClass& b) {
//     return b.type() == AnotherBaseClass::kDerivedClassTag;
//   }
// };
//
// int main() {
//   BaseClass* base = CreateDerived();
//   AnotherBaseClass* another_base = CreateDerived();
//   UnrelatedClass* unrelated = CreateUnrelated();
//
//   std::cout << std::boolalpha;
//   std::cout << IsA<Derived>(base) << '\n';          // prints true
//   std::cout << IsA<Derived>(another_base) << '\n';  // prints true
//   std::cout << IsA<Derived>(unrelated) << '\n';     // prints false
// }
template <typename T>
struct DowncastTraits {
  template <typename U>
  static bool AllowFrom(const U&) {
    static_assert(sizeof(U) == 0, "no downcast traits specialization for T");
    NOTREACHED();
    return false;
  }
};

And then has IsA (querying), To (panicking), DynamicTo (fallible) and UnsafeTo (unchecked) functions to do the downcast. This is a lot like llvm's llvm::isa, llvm::cast and llvm::dyn_cast.

The DowncastTraits are very very much like how we're now implementing most concepts in subspace. If we were to add a concept CanBeDowncasted which tests for the presence of the downcast traits, it would be basically the exact same, but restricting to a generic "downcastable" type isn't that interseting.

A better concept would be CanBeDowncastTo<From, To> which checks both for the presence of DowncastTraits and that To is a subclass of From, which you can then write:

void f(CanBeDowncastTo<Frog> auto* frog_supertypes) {}

Which would accept anything Frog derives from. This is not a super common paradigm to express today. Code works with higher level concepts and then downcasts to the specific stuff dynamically, rather than wanting something generic that can already be known to be maybe that subtype. This generic concept would be useful in the case of multiple inheritance and wanting any of those super types, I'm not sure I can give a good example. When you have a concrete super type instead, you already know it "can be downcasted" except that we would express that through the (very unsafe) static_cast.

So what the concept then provides is a compile time check for the downcast traits on the call signature, but the compiler error when you try to use them would be sufficient, unless you want to work with generics as in "some superclass in a multi-inheritance tree".

tl;dr I don't see the reason to add a concept for Any that just checks if a type can downcast to something right now, but I would love to come up with an example that would benefit.

What I see in Rust is this used for type erasure instead of inheritance. Since you can't upcast to a super class, you convert to an Any trait. Inheritance is stronger in that it puts into the type system a restriction on which things you can try downcast to, as it has to be a subtype. But then C++ jumps directly to Undefined Behaviour by making the cast infallible.

Most likely the right answer is to reproduce Blinks DowncastTraits verbatim, I don't know of any issue with them, and continue to lean on inheritance rather than generics for expressing a super type (see also sus::error::DynError).

So then coming back to Box::downcast, which is present for Box<T: Any> and calls through to Any::isa/Any::downcast_unchecked. We could provide a Box::downcast that is present when DowncastTraits are present. What's the point?

The point is that for unique_ptr to downcast you have to do make_unique_ptr<SubType>(static_cast<Subtype*>(uniq.release())) which is bad. Leaving the unique_ptr can drop the deleter, it can make static analysis much harder, and it necessitates the API to provide release() and construction from a native pointer, and for developers to be used to them.

So Box<SuperType>::downcast() -> Box<SubType> is worth having, and that it can be fallible and chain with Result/Option/Iterator is good too. And compile-failure inside downcast() on a missing DowncastTraits is not great. So we're back to the concept being a good idea. Not because users will write generic code on the concept, but because Container APIs may want to provide fallible downcast behaviour that is conditionally available.

danakj avatar Aug 23 '23 19:08 danakj

Blink's traits work with pointers, but returning a null pointer is bad times (null pointers are bad). Receiving pointers is also a pain, when you're not going to store that pointer. Rvalues can't be chained through them properly.

So I think we redo them to receive T& and return T& and Option<T&>. We can use [[clang::lifetimebound]] to catch misuse of returned temporaries rather than blocking rvalues (we're not returning references to things owned inside a container, which is where we systemically block returning a reference from an rvalue).

danakj avatar Aug 23 '23 20:08 danakj

Slack thread on issues with the Blink impl from @zetafunction: https://chromium.slack.com/archives/CGHNFGN9L/p1607720784245200

And this summary:

I think I was complaining about how the way DowncastTraits is specified means that you can only have canonical traits in one place. So if someone else wants to allow down casts you need another escape hatch. Which Blink has.

danakj avatar Aug 23 '23 22:08 danakj

The issue raised by @zetafunction is that the trait structs are templated only on the To type, so there can only be one of those, and all From methods must be inside it.

The benefit to this is that the methods can accept any pointer that is convertible to the From pointer type that the trait is written for, as the type can be deduced from the function call. But the downside is one place that writes them all, which is quite a down side. If a library has a class Foo and it's meant for subclassing, then it's a hard requirement that it be possible to downcast from Foo to your class, but if the library defines downcast traits in the Blink way, you can't.

This is solvable by allowing the From template parameter to be moved up onto the class, but template it to allow your From or anything derived from From which then has the same effect of allow subclasses to share one base class trait.

https://godbolt.org/z/dsoGYv4sx

#include <concepts>
#include <utility>

template <class Derived, class Base>
concept IsDerivedFrom = std::is_base_of_v<Base, Derived>;

template <class From, class To>
struct DowncastImpl {};

template <class From, class To>
concept Downcast = requires(const From& c, From& m, From&& r) {
    requires IsDerivedFrom<To, From>;
    { DowncastImpl<From, To>::downcast(c) } -> std::same_as<const To&>;
    { DowncastImpl<From, To>::downcast(m) } -> std::same_as<To&>;
    { DowncastImpl<From, To>::downcast(std::move(r)) } -> std::same_as<To&&>;
};

struct First {};
struct Second : public First {};
struct Third : public Second {};

template <IsDerivedFrom<First> From>
struct DowncastImpl<From, Second> {
    constexpr static const Second& downcast(const From& from) noexcept {
        return static_cast<const Second&>(from);
    }
    constexpr static Second& downcast(From& from) noexcept {
        return static_cast<Second&>(from);
    }
    constexpr static Second&& downcast(From&& from) noexcept {
        return static_cast<Second&&>(from);
    }
};

template <IsDerivedFrom<First> From>
struct DowncastImpl<From, Third> {
    constexpr static const Third& downcast(const From& from) noexcept {
        return static_cast<const Third&>(from);
    }
    constexpr static Third& downcast(From& from) noexcept {
        return static_cast<Third&>(from);
    }
    constexpr static Third&& downcast(From&& from) noexcept {
        return static_cast<Third&&>(from);
    }
};

static_assert(Downcast<First, Third>);
static_assert(Downcast<First, Second>);
static_assert(Downcast<Second, Third>);

template <class To, class From>
    requires(!std::is_reference_v<To>)
decltype(auto) downcast(From&& from) noexcept {
    return DowncastImpl<std::remove_cvref_t<From>, To>::downcast(std::forward<From>(from));
}

int main() {
    Third third;
    Second& second = second;
    Third& thirdagain = downcast<Third>(second);
}

danakj avatar Aug 24 '23 03:08 danakj

Backing up a sec, it's the To that is fixed with the Blink approach so really they should be defined with the To type anyway. You don't need to add more traits to someone else's type. Either they have them or they don't have the traits, but you can't add more base classes to some other type...

danakj avatar Aug 24 '23 03:08 danakj

What if Downcast could convert to things that are not actually a subclass?

https://doc.rust-lang.org/stable/std/error/trait.Error.html#method.downcast

This downcast is from 'dyn Error' to T. T is not a subclass of Error there, it implements Error. So this is not an inheritance downcast, this is a concept-to-type conversion.

For Box<DynError> in sus we would also like to be able to pull out the concrete error type and "downcast" to Box<MyError> except that is incorrect. It would have to be to Box<DynErrorTyped<MyError>> which could then be .into_inner().into_inner() to pull out the MyError directly...

What we could do for a Box<DynError> is "downcast" to a MyError& instead of a Box, with a .try_as<MyError>() type of transformation...? What kind of concept would represent "Can be viewed as a concrete type T" instead of "is also a pointer to a T" (which is downcast)?

Downview?

static_assert(Downview<DynError, MyError>);
auto Box<DynError>::downview<MyError>() -> Result<const MyError&, DownViewError>

How would that concept be implemented? For DynError it means a downcast to DynErrorTyped<MyError> and then an as_inner() or something.

struct DownviewImpl<DynError> {
  template <class T>
  static Result<const T&, DownViewError> downview(const DynError& e) {
    if (!e.is<T>) { // You can't write this though.
      return bad...;
    } else {
      return static_cast<const DynErrorTyped&>(e);  // This should use `downcast()`
    }
  }
};

How can DynError implement is? It needs a pure virtual method that the subclass implements. But it has to tell the subclass what type it's looking for, which requires a template method, which virtual methods can't be.

Or put differently, how can DynError implement Downcast to DynErrorTyped<T>?

template <class T>
struct DowncastImpl<DynErrorTyped<T>> {
  bool isa(const DynError& e) {
    ????
  }
};

So it seems the idea of a general type erased DynError is much more lossy in C++ than in Rust, unless you use RTTI I guess.

A codebase could provide their own base class for all errors that knows how to downcast. Right now Box<DynError>::from(Error) is very specific and would not work for that. But that's because DynError is outside the class hierarchy.

If you use your own error types that can downcast, then Box<GeneralError>::from(SubError) works if the from() method retains the caller type instead of slicing it, and stores it as a heap object of GeneralError.

danakj avatar Aug 24 '23 20:08 danakj

I should note all these weird things go away if errors just have to subclass DynError. Then downcast is normal.

But then you can't use an enum as an error, you have to always wrap things in structs. This one is hard tradeoffs to map into C++.

danakj avatar Aug 24 '23 21:08 danakj

ok thoughts over here https://github.com/chromium/subspace/issues/324#issuecomment-1692515511 but we will just not implement Downcast anything for DynError it is truly up to the application layer (or whatever layer) to provide strongly typed things with inheritance if it wants to allow downcasting. DynError is the harshest of type erasures and you can't get back from it.

danakj avatar Aug 24 '23 22:08 danakj

Jonathan Müller gave an interesting suggestion for replicating Any::type_id in C++, which is to use __PRETTY_FUNCTION__ in a templated function to turn each instantiation into an id. You can use the address of the string, but then they will all show up in your binary, which maybe you don't want. You can also use a hash of the string, but then you can get collisions with no way to identify it's happened. Alternatively you can use a global variable but then it's not constexpr compatible.

His godbolt example using the hash of __PRETTY_FUNCTION__: https://godbolt.org/z/W16PssohP My example of using a static (non-constexpr) global variable: https://godbolt.org/z/sn66759bh And an example of using the address of __PRETTY_FUNCTION__: https://godbolt.org/z/rW7qeTEv5

This would enable downcasting to any type without writing type traits, as long as __PRETTY_FUNCTION__ is present in all compilers. Restricting its use to subclasses before instantiating the template would avoid bringing in the __PRETTY_FUNCTION__ string for types that aren't subclasses which does limit the number of strings pretty substantially in case of generic code trying to see if it can downcast to stuff in weird ways.

I wonder how much string binary size would be introduced in a codebase like Chromium with this.

danakj avatar Aug 25 '23 20:08 danakj

A couple issues with using __PRETTY_FUNCTION__:

  • In an anonymous namespace the names could be ambiguous.
  • It's not quite like Any::type_id which gives you the type id of the concrete type, it will give you the type id of the base class instead. So you need a virtual type_id() function that all subclasses inherit and implement, and which leaks the id.
  • With &dyn Trait in rust you have a type erasure but Any::type_id still gives you the right answer, not "the Trait" which is equivalent to what C++ gives you with "the base class".

If there's a virtual function involved there's no need for constexpr anymore. So the static variable approach could work except that across dylibs you get collisions, possibly for subclasses of the same base, since the virtual function has to return the id, and the comparison can happen in a different dylib. The virtual function can't be templated on the other type to generate both ids in the same dylib.

danakj avatar Aug 25 '23 20:08 danakj