language icon indicating copy to clipboard operation
language copied to clipboard

Specify that constant expressions of type `Type` and constant type expressions are canonicalized

Open eernstg opened this issue 5 years ago • 5 comments

In response to #92, about the use of == vs. identical to detect that two instances of type Type reify the same type, we specify that Type instances that constantly reify a type are canonicalized.

Similarly, we may perhaps wish to specify that o.runtimeType (where runtimeType has no overriding declaration, that is, it invokes the implementation in Object) must return a canonicalized instance of Type in the case where o was created by an instance creation expression e (possible a constant object expression) as an instance of a non-generic class, or as an instance of a generic class where the type arguments passed in e (explicitly, inferred, or obtained via instantiate-to-bound) are constant type expressions. It is not obvious to me whether this will be easy to achieve in terms of implementation effort, or whether it will work well, in terms of performance implications for the language as a whole.

The next case seems more safe to require: We specify that whenever an actual type argument is a constant type expression, any evaluation of the corresponding type variable must return a canonicalized value. For instance, the following must print 'true' for every statement in main:

Type typeOf<X>() => X;

class C<X> {
  Type get typeArgument => X;
  D<X> dOf() => D<X>();
}

class D<Y> {
  Type get typeArgument => Y;
}

main() {
  print(identical(typeOf<List<int>>(), typeOf<List<int>>()));
  print(identical(C<List<int>>().typeArgument, C<List<int>>().typeArgument));
  print(identical(C<List<int>>().dOf().typeArgument, C<List<int>>().typeArgument));
}

Note that we do not require identical(C<List<int>>().dOf().runtimeType, typeOf<D<List<int>>>()), even though the reified type is in both cases D<List<int>>, which is a constant type expression. This is because the given instance was created by evaluating an instance creation expression where the actual type argument was not a constant type expression (it was X), so that type argument is not statically known to be canonicalized.

These rules would still not make the test using identical safe for any particular pair of instances of Type, because it requires developers to follow certain conventions in order to make it known statically that both of those instances were originally obtained by evaluation of a constant expression, and that's not a decidable property for Dart programs in general. But it seems to make the identical type test safe in a set of situations that is somewhat less intractable for developers who are willing to go the extra mile in order to get this optimization.

eernstg avatar Nov 12 '18 13:11 eernstg

I'd be happy with this. A type that occurs verbatim in the source code is canonicalized, even when passed through type variables.

I'm not sure it's viable.

It assumes some kind of connection between types and Type objects. If the type passed around through type arguments, and stored in generic objects, is implemented completely independently from Type objects, and it's only converted to a Type object when the T type variable is evaluated as an expression, then that expression has no way to distinguish one type from another. We'd need to add that information to the object as well.

Example:

class C<T> {
  Type get typeArgument => T;
  C<List<T>> wrap() => C<List<T>>();
}

main() {
  var c1 = C<List<Int>>();
  var c2 = C<int>().wrap();
  print(identical(c1.typeArgument, c2.typeArgument));
  var c3 = C<dynamic>();
  print(identical(c3.typeArgument, dynamic));
}

There is no reason to believe that an implementation represents the type bound to T in c1 differently from the one in c2. If you want c1 to provide a canonicalized Type object for List<int>, and c2 to not do so, then I think it is non-obvious how to do that. You will have to carry along more type information to know whether something must be canonicalized (probably the canonicalized Type object itself). That's an overhead when you don't reflect the type argument.

lrhn avatar Nov 16 '18 13:11 lrhn

Cf. discussion IRL with Lasse, it may be too expensive to do what I proposed (even though my proposal makes it OK for identical(c1.typeArgument, c2.typeArgument) to return false in this example), because my proposal implies that we must get the following:

class C<T> {
  Type get typeArgument => T;
  C<List<T>> wrap() => C<List<T>>();
}

typedef T2 = List<int>; // Using new, generalized typedef.

main() {
  print(identical(C<List<int>>().typeArgument, T2)); // Must print 'true'.
}

That's because the type argument List<int> given in the instance creation C<List<int>>() is a constant type expression, so it must be canonicalized (even though that instance was created at run time), and T2 is already a constant expression, and we agree that it is reasonable to add to the specification that constant instances of Type reifying a type must be canonicalized, and two canonicalized instances of type reifying List<int> must be identical.

So we may need to be very careful about the performance implications and the implementation effort needed when we decide on exactly how much canonicalization we wish to require.

Of course, it is always OK for an implementation to go further and canonicalize additional instances of Type reifying the same type, because they are never created using user-written code, so there is no way for a developer to prove that any such two Type instances are distinct. (Contrast this with identical(new Object(), new Object()) which must return false).

eernstg avatar Nov 16 '18 13:11 eernstg

An update on the status of this issue as of August 2022:

The null safety feature specification now specifies a run-time equality relation on reified types (implemented by operator ==). In short, we normalize the types and check syntactic equality (modulo different binding environments, e.g., prefix.C and C may be syntactically equal in this sense, but C and C are not syntactically equal when they resolve to different declarations).

Moreover, two constant type literals are identical if they are equal according to the above equality.

What remains to be resolved in this issue is canonicalization of reified types that are not constant expressions. In particular, typeOf<List<int>>() returns an object which is obtained by evaluating a type variable, and that is not a constant expression, and similarly for methods like typeArgument. We might choose to leave canonicalization for such reified types as an implementation specific property, but we might also come up with some rules where this space is specified for the language, or we may specify it partially by saying that certain situations must yield a canonical result.

Note that all examples in the comments above succeed today (the result is 'true' every time), which means that identical does hold for all these reified types from non-constant type literals, but we still need to consider to which extent canonicalization can occur. For instance, the following prints 'false' on the VM:

class E<X> {
  Type get listX => List<X>;
}

main() {
  print(identical(
    E<int>().listX,
    E<int>().listX,
  ));
}

eernstg avatar Aug 05 '22 08:08 eernstg

I'd be very wary about requiring canonicalization of run-time allocated values of any kind. That has a risk of memory leaks by keeping the canonical value alive for too long.

Imagine us saying that List<int> above should be canonicalized, then someone goes and puts an Expando on that object. Do we have to keep the object alive forever, just in case someone later creates a new List<int> and does an expando lookup on it? If the language semantics doesn't allow a GC to claim the unreferenced type object between those two situations, we have a memory leak. It can probably be written to allow the GC and still require that any two List<int> objects you actually have are identical. We promise canonicalization, but only among actually co-existing objects.

I'd rather not promise anything, and let implementations do what is more efficient. (Here too, I'd love if we threw in some randomness in debug/development mode, so people won't start depending on canonicalization where we don't actually promise it. We just need to break it often enough for bad tests to fail.)

lrhn avatar Aug 08 '22 09:08 lrhn

I'd be very wary about requiring canonicalization of run-time allocated values ...

Right, this kind of situation is exactly the reason why I said

We might choose to leave canonicalization for such reified types as an implementation specific property, but we might also come up with some rules where this space is specified for the language, or we may specify it partially by saying that certain situations must yield a canonical result.

You're basically arguing that canonicalization for reified types that are not obtained as the result of constant expression evaluation should be 100% implementation specific.

That's definitely a possibility, and it's easy to specify, and we already have an implementation. ;-)

I'm just noting that we could consider cases like this:

class C<X> {
  Type get theX => X;
}

Type typeOf<X>() => X;

typedef IntList = List<int>;

void main() {
  void check(Type t) => print(identical(IntList, t));
  check(List<int>); // True!
  check(C<List<int>>().theX);
  check(typeOf<List<int>>());
}

In these cases there is a compile-time constant type T (which happens to be List<int>) which occurs as a type argument, and we're then obtaining a reified type representing T by evaluating that type argument. This seems to yield a canonicalized result today.

So maybe there are some cases like this where we would actually want to decide that the result should be canonicalized, even though the reified type is not obtained by evaluating a constant expression, especially if this behavior is something that large programs (say, using AngularDart) rely on.

eernstg avatar Aug 08 '22 22:08 eernstg