KEEP icon indicating copy to clipboard operation
KEEP copied to clipboard

Subtyping Reconstruction

Open e2e4b6b7 opened this issue 11 months ago • 12 comments

This issue is for discussion of the proposal to introduce Subtyping Reconstruction into the Kotlin language. The full proposal text is available here: proposals/subtyping-reconstruction.md

The KEEP presents the Subtyping Reconstruction technique that introduces smart casts for generics.

Please, use this issue for the discussion on the substance of the proposal. For minor corrections to the text, please open comment directly in the PR: #410.

e2e4b6b7 avatar Feb 04 '25 09:02 e2e4b6b7

Just clarifying, this would also be fine, right?

fun <T> evalWhen(e: Expr<T>): T = when (e) {
    is IntLit -> 42
}

kyay10 avatar Feb 05 '25 17:02 kyay10

From "Can we fix these problems":

Fundamentally, this means we may need to add ways to encode recursive types (aka µ types) or some other ways to propagate such complex (e.g., recursive) constraints in the type system.

We leave the solutions to these problems for possible future work.

Would introducing Shape interfaces do the trick here? Just food for thought

kyay10 avatar Feb 05 '25 17:02 kyay10

@kyay10,

Just clarifying, this would also be fine, right?

fun <T> evalWhen(e: Expr<T>): T = when (e) {
    is IntLit -> 42
}

Yes, it works in the same way as with any other Int-typed expression on the rhs. More precisely, in this case it is: T :> Int :> ILT and everything is fine.

Would introducing Shape interfaces do the trick here?

Yes, inferring only material interfaces will simplify everything, but there are some other issues. Therefore, we decided to address them separately to keep things manageable.

e2e4b6b7 avatar Feb 06 '25 09:02 e2e4b6b7

I'm concerned about the approach to supporting this feature. The constraint resolution algorithm seems to have a lot of corner cases. For example, when resolving a constraint on a type variable, it goes through all the bounds on the type variable, but then it seems it can eventually generate a new bound on that same variable, which seems to create a non-terminating loop (or a loop that gets cut off arbitrarily). Even more worryingly, it seems hard to ensure the newly generated bounds conform to the invariants we can use to make typing efficient. For example, if every type variable has at most one upper bound, we can take advantage of this to keep subtyping and inference polynomial-time. But the proposed approach seems to enable adding arbitrary upper bounds to type variables, making these problems exponential-time. (I know some of these problems currently arise with smart casts, but there might be ways to fix this, whereas the proposed approach seems to make them unfixable.)

This feature seems to be expected to be used by advanced users with low frequency. Given that, would it make sense to look into decidable approaches that require a little more guidance/annotation from the user? For example, here's my stab at adapting the syntax for a known decidable approach:

when (variable of SuperClass<T>): WhenType {
   is SubClass<U> -> expr
   ...
}

Here we add this of SuperClass<T> clause, which declares type variables T to represent variable's type argument for SuperClass. The WhenType then uses T to describe the type of the when expression, e.g. List<T>. In the case is SubClass<U>, the U is yet another type variable, this time representing the value's type argument for SubClass (if it belongs to SubClass). Suppose we have SubClass<X> : SuperClass<Set<X>>; then in this case the T in WhenType gets replaced by Set<U>, essentially incorporating the knowledge we gain about T from the fact that variable belongs to SubClass. So, if WhenType were List<T>, then expr would need to have type List<Set<U>> assuming variable has type SubClass<U>. We do this check for each case. (When the U would not be used in a case, the user can elide them.) Afterwards, if know that variable has type SuperClass<A>, and if WhenType were List<T>, then the when expression would have type List<A>.

In other words, the WhenType annotation tells us how to translate between the (possibly conflicting) information ensured by the type of variable and the invariants guaranteed by the various subclasses of SuperClass. (I'm shoving details with variance and such under the rug.)

That's probably not very clear, so here is how it looks applied to various examples in the KEEP:

private val caller: Caller<M> = when (oldCaller of Caller<T>): Caller<T> {
    is CallerImpl.Method.BoundStatic -> { // Here T is substituted with ReflectMethod
        val receiverType = (descriptor.extensionReceiverParameter ?: descriptor.dispatchReceiverParameter)?.type
        if (receiverType != null && receiverType.needsMfvcFlattening()) {
            val unboxMethods = getMfvcUnboxMethods(receiverType.asSimpleType())!!
            val boundReceiverComponents = unboxMethods.map { it.invoke(oldCaller.boundReceiver) }.toTypedArray()
            CallerImpl.Method.BoundStaticMultiFieldValueClass(oldCaller.member, boundReceiverComponents)
        } else {
            oldCaller
        }
    }
    else -> oldCaller
}
sealed interface Expr<out T>
class IntLit(val i: Int) : Expr<Int>

fun <T> evalWhen(e: Expr<T>): T = when (e of Expr<X>): X {
    is IntLit -> e.i
}
sealed interface EqT<A, B> {
  class Evidence<X> : EqT<X, X>
}

class Data<S, D> {
  val value: D = ...
  fun coerceJsonToString(ev: EqT<D, JsonNode>): Data<S, String> =
    Data(when (ev of EqT<A,B>): (A) -> B { is Evidence -> { it } } (this.value))
}

// The above pattern of converting a value is pretty common, so here's a shorthand syntax for it:
class Data<S, D> {
  val value: D = ...
  fun coerceJsonToString(ev: EqT<D, JsonNode>): Data<S, String> =
    Data(when (ev of EqT<A,B>): (val x: A = this.value) -> B { is Evidence -> x })
}
sealed interface Chart<A> {
    fun draw(chartData: A)
}
class PieChart : Chart<PieData>
class XYChart : Chart<XYData>
fun modifyPieData(PieData): PieData {...}

fun <A> Chart<A>.customDraw(chartData: A): Unit =
  when (this of Chart<X>): (chartData: X) -> Unit {
    is PieChart -> {
      draw(modifyPieData(chartData))
    }
    else -> draw(chartData)
  }
interface A<in T, in V>
interface A1<in V> : A<Int, V>
interface A2<in T> : A<T, Int>

fun f(v: A1<*>): A2<Int> =
    when (v of A<T,V>): A2<T> {
        is A2 -> v
        else -> throw Exception()
    }
}

(I believe this is known as indexed pattern matching, albeit adapted to subtyping and inheritance rather than cases of an inductive data type.)

RossTate avatar Feb 11 '25 16:02 RossTate

@RossTate would this also basically allow existential types in subexpressions? What I mean is, Kotlin seems to have recently gained the ability to do something like:

data class Foo<T>(val list: MutableList<T>, val set: MutableSet<T>)
fun <T> bar(list: MutableList<T>, set: MutableSet<T>) { }
fun Any.baz() = when (this) {
  is Foo<*> -> bar(list, set)
  else -> {}
}

Where internally the type T of this@baz is the type argument to bar. Weirdly, the very similar

fun baz(foo: Any) = when (foo) {
  is Foo<*> -> bar(foo.list, foo.set)
  else -> {}
}

Does not compile. It sounds like your proposed idea would cover this usecase nicely too by allowing:

fun baz(foo: Any) = when (foo) {
  is Foo<T> -> bar<T>(foo.list, foo.set)
  else -> {}
}

Where T is a fresh type parameter (and perhaps even the type argument to bar would be unnecessary and inferred automatically) Is that correct? Or would this be limited to only superclass-subclass situations?

kyay10 avatar Feb 11 '25 17:02 kyay10

@kyay10 What you're touching upon is type capturing. There's a lot I could go into, but I think I'll just focus on the specific question you asked: yes, what I described would support "opening" the existential type. All of the following would type-check:

fun baz1(foo: Any) = when (foo of Any) {
  is Foo -> bar(foo.list, foo.set)
  else -> {}
}

and

fun baz2(foo: Foo<*>) = when (foo of Foo<T>) {
  is Foo -> bar(foo.list, foo.set)
}

and

fun baz3(foo: Foo<*>) = when (foo of Foo<T>) {
  is Foo<U> -> bar<U>(foo.list, foo.set)
}

They might be overkill for that particular example, but hopefully you can see how they can generalize to more complex examples.

RossTate avatar Feb 11 '25 18:02 RossTate

Apologies for the tangential conversation @RossTate. Would something like

fun baz4(foo: Any) = when (foo of Any) {
  is Foo<U> -> bar<U>(foo.list, foo.set)
  else -> {}
}

also be supported? Or would it have to instead be the mouthful of:

fun baz4(foo: Any) = when (foo) {
  is Foo<*> -> when(foo of Foo<T>) { is Foo -> bar<T>(foo.list, foo.set) }
  else -> {}
}

If the automatic approach in the KEEP cannot be made polynomial (maybe there's a way to modify the algorithm?), then I would like this more-explicit syntax, especially because it allows that explicit type-capturing. In fact, even if the automatic approach works, I like the explicit syntax anyways since it clarifies what's going on for the reader.

kyay10 avatar Feb 11 '25 18:02 kyay10

Yes, your first (more concise) version of baz4 would work too.

RossTate avatar Feb 11 '25 18:02 RossTate

Here's a proof that subtyping reconstruction makes validation NP-Hard. In particular, given a graph, the following shows how to generate a program that is valid if and only if the graph is 3-colorable.

class Red {}
class Green {}
class Blue {}

class Vertex<V>() { public var vertex: V? = null }

sealed class GADT<out E> {}
class RedGreenGADT : GADT<Pair<Vertex<Red>,Vertex<Green>>> {}
class RedBlueGADT : GADT<Pair<Vertex<Red>,Vertex<Blue>>> {}
class GreenRedGADT : GADT<Pair<Vertex<Green>,Vertex<Red>>> {}
class GreenBlueGADT : GADT<Pair<Vertex<Green>,Vertex<Blue>>> {}
class BlueRedGADT : GADT<Pair<Vertex<Blue>,Vertex<Red>>> {}
class BlueGreenGADT : GADT<Pair<Vertex<Blue>,Vertex<Green>>> {}
// could have more cases...

fun <E> colorGraph(gadts: Array<GADT<E>>, addEdge: (E) -> Unit) {
    if (gadts[0] !is RedGreenGADT) return // After this, we know E :> Pair<Vertex<Red>,Vertex<Green>>
    if (gadts[1] !is RedBlueGADT) return // After this, we know E :> Pair<Vertex<Red>,Vertex<Blue>>
    if (gadts[2] !is GreenRedGADT) return // After this, we know E :> Pair<Vertex<Green>,Vertex<Red>>
    if (gadts[3] !is GreenBlueGADT) return // After this, we know E :> Pair<Vertex<Green>,Vertex<Blue>>
    if (gadts[4] !is BlueRedGADT) return // After this, we know E :> Pair<Vertex<Blue>,Vertex<Red>>
    if (gadts[5] !is BlueGreenGADT) return // After this, we know E :> Pair<Vertex<Blue>,Vertex<Green>>
    // Now I have 6 lower bounds on E
    // Note that I could also have achieved this with 6 different parameters of type GADT<E>, rather than a single array
    
    // For each vertex vi of the graph:
    val vi = Vertex() // introduces an unknown type denoting the color of vertex vi

    // For each edge vi -> vj of the graph:
    addEdge(Pair(vi, vj)) // due to the lower bounds on E, this is valid iff vi and vj are vertexes of different colors
}

RossTate avatar Feb 24 '25 17:02 RossTate

We had internal discussions about the NP-hardness introduced by subtyping reconstruction. We have an idea for overcoming these issues by tracking the consistency of the introduced bounds. For example, in the previous case, something like E :> Pair<Vertex<*>, Vertex<*>> will be inferred instead of six separate lower bounds. This idea requires further elaboration and verification, so it will be published in a few weeks.

e2e4b6b7 avatar Mar 03 '25 10:03 e2e4b6b7

Documenting this proof of undecidability for a subset of the features in consideration as they are currently written (which is not to say they cannot be adapted), which I came up with by adapting an idea from Garrigue and Le Normand's GADTs and Exhaustiveness: Looking for the Impossible.

sealed class Symbol {}
class Symbol_blank: Symbol() {}
// For each symbol s:
class Symbol_s: Symbol() {}

sealed class Tape {}
class Nil: Tape() {}
class Cons<out Head: Symbol, out Tail: Tape>: Tape() {}

class State {}
// For each state s_i:
class State_i: State() {}

interface StepsTo<out Configuration> {
    fun next(): Configuration
}

sealed class Configuration<out Current: State, out Head: Symbol, out Left: Tape, out Right: Tape> {}
// For each "state s_i moves left and transitions to s_j when head is Input, replacing it with Output":
class Transition_i_j_Input<out Left: Tape, out Next: Symbol, out Right: Tape>
        : Configuration<State_i, Input, Cons<Next, Left>, Right>(),
          StepsTo<Configuration<State_j, Next, Left, Cons<Output, Right>>
// plus, in the case that Input is blank:
class Transition_i_j_nil<out Right: Tape>
        : Configuration<State_i, Symbol_blank, Nil, Right>(),
          StepsTo<Configuration<State_j, Symbol_blank, Nil, Cons<Output, Right>>
// For each "state s_i moves right and transitions to s_j when head is Input, replacing it with Output":
class Transition_i_j_Input<out Left: Tape, out Next: Symbol, out Right: Tape>
        : Configuration<State_i, Input, Left, Cons<Next,Right>>(),
          StepsTo<Configuration<State_j, Next, Cons<Output, Left>, Right>
// plus, in the case that Input is blank:
class Transition_i_j_nil<out Left: Tape>
        : Configuration<State_i, Symbol_blank, Left, Nil>(),
          StepsTo<Configuration<State_j, Symbol_blank, Cons<Output, Left>, Nil>
// For each "state s_i halts when head is Input":
class Halt_i<out Left: Tape, out Right: Tape> : Configuration<State_i, Input, Left, Right>()

// Suppose s_0 is the initial state
// The following is valid if and only if the Turing machine does not terminate
fun terminates(val initial: Configuration<State_0, Symbol_blank, Nil, Nil>) {
    var configuration = initial
    when (configuration) {
        is StepsTo -> configuration = configuration.next()
    }
}

In particular, Configuration does not itself implement StepsTo. However, for given "constant" type arguments Current, Head, Left, and Right, there is necessarily a single subclass of Configuration<Current,Head,Left,Right> that does implement StepsTo if there is a corresponding transition in the Turing machine. In which case, it has a next() method, the return-type of which is the configuration the Turing machine would step to. So the when is exhaustive if and only if the Turing machine does not terminate on the empty input.

Note that the hierarchy I used is not recursive.

RossTate avatar Mar 05 '25 20:03 RossTate

Documenting more thoughts for future reference. (Note that I'm approaching this from the perspective that, although name resolution and method overloading are necessarily exponential if done "completely", it seems we can design things so that everything else is doable "completely" in polynomial time.)

When thinking about constraints, it seems like the constraints that are most critical to safety or those that arise from the following pattern:

fun foo(x: InputType): OutputType {
    if (x is TestName1) {
        if (x is TestName2) {
            ... when (x) {
                   is CaseName1 -> throw
                   is CaseName2 -> throw
                   ...
                   else -> return x
        } else throw
    } else throw
}

While we often think of subtyping constraints as having the form "single-type <: single-type", the circumstance above is best directly described using a constraint of the form "list-of-types <: list-of-types". If we have meets (e.g. intersections) and joins (e.g. unions), we can capture such a constraint using the traditional subtyping constraint "meet of list-of-types <: join of list-of-types", but even then it turns out the much of the reasoning we want to do is better done in the "list-of-types <: list-of-types" form of constraints.

Looking at my foo example, the constraint looks like "InputType, TestName1<,...>, TestName2<,...>, ... <: CaseName1<,...>, CaseName2<,...>, ..., OutputType". Notice that the constraint falls into a pattern: only one of the types in either list is an arbitrary type; the rest are of the form Name<*,...>. One option we have is to ensure all constraints are of this form (or of slightly relaxed forms where, e.g., there can be multiple OutputTypes provided they are disjoint). A benefit of this option is that these constraints have a familiar set-theoretic interpretation: "an object belonging to all of the left-hand types necessarily belongs to (at least) one of the right-hand types". In the absence of features like method overloading and extension methods, we can also ensure method resolution can be done in polynomial time. Of course, Kotlin has method overloading and extension methods, so maybe that isn't much of an advantage here. A downside of this option is that there are certain features that conflict with this invariant. For example, the straightforward approach to downcasting would combine the bounds on type arguments ensured by the subclass with the bounds written explicitly by the user for the superclass even though these combinations would break from this form of constraint.

Another conflicting feature is subtype reconstruction. With subtyping reconstruction, we can place an arbitrary number of arbitrary bounds on type parameters. @e2e4b6b7's suggestion of consistency reconstruction is effectively extending constraint-reasoning with the rules necessary to process arbitrary constraints of the form "list-of-types <: list-of-types" in polynomial time. Such extensions, for example, make nominal types distributive. This is not necessarily a bad thing, systems like MLstruct have shown that distributive nominal types can be safe, but it does behave a little oddly (e.g. it no longer admits the obvious set-theoretic model), so we have to be careful how it interacts with other features of Kotlin (MLstruct is an academic calculus, and so is missing a number of critical features). If we can make it work, it does a great job of reasoning in the presence of even constraints that were not designed to be compatible with each other, which has a lot of uses. But first we have to make sure it's safe for Kotlin's (potential) features.

One potentially conflicting feature is, let's say, "type-argument-dependent exhaustiveness checking". This is one of the features in the proposal, wherein the type argument of some nominal type is checked to satisfy some additional properties in order to the cases of a when exhaustively cover all the inhabitable cases of that nominal type. My proof above shows that, if unrestricted, this features makes typing undecidable. But that doesn't mean we can't design a restricted form that is decidable (in polynomial time). For example, consider something like the following:

sealed class Values<out T>
        of IntValues when T: Int
        of NullValues when T: Nothing? {}
class IntValues : Values<Int>() {}
class NullValues : Values<Nothing?>() {}
class MixedValues : Values<Int?>() {}

Here we add of clauses to the GADT-like Values class. We can think of the first clause as saying "a Values<T> is exhaustively covered by IntValues when T is a subtype of Int". I won't go into the details, but there are restrictions we can impose that rule out the design in my undecidability proof. (More specifically, users can write that class hierarchy, but not in a way such that type-checker will consider Configuration to be conditionally exhaustively covered by StepsTo, so the terminates function will always be rejected.)

We can take these of clauses a step further and say they mean "Values<T> is a subtype of IntValues when T is a subtype of Int". On the surface, this seems to work fine in the absence of subtype reconstruction (I'll get to below-the-surface issues later). But when we allow subtype reconstruction, we encounter some problems. For example, suppose we take a type parameter A and use subtype reconstruction to add lower bounds IntValues <: A and NullValues <: A. Then what do we do with a constraint of the form Values<T> <: A? This constraint seems to hold only if either T is a subtype of Int or T is a subtype of Nothing?; it doesn't seem to be enough for T to be a subtype of Int? because that would include MixedValues. However, that's using the set-theoretic interpretation. But we could alternatively apply @e2e4b6b7's justification for consistency reconstruction: any currently expressible supertype of both IntValues and NullValues is necessarily a supertype of both Values<Int> and Values<Nothing?>. By this reasoning, we can reduce the constraint Values<T> <: IntValues, NullValues to the constraint T <: Int, Nothing?. This behaves well algorithmically, but is it safe? After all, it means that MixedValues is a subtype of A despite not being a subtype of any one lower bound of A.

This reasoning also means that the constraint MixedValues <: IntValues, NullValues holds. This seems concerning, because it means that the type-checker would accept when (MixedValues()) { is IntValues -> ...; is NullValues -> ...} as exhaustive, even though it clearly isn't! So at first consistency reconstruction seems to make the type system unsafe. But, there's another reason this when is problematic. The Values class does not have an of saying when it is exhaustively covered by just those two cases. So we should reject whens with just those two cases anyways because we do not have a good way to reason about them. That, then, avoids this unsafety issue (though we have to be careful with elses in whens). It would still be possible to allow users to specify when subsets of cases are covered; the point is that the user needs to specify them, and the requirements of those specifications seem to effectively avoid the unsafety issues.

That addresses exhaustiveness. One option is that of clauses are only for exhaustiveness. That is, Values<Int> would not a subtype of IntValues (but we could provide a convenient shorthand like values of IntValues that would be equivalent to when (values) { is IntValues -> values }). Another option is to extend of to subtyping. Issues arise when the subclasses/subinterfaces start to inherit interfaces not inherited by the common GADT base, or start to provide methods not provided by the common GADT base. The former is easier to address, but the latter gets pretty messy (though maybe the same thing we use to address name resolution and method overloading can address this).

Those are my thoughts from the last week. Sorry that they're probably overwhelming. I just needed to have them written out somewhere so that I didn't lose track of them. At present, I could still see going either way on all of the design choices I mentioned here.

RossTate avatar Mar 08 '25 18:03 RossTate

Any info about when this'll be available as experimental? I have some advanced use cases I'd like to try with it (in particular, an interesting encoding of Higher-Kinded Types)

kyay10 avatar Sep 20 '25 20:09 kyay10

What's the plan with contract support? Will things "just work" out the box?

kyay10 avatar Sep 21 '25 21:09 kyay10

Does this feature also help when there isn't direct "subtyping" involved, but instead there's evidence given that a type parameter should be equal to some other type? Let me explain with an example:

interface TypeWitness<T> // note the invariance

fun <T> TypeWitness<T>.assumeInt() {
  contract {
    returns() implies (this@assumeInt is TypeWitness<Int>) //using -Xallow-contracts-on-more-functions
}

fun <T> TypeWitness<T>.foo(x: T) {
  assumeInt() // or alternatively, if you dislike the contract abuse, this as TypeWitness<Int>
  val y: Int = x // Does this compile??? I think it should
}

If not, will this version of it work then?

interface TypeWitness<T> // note the invariance
object IntWitness: TypeWitness<Int>
fun <T> TypeWitness<T>.assumeInt() {
  contract {
    returns() implies (this@assumeInt is IntWitness)
}

fun <T> TypeWitness<T>.foo(x: T) {
  assumeInt() // or alternatively this as IntWitness, or even a `if (this !is IntWitness) return` if necessary
  val y: Int = x // Does this compile??? I think it should
}

Being able to force type equality like this is very useful for advanced use cases and to tell the compiler about assumptions we have made directly

kyay10 avatar Sep 21 '25 21:09 kyay10

How does this deal with existential types? Let me demonstrate with an example:

class Inv<T>
interface Expr<T>
class InvExpr<T>(val t: T): Expr<Inv<T>>

fun <T> Expr<T>.compose(other: Expr<T>): Expr<T> {
  if (this !is Inv<*> || other !is Inv<*>) TODO()
  val x = if(flipACoin()) t else other.t
  return InvExpr(x)
}

Is it able to understand that T == Inv<A> for some A, and hence x: A, thus the returned value is InvExpr<A> <: Expr<Inv<A>>, and thus the return is valid? Perhaps the explicit type opening mentioned earlier would allow this? Something along the lines of:

fun <T> Expr<T>.compose(other: Expr<T>): Expr<T> = when (this of Expr<T>) {
  is InvExpr<A> -> when(other of Expr<T>) {
    is InvExpr<A> -> {
      val x: A = if(flipACoin()) t else other.t
      InvExpr<A>(x)
    }
  }
  else -> ...
}

You can imagine extending this example to PairExpr with 2 type parameters. Would the compiler be smart enough to understand this? Currently, there's no way to get this to compile properly without doing unsafe casts.

kyay10 avatar Oct 01 '25 22:10 kyay10