ghc-tcplugin-api icon indicating copy to clipboard operation
ghc-tcplugin-api copied to clipboard

What does splitTyConApp_upTo do? Documentation is unclear.

Open phadej opened this issue 4 months ago • 10 comments

I understand what splitTyConApp_maybe :: HasDebugCallStack => Type -> Maybe (TyCon, [Type]) does. It's a reverse of mkTyConApp :: TyCon -> [Type] -> Type.

I don't really understand what splitTyConApp_upTo does. There wasn't reverse for previous -> Maybe (... (TyCon, [Type]) (probably because we need use givens?), nor current

splitTyConApp_upTo :: TyConSubst -> Type -> Maybe (NonEmpty (TyCon, [Type], [Coercion]))

Like splitTyConApp_maybe, but taking Given constraints into account.

Alongside the TyCon and its arguments, also returns a list of coercions that embody the Givens that we depended on.

So I ask for an example of what splitTyConApp_upTo does (especailly when splitTyConApp_maybe would fail)`, and maybe even for an morallly an inverse function to make it even more obvious.


I'm working on updating a plugin using splitTyConApp_upTo, and I cannot judge what I should do with coercions, if anything.

phadej avatar Aug 26 '25 17:08 phadej

In type-checking plugins, it's common to want to rewrite type family applications. For the sake of illustration, let's say we want to implement the rewrite F (G x) ~> x, where F, G :: Type -> Type are type families with arity 1.

The naive thing to do in a type-checking plugin would be:

rewriteFG :: Type -> Maybe Type
rewriteFG ty
  | (tc1, [arg]) <- splitTyConApp_maybe ty
  , tc1 == tyCon_F
  , (tc2, [x]) <- splitTyConApp_maybe arg
  , tc2 == tyCon_G
  = Just x
  | otherwise
  = Nothing

Then, to solve a constraint such as [W] F (G x) ~ x, the type-checking plugin would proceed as follows:

  1. Rewrite both sides, to get a new [W] x ~ x. Either solve that directly, or re-emit it to be solved by GHC.
  2. Solve the old Wanted [W] F (G x) ~ x by unsafe-coercing the evidence term of the new Wanted, using a UnivCo coercion.

The problem then is: what if the type-checker plugin instead sees the following constraints:

[G] co :: G x ~ y
[W] F y ~ x

Here, we essentially have a let-bound skolem type variable y := G x. The above code for rewriteFG will return Nothing on the application F y, because the inner argument is a bare type variable, which is not of the form G x. But the inner argument can be made to be of the form G x using the Given:

rewriteFG2 :: TyConSubst -> Type -> Maybe (Type, [Coercion])
rewriteFG2 givensTyConSubst ty
  | (tc1, [arg], cos1) <- splitTyConApp_upTo givensTyConSubst ty
  , tc1 == tyCon_F
  , (tc2, [x], cos2) <- splitTyConApp_upTo givensTyConSubst arg
  , tc1 == tyCon_G
  = Just (x, cos1 ++ cos2)
  | otherwise
  = Nothing

Now rewriteFG2, applied to F y, will return Just (x, [co1]). The returned list of coercions is a list of which Givens we used.

To solve the Wanted now, we proceed as before:

  1. Rewrite both sides to get a new Wanted that we can pass to GHC (the plugin has made progress).
  2. Solve the old Wanted in terms of the new Wanted by using a UnivCo coercion.

Now, (2) is where the [Coercion] comes in: the old Wanted is equal to the new Wanted only in the context of [G] co :: G x ~ y. For example we might have something like:

foo :: Num x => F y -> Maybe (G x :~: y) -> x
foo a pf = case pf of
  Just Refl -> a
  Nothing -> 0

After typechecking, we expect to get Core of the form:

foo :: Num x => F y -> Maybe (G x :~: y) -> x
foo @x @y $dNum a pf = case pf of
  Just (Refl (co :: G x ~# y)) -> a |> UnivCo (PluginProv str) Nominal (F y) x [co]
  Nothing -> fromInteger @x $dNum 0

The inner coercion, UnivCo (PluginProv str) Nominal (F y) x [co], mentions co that is bound in the pattern match; thus it cannot be floated out past it. If, in the plugin, we had instead returned a UnivCo with an empty list of coercions, then we could float this coercion out, essentially yielding a proof of forall x y. F y ~ x, which is invalid. This can interact with typeclass specialisation to create bogus specialisations.

sheaf avatar Aug 26 '25 18:08 sheaf

Please don't bury documentation in a comment to an issue.

phadej avatar Aug 26 '25 18:08 phadej

Please don't bury documentation in a comment to an issue.

Right, I was just in the middle of writing a question to ask what you think I should add to the documentation. I thought it would be more productive to ask what you think would be helpful, rather than me unilaterally adding an explanation that is not actually enlightening.

sheaf avatar Aug 26 '25 18:08 sheaf

Let me try to understand this, so I could tell what information I was missing (and would like to see in the docs in the future as the bare minimum):

  1. mkTyConApp is morally an inverse of splitTyConApp_upTo, but
  2. ty and mkTyConApp tycon args (where Just (tycon, args, cos) = splitTyConApp_upTo subst ty) are not equal on the nose,
  3. so to convert a value a :: $ty to :: $(mkTyConApp tycon args) users should use a |> UnivCo (PluginProv ...) Nominal $ty $(mkTyConApp tycon args) $cos, as just a won't be correct. As far as I see from your example, that would already be wrong if -dcore-lint is enabled.

The UnivCo is a handy simplification, but in theory we could construct a safe coercion in some cases. The API doesn't allow for that (yet?), as it tells only what constraints are needed, but not where. Nevertheless 0.17.0.0 is an impovement, as now we can at least record which constraints were (potentially) relevant, where previously we would need to record all constraints in givens, which is way more pessimistic. (And in practice, we often are too optimistic and record none, which is bad and wrong).


In context of my current work (on large-anon), the above is already helpful; but not really enough.

large-anon (as many plugins probably) creates type-class instances from a thin air.

If we have

class HasField name record a where
   fieldIndex :: Tagged (name, record, a) Int

we may have a local context with givens which allow us to deduce what the magic Int value is, i.e create an instance of the fly.

But at the end GHC will just see an Int and quite happily float it too far up.

The plugin API for solving constraint is:

TcPluginOk :: [(EvTerm, Ct)] -> [Ct] -> TcPluginSolveResult	

The plugin has not found any contradictions, The first field is for constraints that were solved. The second field contains new work, that should be processed by the constraint solver

If I know that I use equality constraints from givens is there a way to use those in EvTerm which we give back to GHC to record the dependency? As far as I can tell, there isn't. If I do a reflexive dummy coerce Int ~ Int, even with UnivCo, GHC will happily just simplify that to Refl. Or can we do better?

... EDIT: as it looks like while it would be possible to thread and record all constraints which were relevant, there is no way to tell that to GHC back, so in that case it's "ok" to simply ignore them.

phadej avatar Aug 26 '25 19:08 phadej

Indeed, it can be a bit tricky to thread through coercions in order to write an evidence term for typeclass constraints; it's not as easy as for equality constraints.

For a simpler example, suppose we have [G] co :: n + 1 ~ 4, [W] KnownNat n. We can produce a proof of KnownNat 3 that is valid universally, and then use a UnivCo mentioning co to coerce it to a proof of KnownNat n. So the final proof term would be (Kn 3 :: KnownNat 3) |> UnivCo (PluginProv str) Nominal (KnownNat 3) (KnownNat n) [co].

I imagine it would be the same for the HasField example, e.g. [G] co :: Merge '[ "fld1" := Int ] '[ "fld2" := Bool ] ~# r3, [W] RowHasField "fld2" r3 Bool. You would produce evidence for RowHasField "fld2" '[ "fld1" := Int, "fld2" := Bool ] Bool, then coerce it using a UnivCo that mentions co.

sheaf avatar Aug 26 '25 19:08 sheaf

For a simpler example,

Or rather GHC would force us to do it in opposite order, doesn't it?

  1. When asked for [G] co :: n + 1 ~ 4, [W] KnownNat n
  2. We produce (kn3 :: KnownNat 3) |> UnivCo (PluginProv str) Nominal (KnownNat 3) (KnownNat n) [co] with new wanted kn3 :: KnownNat 3.
  3. and if GHC is not smart enough to solve KnownNat 3 itself,
  4. we would solve it.

GHC Plugin API doesn't say whether we can introduce new wanteds and solve them in one iteration of constraint solver. For me it feels safer to do in separate steps, even if that would potentially require duplicating some work.

phadej avatar Aug 26 '25 20:08 phadej

Or rather GHC would force us to do it in opposite order, doesn't it?

Yes, that's right.

GHC Plugin API doesn't say whether we can introduce new wanteds and solve them in one iteration of constraint solver.

I would have to test that. I imagine it's OK but I'm not 100% sure.

sheaf avatar Aug 26 '25 20:08 sheaf

For the record, the docs should also mention that UnivCo can record dependendy coercions only with GHC-9.12, but luckily the library provides mkPluginUnivCo etc.

Re-exporting mkUnivCo (which doesn't take this additional info) on older GHC and does on newer is probably bad idea.

In fact, I'd say that re-exrporting anything from GHC is a bad idea as GHC API is not stable. Re-exports are convenient, but if people have to depend on ghc anyway to restrict its versions (or get access to CPP macros), the convenience is really minimal. But that is a different issue.

phadej avatar Aug 26 '25 20:08 phadej

I would have to test that. I imagine it's OK but I'm not 100% sure.

That would be great. Then we can give better advice in the docs.

I'll process this over night and try to write something concrete down (i.e. a documentation PR)

phadej avatar Aug 26 '25 20:08 phadej

I would have to test that. I imagine it's OK but I'm not 100% sure.

That would be great. Then we can give better advice in the docs.

I'll process this over night and try to write something concrete down (i.e. a documentation PR)

Here is what I had written so far. It doesn't include anything about splitTyConApp_upTo being a kind of inverse of mkTyConApp, which I agree is a useful perspective.

In type-checker plugins, we often want to rewrite type family applications, e.g.
we might want to rewrite @F (G x)@ to @x@, where @F@ and @G@ are both type families.
(We need a type-checking plugin to do this, because GHC rejects type family
applications inside the LHS of a type family application.)

Suppose, for example, that our typechecker plugin comes across the Wanted
constraint @[W] w1 := F (G t1) ~ t2@, for some types @t1@ and @t2@. We would
usually handle this as follows:

  1. Discover that the LHS type @F (G t1)@ is of the form @F (G ...)@,
     using two successive calls to 'splitTyConApp_maybe'. This means we
     can rewrite the LHS to @t1@ as per the rewrite rule @F (G x) ~> x@
  2. Create a new, rewritten Wanted constraint @[W] w2 := t1 ~ t2@.
  3. Solve the old Wanted constraint w1 in terms of the new Wanted w2,
     by coercing the evidence term of @w2@ using 'mkPluginUnivCo'.
  4. Pass @w2@ back to GHC as a new Wanted.

However, what if we instead had both a Given @[G] co := G t1 ~ y@ and a Wanted
@[W] w1 := F y ~ t2@? In this case, when splitting apart @y@ with
'splitTyConApp_maybe', we would fail to take into account that @y ~ G t1@, so
we would fail to rewrite @F y@ to @t1@. This is where 'splitTyConApp_upTo'
comes in:

  1. Use 'mkTyConSubst' to create a 'TyConSubst' for all the Given constraints
     passed to the typechecker plugin.
  2. Discover that the LHS type @F y@ is of the form @F (G ...)@ "up to Givens"
     using 'splitTyConApp_upTo'. This will additionally return the coercion
     @co :: G x ~# y@ which we used to rewrite @y@ into @G t1@.
  3. Create the new rewritten Wanted constraint @[W] w2 := t1 ~ t2@.
  4. Solve @w1@ in terms of @w2@ by coercing the evidence term of @w2@ using
     'mkPluginUnivCo', **supplying the coercions returned by (2)**.
  5. Pass @w2@ back to GHC as a new Wanted.

It is important to thread through the coercions returned by 'splitTyConApp_upTo',
even though this can be somewhat cumbersome. In the example above, this corresponds
to the fact that the rewrite @F y ~> t1@ is __only valid under the
assumption that @G t1 ~ y@__. If we fail to thread through these coercions,
GHC will behave as if we unconditionally proved that @F y@ can be rewritten to
@t1@ in all circumstances, which can lead to incorrect runtime results in
conjunction with typeclass specialisation (see for example https://gitlab.haskell.org/ghc/ghc/-/issues/23923#note_543934).
-}

-- | Like 'splitTyConApp_maybe', but takes Given constraints into account.
--
-- Looks through type synonyms, just like 'splitTyConApp_maybe' does.
--
-- Examples:
--
--    - No Givens: behaves the same as 'splitTyConApp_maybe'.
--      The @NonEmpty@ will always have exactly one element, and the
--      @[Coercion]@ will always be the empty list.
--    - @[G] co1 :: a + b ~ x, [G] co2 :: 2 * a ~ x @.
--      Calling 'splitTyConApp_upTo' on @x@ with these Givens will return
--      @Just ((typeNatAddTyCon, [a,b], [co1]) :| [(typeNatMulTyCon, [2,a], [co2])])@.
--      This means that the input type @x@ is a 'TyConApp' in two different ways
--      when taking into account the Givens.
--
-- __Do not discard the coercions returned by this function!__ These constitute
-- the context in which a rewrite is valid, and passing them on when constructing
-- proof terms in the plugin (using e.g. 'mkPluginUnivCo', 'mkPluginUnivEvTerm'
-- or 'mkTyFamAppReduction') ensures these proofs do not get floated out past
-- the enclosing Givens (which could cause GHC to perform invalid optimisations,
-- potentially resulting in incorrect runtime results as in
-- https://gitlab.haskell.org/ghc/ghc/-/issues/23923#note_543934).

sheaf avatar Aug 26 '25 21:08 sheaf