lean4 icon indicating copy to clipboard operation
lean4 copied to clipboard

Exponential slowness in unification when there are metavariables

Open JovanGerb opened this issue 3 months ago • 0 comments

Prerequisites

Please put an X between the brackets as you perform the following steps:

  • [x] Check that your issue is not already filed: https://github.com/leanprover/lean4/issues
  • [x] Reduce the issue to a minimal, self-contained, reproducible test case. Avoid dependencies to Mathlib or Batteries.
  • [x] Test your test case against the latest nightly release, for example on https://live.lean-lang.org/#project=lean-nightly (You can also use the settings there to switch to “Lean nightly”)

Description

The unification algorithm has 2 caches, the permanent cache and the transient cache. The transient cache is reset regularly. It is used when there are metavariables. Unfortunately, the transient cache is reset too often, which causes unnecessary exponential slowness in unification in the presence of metavariables. In mathlib, this can cause deterministic time-outs and general slowness during type class search. This especially happens in unification problems involving a lot of projections, for example when unifying two instances of Mul α, where one instance depends some metavariable instance of a class high in the hierarchy, such as Field α or CommRing α.

The problem arises for unification goals of the form f a₁ .. aₙ =?= f b₁ .. bₙ which contain at least one metavariable. There are two main approaches to solve this:

  1. Unify the arguments pairwise: a₁ =?= b₁, .. aₙ =?= bₙ.
  2. Unfold f on both sides

It is important to start with (1), e.g. for goals of the form f a =?= f ?_, so that we can immediately assign ?_ := a.

When applying (2), we often end up with many of the same arguments a₁ .. aₙ and b₁ .. bₙ on both sides. This means that applying (1) to that leads to many of the same unification goals a₁ =?= b₁, .. aₙ =?= bₙ. The problem is that the transient cache has been reset in the meantime, meaning that these unifications are tried again from scratch. If this pattern repeats, then it leads to an exponential blowup.

Context

Many complaints about slowness/time-outs on Zulip are caused by this issue.

#mathlib4 > Coercion triggers timeout #mathlib4 > simp timeout at `whnf` #mathlib4 > Coercion instance problems with matrix exponential

Steps to Reproduce

To mimick a large type class hierarchy, we can define a class A : Nat → Type, together with an instance [A n] : A (n + 1). Hence the following MWE:

class A (n : Nat) where
  x : Nat

instance {n} [A n] : A (n+1) where
  x := A.x n

abbrev N := 14

def test [A 0] : Fin (A.x N) := sorry

-- time spent is proportional to `2 ^ N`
set_option trace.profiler true in
example [A 1] : Fin (A.x N) := test

Expected behavior:

the example fails instantly.

Actual behavior:

the example fails after around 3 seconds, and this time is proportional to 2 ^ N.

Versions

Lean 4.25.0-nightly-2025-09-16

Additional Information

I have a fix for this in #8883, which shows that fixing this gives a 3.8% speedup in mathlib.

Impact

Add :+1: to issues you consider important. If others are impacted by this issue, please ask them to add :+1: to it.

JovanGerb avatar Sep 16 '25 15:09 JovanGerb