agda-stdlib icon indicating copy to clipboard operation
agda-stdlib copied to clipboard

Fragile `NonZero` and implementation-dependent luck

Open JacquesCarette opened this issue 10 months ago • 14 comments

A quite thorough description of the issue is already written so I won't repeat it all here.

Basically some of our NonZero instances resolve through sheer luck of the details of the current implementation and changes to make the upstream algorithm less ad hoc break stdlib. The issue boils down to some non-injective functions (like / and %) having accompanying cong functions with implicits that should be explicit as they can't in general be inferred [see details at above post.]. So we should fix those cong routines to not be so fragile.

This will be a breaking change, but one that we are forced to do, as our current implementation should be regarded as buggy because it relies on 'sheer luck' to work.

While I am intending to fix this myself, I'm not sure how fast I'll be, so I would be very happy to collaborate with others on that.

JacquesCarette avatar Mar 05 '25 19:03 JacquesCarette

Unfortunately, I didn't understand the discussion on the main Agda Zulip, but get your outline of what might be required. What's not clear is how many of these lemmas (all?) regarding the affected functions are touched by the change. is it 'simply' a case of the cong proofs being faulty (when proved, or when deployed, etc.), or all uses?

jamesmckinna avatar Mar 07 '25 08:03 jamesmckinna

@jamesmckinna The heart of the issue is that (without lossy-unification) Agda will not generally solve for _0 := y based on a constraint like f x _0 = f x y — practically, this means that if you have an argument that's only mentioned "in a return type, under a function" it'll often fail to be inferred— taking one of the affected functions as an example,

%-congˡ : ∀{m n o} .⦃ _ : NonZero o ⦄ → m ≡ n → m % o ≡ n % o

here the variables m and n appear as arguments to _≡_, which, as a data type, is definitionally injective; so they can be solved by looking at the type of the explicit argument to %-congˡ. This isn't unique to instances: lemmas like +-assoc generally need to have all of their arguments specified, because in general we can't definitionally invert addition. But here the argument o only appears under _%_, which is not injective, and so it'd go unsolved.

(It currently works by coincidence: at the use sites I inspected, the correct NonZero instance happens to be the first one tried, and (even though this is an unforced choice) Agda will simply choose that first instance because NonZero instances are always used irrelevantly.)

So, to summarize, the cong proofs have very "hopeful" types, which currently work by relying on details of instance search (that I would rather change); and there's not really a guarantee that the use sites will keep working. The easiest fix is to make any argument fitting that heuristic ("only under a function") explicit, which will cause a bit of churn.

plt-amy avatar Mar 07 '25 12:03 plt-amy

(P.S. I'd be happy to hold back on my instance search work/have it under a flag for a while/etc so these functions can have a proper deprecation cycle, since I'm not sure what the stdlib policy here is)

plt-amy avatar Mar 07 '25 12:03 plt-amy

Thanks @plt-amy ! for:

  • the more explicit guidance regarding how to localise the problem
  • the concern about how agda and agda-stdlib development cycles might interact

Regarding the second, I'm guessing your changes to instance inference/lossy unification will be v2.8.0, and since we're currently in a v2.7.0.* test/release cycle, I think these things could be considered 'non-interfering' (notwithstanding other changes to agda such as #2476 which have required changes to experimental rather than master). That said, maybe pushing to experimental would be the right thing to do? Shoutouts to @JacquesCarette @MatthewDaggitt and @gallais for rulings on this...

jamesmckinna avatar Mar 07 '25 13:03 jamesmckinna

Yes, getting rid of the unforced choices by 2.8.0 sounds like a good plan on the compiler end.

plt-amy avatar Mar 07 '25 14:03 plt-amy

Yes, I think these changes should be pushed to experimental with the aim of releasing them as compatibility changes with whatever release we make to coincide with v2.8.0.

MatthewDaggitt avatar Mar 11 '25 03:03 MatthewDaggitt

Did these changes make it into v2.8.0 @plt-amy and if so do we need to make some updates?

MatthewDaggitt avatar Jun 30 '25 06:06 MatthewDaggitt

Any progress on your end @JacquesCarette with this?

jamesmckinna avatar Jun 30 '25 11:06 jamesmckinna

Did these changes make it into v2.8.0 @plt-amy and if so do we need to make some updates?

No, nothing on this front. I could probably whip up a branch that somehow reorders the candidates to make finding the problematic cases in the stdlib easier, but given how fragile the rest of the Agda codebase (e.g. generalisation, with abstraction) seems to be against even minor changes to constraint solving order, my work on this front is currently buried under a pile of mindnumbing bugs.

plt-amy avatar Jun 30 '25 13:06 plt-amy

Alas, no progress. It is not hopeless (I have done a fair bit of coding in the last 3 weeks, just one other items. And they have progressed a lot, so I can foresee getting to this.)

JacquesCarette avatar Jun 30 '25 22:06 JacquesCarette

Okay will remove from v2.3 then.

MatthewDaggitt avatar Jul 02 '25 04:07 MatthewDaggitt

Is the way forward with this to raise a breaking/v3.0 PR which changes the type signature of the relevant lemmas to make the problematic arguments explicit? Or do we regard this as a v2.x bug which we can, and should fix, ahead of v3.0? And if so, should we do this before the associated changes to instance search are pushed upstream to agda, or after?

jamesmckinna avatar Oct 29 '25 16:10 jamesmckinna

Whether @MatthewDaggitt regards it as a bug is a good question. If he does, then that opens up multiple ways forward.

I do think stdlib should be fixed to be more robust and not rely on dumb luck of current implementation details.

JacquesCarette avatar Oct 29 '25 19:10 JacquesCarette

Personally my view is that this should be fixed in whichever comes first: the upstream change to Agda or v3.0 of the standard library. Making a breaking non-forced change in v2.4 seems like it would be a violation of our versioning policy?

MatthewDaggitt avatar Oct 31 '25 01:10 MatthewDaggitt