
`Data.Vec.Base.splitAt`, `take` and `drop` green slime

Open JacquesCarette opened this issue 2 years ago • 16 comments

The "improved" implementation of PR #2056 still is flawed. For ease of comparison, it is

splitAt : ∀ m {n} (xs : Vec A (m + n)) →
          ∃₂ λ (ys : Vec A m) (zs : Vec A n) → xs ≡ ys ++ zs
splitAt zero    xs                = ([] , xs , refl)
splitAt (suc m) (x ∷ xs)          with splitAt m xs
splitAt (suc m) (x ∷ .(ys ++ zs)) | (ys , zs , refl) =
  ((x ∷ ys) , zs , refl)

But this says: if you give me (splitAt) a Vec xs that was built in such a way that its length is known to be of the form m + n, then I can split it into two pieces that concatenate back to the original, with the pieces having the 'right' lengths.

It's almost certain that the vector you want to split wasn't actually built that way! To me, a better interface (and code) should be:

splitAt : ∀ m {n o} → (o≡m+n : o ≡ m + n) → (xs : Vec A o) →
          ∃₂ λ (ys : Vec A m) (zs : Vec A n) → cast o≡m+n xs ≡ ys ++ zs
splitAt zero    p xs                = ([] , cast p xs , refl)
splitAt (suc m) p (x ∷ xs)          with splitAt m (suc-injective p) xs
splitAt (suc m) p (x ∷ xs) | (ys , zs , p′) = ((x ∷ ys) , zs , cong (x ∷_) p′)

But that is quite a large interface change. It does get rid of the green slime. [And I'm annoyed there's a need for cast but that might be inevitable.]

JacquesCarette avatar Aug 13 '23 22:08 JacquesCarette

It might be more consistent with elsewhere to express the equalities the other way around, so we transform the more complicated expression into the simpler one (and expect the equality that way around for the addition equality)
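For concreteness, one reading of that suggestion (signature only, as a sketch) would be:

splitAt : ∀ m {n o} → (m+n≡o : m + n ≡ o) → (xs : Vec A o) →
          ∃₂ λ (ys : Vec A m) (zs : Vec A n) → ys ++ zs ≡ cast (sym m+n≡o) xs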

Taneb avatar Aug 14 '23 08:08 Taneb

I don't think it's appropriate to call it 'green slime': it's a view so its whole purpose is to invert a computation.

The proposed changes give you back a cast vector, so you will need additional manipulations using a cast-refl lemma to 'unappend' a vector. Additionally, that new behaviour can already be obtained by passing the cast vector to the original splitAt, so I think this is a strict regression.
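Concretely, something like this (a sketch; splitAt-cast is a name chosen only for illustration):

splitAt-cast : ∀ m {n o} (o≡m+n : o ≡ m + n) (xs : Vec A o) →
               ∃₂ λ (ys : Vec A m) (zs : Vec A n) → cast o≡m+n xs ≡ ys ++ zs
splitAt-cast m o≡m+n xs = splitAt m (cast o≡m+n xs)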

gallais avatar Aug 14 '23 09:08 gallais

@Taneb that was my instinct as well, but then all the casts needed a sym, which was a clear sign to me that things were the wrong way around! There's an explanation too: splitAt is not a simplification, it actually increases information, so it makes sense to me that the equality is oriented from simple to complicated.

@gallais of course I'm not calling splitAt green slime, I'm calling m + n in the type of Vec A (m + n) green slime. The point is that with the original type, you're asking the creator of the vector to have arranged that it already be of that syntactic form, i.e. that the length of xs be of the form m + n, which sure doesn't seem like a 'common' thing, unless of course your vector was created by a concat -- at which time it would be quite silly to split it up again!

Now, you will surely have also noticed that while I submitted the previous as a PR, this is an issue. I knew this one would need non-trivial discussion and ran a real risk of rejection (that cast bothers me a lot, even though the rest seems like an improvement).

In the end, what you say in your last paragraph is probably the "route forward": in common use, calls to splitAt should likely be wrapped in a cast call. The main difficulty: how do we document that this is the intended way to use it?

JacquesCarette avatar Aug 14 '23 20:08 JacquesCarette

I don't think it's necessarily true that we arrived there by using concat.

We could for instance be working with a vector of size m just after having learned that m is shaped n + p by case analysis on an indexed family (cf. this splitAt — don't click if you're a POPL PC member).
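For instance (a toy illustration, not the linked example; Shape and front are hypothetical names, with the usual ambient A : Set assumed):

data Shape : ℕ → Set where
  blocks : (n p : ℕ) → Shape (n + p)

-- matching on blocks reveals that the length is of the shape n + p,
-- at which point the current splitAt applies directly
front : ∀ {m} → Shape m → Vec A m → ∃ λ n → Vec A n
front (blocks n p) xs with splitAt n xs
... | (ys , _) = (n , ys)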

I see the current splitAt basically as both a less fancy (nat instead of list) and a more fancy (comes with an equality proof) version of

++⁻ : ∀ xs {ys} → All P (xs ++ ys) → All P xs × All P ys

And I don't think we should change all of the f⁻ property functions to prevent f from showing up in the premises.

gallais avatar Aug 14 '23 22:08 gallais

My ignorant question: why do we want the function to return the proof that the resulting vecs append to recreate the original one? Wouldn't it be more normal to have that as a property in Data.Vec.Properties?

Taneb avatar Aug 15 '23 05:08 Taneb

Nothing ignorant about such a question at all!

Indeed, I'm ignorant as to the particular history of splitAt, so I can only offer a (highly-biased!) personal perspective, which may indeed only be a rational reconstruction after the fact... but here goes:

A 2001/2004-era McBride&McKinna (--with-K!!!) view of lists/vectors of length m + n, as an interactive problem specification for the eventual implementation of a function splitAt, the idea being that the function should somehow "invert the _++_ operation", might have looked something like this:

data View {m} {n} : Vec A (m + n) → Set where
  split : (xs : Vec A m) → (ys : Vec A n) → View (xs ++ ys)

with splitAt now the function which targets that View (indeed: it is the covering function for this view). NB values of such a non-uniform inductive type family 'know' their decomposition, so splitAt would be 'correct-by-construction': the implicit equation would be a consequence of well-typedness, and the type provides a cast-iron spec. of what the function should do, so typical view covering functions 'write themselves'. Here the question is "what additional data do I need to make a deterministic/unambiguous decomposition?", with the answer "specify how much of the vector (a length) you wish to consume to produce the prefix xs, with ys necessarily then being what's left", and then noticing that such data is already present in the index information m + n in the (type of the) function itself. As @JacquesCarette points out, the decomposition of any particular vector/list length as m + n is a choice, but as @gallais points out, such a choice may already have been made for us by ambient contextual considerations... but you can perhaps also see the specification of monus as a degenerate instance of this: "if I can see any particular n as p + q for naturals p and q, for some specified p, then I can compute n - p as ... q" (etc.)
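The covering function targeting that View (a reconstruction in today's syntax, not actual historical code) might then have looked like:

splitAt : ∀ m {n} (xs : Vec A (m + n)) → View {m} {n} xs
splitAt zero    xs       = split [] xs
splitAt (suc m) (x ∷ xs) with splitAt m xs
... | split ys zs = split (x ∷ ys) zs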

Time goes by... greater awareness of/sensitivity towards equality... observational type theory... greater awareness of/sensitivity towards 'green slime'... --without-K... indices vs. parameters in inductive families (cf. Epigram vs. Cayenne)... and such a thing would at the very least probably have been transposed (this is morally yet another instance of the 'fording' transformation) to make the implicit equation/unification constraint explicit, and so end up looking like

data View {m} {n} (zs : Vec A (m + n)) : Set where
  split : (xs : Vec A m) → (ys : Vec A n) → zs ≡ xs ++ ys → View zs

Now, the view covering function is basically the same as before (and morally the same as splitAt as we now know it), modulo the cong (x ∷_) reasoning that @JacquesCarette has (re-)introduced, which for the other View would have been... managed definitionally by the typechecker ;-) (NB. I may be responsible for the strictification of the defn of splitAt, with the uses of refl reflecting (sic) such definitional force, but I am getting better at paying attention to making definitions less strict these days.) But pattern matching on results of type View zs now returns both the xs , ys decomposition and the equation correlating them with zs (an inevitable consequence of 'fording').
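The corresponding covering function (a sketch, essentially today's splitAt transposed to target the View) would be something like:

splitAt : ∀ m {n} (xs : Vec A (m + n)) → View {m} {n} xs
splitAt zero    xs       = split [] xs refl
splitAt (suc m) (x ∷ xs) with splitAt m xs
... | split ys zs refl = split (x ∷ ys) zs refl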

The last design/refinement/refactoring step (in the case of splitAt, initLast etc.) I can only conjecture might then be to say: ... more time passes... languages and their type theories get more refined in their analyses of such situations... so to hell with an explicit view, and the only thing you can do with it being to (irrefutably!) match against its only constructor... and in any case one-constructor datatypes should really be understood as records, with eta-expansion automatic... and so now we don't even need a record per se, an iterated Σ-type will do just as well, and ... here we are.

As to "more normal", that's a particular choice in the current stdlib design, and as elsewhere, I would still, 20+ years on, wish to argue for view-based programming as an important part of "why dependent types matter" in programming... but, again as ever, YMMV (and, to date, has so varied;-)).

jamesmckinna avatar Aug 15 '23 06:08 jamesmckinna

TL;DR: if you want to invert a function, and you want that inverse operation to be 'correct-by-definition', then (implicitly or explicitly) there's an equation to prove (cf LeftInverse)... and you can:

  • either prove it yourself: the Base/Properties 20th century way of doing things, a la HOL, Isabelle, Coq... (sketched below)
  • or, get the typechecker to prove it for you 'for free', in the course of writing the inverse function via a View/view
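A minimal sketch of the first style, under hypothetical names splitAt′ and splitAt′-++ (and assuming the usual Data.Vec.Base, Data.Product and propositional equality imports): a plain splitting function à la Base, with the inversion equation proved separately à la Properties.

splitAt′ : ∀ m {n} → Vec A (m + n) → Vec A m × Vec A n
splitAt′ zero    xs       = ([] , xs)
splitAt′ (suc m) (x ∷ xs) = map₁ (x ∷_) (splitAt′ m xs)

splitAt′-++ : ∀ m {n} (xs : Vec A (m + n)) → uncurry _++_ (splitAt′ m xs) ≡ xs
splitAt′-++ zero    xs       = refl
splitAt′-++ (suc m) (x ∷ xs) = cong (x ∷_) (splitAt′-++ m xs)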

We tried to argue in 2004 (successfully or otherwise) that the latter approach was the 21st century way to go about things (and was, in fact, already known in the 20th century, but not effectively operationalised until ... 'proper' implementations of dependent type theory started to come along).

For a recent instantiation of the ideas, see #1923 and the discussion of use of the view there to invert inject₁ and thereby (re-)implement lower₁ as its inverse...

jamesmckinna avatar Aug 15 '23 07:08 jamesmckinna

Ob: directions of equations, and the ambient stdlib design heuristic complicated ≡ simple(r). TL;DR: @JacquesCarette is correct.

I think it's important to pay attention to the mode of the rewrite rule, and in particular, when the complicated term is output from a pattern-match/with analysis, and the simple one is a variable, bound to the value of an input expression, then the heuristic really should be variable ≡ complicated, since the instantiation of variable is being unfolded (explained) by the analysis, so all occurrences of it become replaceable by the new complicated expression, susceptible to subsequent further analysis... so in the classical "Inverting Inductive Definitions", "Elimination With a Motive" and "View from the Left" papers, the equation was always (I think? I hope!) oriented var ≡ exp.

(That also correlates with 'fording' being a special instance, for the equality relation, of right Kan extension of a predicate along a relation... for those with the correct spectacles on.)

jamesmckinna avatar Aug 15 '23 09:08 jamesmckinna

As a last thought on this for the time being: 'green slime' is not always easy to remove. E.g., in #2056 and the downstream issue #2090, any attempt to remove green slime from the type

lookup-take : (xs : Vec A (m + n)) (i : Fin m) (m≤m+n : m ≤ m + n) →
              lookup (take m xs) i ≡ lookup xs (Fin.inject≤ i m≤m+n)

of lookup-take is going to incur a cast penalty, in order to ensure that xs has the correct type for take to be applicable. It looks as though it might be doable with an inlined pattern-matching let in the type, but... cognitive/visual burden...
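For comparison, a hypothetical 'slime-free' variant (lookup-take-cast is a name for illustration only; signature only), showing where the cast lands:

lookup-take-cast : ∀ m {n o} (o≡m+n : o ≡ m + n) (xs : Vec A o) (i : Fin m)
                   (m≤o : m ≤ o) →
                   lookup (take m (cast o≡m+n xs)) i ≡ lookup xs (Fin.inject≤ i m≤o)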

Correspondingly, if the solution @JacquesCarette seeks is to remove green slime from the types of splitAt, take, drop, then I don't (at present!) see a way to achieve this without incurring a corresponding cast penalty in the types of those functions in hypothetical 'slime-free' versions. Or am I missing something?

jamesmckinna avatar Sep 18 '23 07:09 jamesmckinna

@JacquesCarette are you (still) keen to keep this issue open? E.g. with regard to opening a PR to document what you see as the intended mode d'emploi?

jamesmckinna avatar Jan 11 '24 11:01 jamesmckinna

Yes, I still would like to keep this open. I will want to come back to it, when I have a bit of time to breathe. Stupid admin job.

JacquesCarette avatar Jan 12 '24 22:01 JacquesCarette