research icon indicating copy to clipboard operation
research copied to clipboard

Pick a strategy for deriving induction

Open cwgoes opened this issue 5 years ago • 16 comments

QTT does not natively support induction derivation; we need to add typing rules for it.

Some options are:

  • Add primitive datatypes (e.g. Nat) with elimination rules which can be used for induction This is what @thealmarty has implemented in LambdaPi.
  • Allow custom induction definitions a la the Calculus of Inductive Constructions. This is pretty similar to primitive datatypes; they go into more detail w.r.t. how to allow user-defined types to derive induction.
  • In line with Aaron Stump's work on Cedille, add a few type primitives (implicit product, dependent intersection, equality type), which allows induction to be derived for lambda-encoded naturals (and possibly other lambda-encoded datatypes) Here is the latest paper I know about on the topic. Here is the longer (earlier) paper on Cedille.

There may be other options. The first few chapters of the Cedille paper survey some prior research.

cwgoes avatar Jul 22 '19 15:07 cwgoes

...specifically, we might want the ability to derive induction for Mogensen-Scott-encoded datatypes.

Here is another paper on that encoding, and here is a note on typing them. Here is a typing-relevant discussion by Fu & Stump. I think Scott encodings have bad space efficiency for large recursive types (e.g. Nats, Lists); might want to use Boehm-Berarducci encoding in some cases.

Likely we will need some primitive types (w/induction) anyways; but the more we can do in lambda-encoded data the better I think (and maybe we can just optimize particular lambda-encoded patterns with types, e.g. natural numbers, to machine representations at a lower level).

cwgoes avatar Jul 22 '19 16:07 cwgoes

Here is a paper by Aaron Stump outlining Efficient Mendler-Style Lambda encodings in Cedile.

Stump also explains deriving induction for Mendler style lambda encodings in two Cedile cast videos which can be found here (p1) and here (p2)

This may be a useful representation, however I'd need to research more on it to compare it with boehm berducci encoding (typed church encoding) and Morgensen Scott encodings

mariari avatar Jul 23 '19 14:07 mariari

Comparison of various induction strategies

Church / Boehm Berducci

  • Is a catamorphism
  • Doing any pattern matching needs to be at least O(n)
  • For non Infinite data, termination is guaranteed
  • Typable in System F

Parigot encoding

  • Typeable in System F
  • positive recursive type
  • Pred is O(1)
  • Representation is exponential on data O(2^n)
    • so exponential on n where n is a natural number!

Stump-Fu encoding

  • improvement on Parigot encoding only taking up O(n^2) space

Mendler Style

  • Linear space O(n) for data
  • O(1) pred function
  • Termination of positive and negative inductive datatypes is proven given a few invariants
    • This negative inductive datatypes termination may not be true in the modification made by stump
    • The previous research[1] that had negative inductive datatypes being proven to terminate had O(n) Pred
  • Strong induction on negative inductive types is a problem
    • Open research question

Morgensen-Scott Encoding

  • Only works for Positive inductive types
  • O(1) Pred
  • Typeable in System F(?)
  • Termination can not be established (?) (see link above!)

[1] A Hierarchy of Mendler style Recursion Combinators

mariari avatar Jul 24 '19 17:07 mariari

Another note: we will need the ability to totality-check functions, otherwise the type system can be trivially bypassed - if we can identify particular encodings which we know will halt, though, that's fine.

cwgoes avatar Jul 25 '19 11:07 cwgoes

The continuation-based encoding proposed here is interesting.

cwgoes avatar Jul 25 '19 11:07 cwgoes

Are there particularly likely use-cases for negative inductive datatypes in practice?

cwgoes avatar Jul 25 '19 11:07 cwgoes

For linked lists or trees, we definitely need linear space, and the constant factor matters too.

Definite requirements

  • O(1) pattern matching (pred)
  • O(n) space for data (naturals, linked lists, trees) with low or optimizable constant factor
  • Termination checking on positive inductive datatypes
  • Induction on positive inductive datatypes (a few Aaron Stump type theory additions are OK)
  • Typeable in dependent type theory

Nice to have

  • Termination / induction on negative inductive datatypes
  • Typeable in simpler type theory

Given that, it seems like Mendler and Mogensen-Scott are the two most likely candidates, plus perhaps the continuation encoding if we can type it etc.

Of course, we can choose different encodings for different datatypes on a case-by-case basis as well.

cwgoes avatar Jul 25 '19 11:07 cwgoes

Discussion of intensional/extensional equality Coq does not (generally) have extensional equality Idris has extensional equality

It seems like we need to choose the specifics of equality, induction for specially-defined datatypes, and induction for lambda-encoded data in coordination.

Cedille (p12) has an equality type that satisfies both axiom J & axiom K - apparently this is incompatible with HoTT because HoTT needs to distinguish proofs of the same equality (I don't totally understand this). We should figure out how this might interact with OTT and if OTT's way of providing extensional equality has advantages that Cedille's lambda-encoded version does not.

Also, I think extensional equality is sometimes equivalent to doing beta-eta-reduction (as opposed to just beta-reduction) in the conversion rule / in the optimiser (or even at runtime), that's something to consider (ref, Morte does it), it would be nice if we could do that but we probably need to retain canonicity & decidable typechecking. (edit: not always equivalent)

Having extensional equality also seems really helpful for proofs.

One other constraint to consider is that we might want to have a single "datatype system" frontend which can turn into lambda-encoded datatypes or specially-defined datatypes depending on the backend and a compiler switch - ideally we'd avoid having specially-defined datatypes in the typechecker and instead just convert to lambda-encoded datatypes to typecheck but pass through the ADT structure to the backend or something.

For now, we could also easily just add built-in Eq / Refl for the time being (this release), and worry about extensional equality / induction on lambda-encoded data later.

cwgoes avatar Dec 14 '19 21:12 cwgoes

@andy-morris thinks eta-reduction of functions should be fine.

cwgoes avatar Dec 16 '19 16:12 cwgoes

Let's not worry about HoTT.

cwgoes avatar Dec 16 '19 16:12 cwgoes

We should investigate how Mendler & Mogensen-Scott encodings interact with QTT - namely, with 0-usage arguments erased, what do the actual runtime lambda-encoded terms look like and how (concretely) efficient are they?

cwgoes avatar Dec 26 '19 21:12 cwgoes

Course-of-values induction - induction by reasoning on all prior cases instead of just the immediate one - looks interesting.

Apparently Cedille can do this. There's a lot of boilerplate though.

Cedille's datatype overview is also interesting.

cwgoes avatar Dec 29 '19 23:12 cwgoes

Mendler encoding seems quite tenable if we can erase or cleverly compile away the constant space-overhead from the included recursor (the first argument to alg in \alg -> alg (\f -> f alg) (...), not sure if "recursor" is the right term or not).

cwgoes avatar Dec 30 '19 05:12 cwgoes

Another interesting paper is Inductive Types Deconstructed. It starts with the calculus of constructions and adds an equality type, labels, tuples, union types and recursive types. Then you can define datatypes using those basic constructs and you get induction for free.

atennapel avatar Jan 09 '20 09:01 atennapel

The paper Monotone recursive types and recursive data representations in Cedille shows recursion and induction principles derived for Scott-encoded data in Cedille. I don't quite understand it yet, but if this is the case then you wouldn't have the problem of the space-overhead of the included recursor of the Mendler encoding.

The paper Practical Subtyping for Curry-Style Languages shows a recursor for Scott-encoded natural numbers in an extension of System F.

Both of these are in a Curry-style system, I believe Juvix is a Church-style system?

atennapel avatar Feb 11 '20 14:02 atennapel