Should we add Self types?
The Scott encoding is an important tool that is ultimately necessary for many efficient algorithms on the untyped lambda calculus. It is impossible to express Scott-encoded values in Morte, so functions such as zipWith and min are quadratic. Self types enable the Scott encoding, and with it linear zipWith/min/similar functions, without compromising strong normalization or soundness. The extension is so simple it could be added to Morte as it stands in a few lines of code.
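To make the complexity claim concrete, here is a small illustrative sketch (in Python rather than Morte, using plain lambdas for the untyped calculus): a Church `pred` must rebuild the whole number, while a Scott `pred` just selects the stored predecessor in O(1). All names here are for illustration only.

```python
# Church naturals: a number n is "apply f n times" -- the recursion is baked in.
church_zero = lambda f: lambda x: x
church_succ = lambda n: lambda f: lambda x: f(n(f)(x))

def church_pred(n):
    # The classic pairs trick: fold (pred, self) pairs up from zero -- O(n).
    shift = lambda p: (p[1], church_succ(p[1]))
    return n(shift)((church_zero, church_zero))[0]

# Scott naturals: a number is its own pattern match, storing its predecessor.
scott_zero = lambda s: lambda z: z
scott_succ = lambda p: lambda s: lambda z: s(p)

# Scott pred just hands back the stored predecessor -- O(1).
scott_pred = lambda n: n(lambda p: p)(scott_zero)

# Helpers for observation only.
church_to_int = lambda n: n(lambda k: k + 1)(0)
def scott_to_int(n):
    return n(lambda p: 1 + scott_to_int(p))(0)
```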
I could submit a pull request if Gabriel wishes. Should Self types be added?
So I would prefer not to make changes to morte's core calculus if at all possible, even for performance reasons. It's really important to me to keep morte's implementation and semantics as simple as possible so that morte can be easily ported to multiple platforms.
I think a better approach to this would be to define a separate language and compiler that can support self types for better optimization before desugaring to morte.
Also, as I mentioned on reddit, usually if you want zipWith or min I'd recommend the coinductive encoding of Nat/List before modifying the core language.
Gabriel, could you point to an example of streams being used for the purpose you mention? I always understood streams as generators, i.e., functions that generate an infinite list by repeated application of a function to a seed value. But I'm not aware of streams being used to encode specific, finite lists. How exactly would you do it, and take its tail in O(1)?
Also, if that is possible, wouldn't you be able to just convert a church-encoded nat to a stream, take its pred, convert back to nat, and then you'd get linear min (which people on that SO question claimed to be impossible)?
Moreover, I understand your concerns about making the language more complex, but if it is a very small amount of complexity that makes it as efficient as Haskell, isn't it worth it? If simplicity were the only concern we'd have simple types, no? I think there's a balance to strike between expressiveness and simplicity. Morte + Self looks like a great local maximum.
Finally, I'd just like to thank you for this project. Even though the content is above my current knowledge, I can read most of your code there. It is what has helped me understand dependent types the most. Thanks.
I'll use Haskell code to illustrate the concept first and then convert it to morte code.
data CoList a = forall s . MkCoList s (s -> Maybe (a, s))
-- This `tail` is O(1)
tail :: CoList a -> Maybe (CoList a)
tail (MkCoList seed step) = do
    (_, seed') <- step seed
    return (MkCoList seed' step)
The encoding of the CoList type in morte (assuming you already have an encoding for Maybe) is:
\(a : *)
-> forall (CoList : *)
-> forall (MkCoList : forall (s : *) -> s -> (s -> Maybe (a, s)) -> CoList)
-> CoList
Regarding min, you can convert the inductive encoding of Nats to a coinductive encoding but that conversion process is O(N^2) so it would be as inefficient as performing min directly on the inductive encoding. Also, there's no way to perform the reverse transformation (coinductive to inductive), because the coinductive encoding permits encoding infinity, while the inductive encoding does not.
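For intuition, here is an illustrative Python sketch of the coinductive shape described above (the seed is a plain int only for readability; in Morte it is existentially quantified): `pred` peels one layer in O(1), while a value converted from the inductive side would pay the inductive `pred` cost at every layer.

```python
# A coinductive nat: an opaque seed plus a step that peels one constructor.
# `None` plays the role of Maybe's Nothing (i.e. the Zero case).
make_conat = lambda seed, step: (seed, step)

def pred(c):
    # O(1): a single call to step; no rebuilding of the rest of the number.
    seed, step = c
    out = step(seed)
    return None if out is None else make_conat(out, step)

def observe(c):
    # Count the layers -- for testing only; this forces the whole number.
    k = 0
    while (c := pred(c)) is not None:
        k += 1
    return k
```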
I do eventually want to make a more efficient language that interops with morte and annah might end up being that language, where I do optimizations directly in annah independent of morte or it might be another language. However, the most important problem that morte is trying to solve is to essentially be the first "JSON for code" so it has to be as simple and easy to reimplement in other languages as possible so that it can be used as a portable code interchange format.
Ah, I see. The only point I don't get in all that is how you can optimize a "min" in the source language. When you compile it to morte, it will be morte, so you'll have to convert whatever format you use in the source language to the Church encoding, which will be O(n^2) for min. I can't visualize how you're able to optimize min in the source language if you have to store it in morte anyway, which can't handle it.
This hypothetical other language would interop with morte purely for backwards compatibility. It would ideally support all the operations of morte independently, so you would only pay the O(n^2) cost when interacting with old morte code.
Ah, I see, but you mean that hypothetical language won't be a JSON for code like Morte is, just another language? I'm just wondering, why not have a JSON for code like Morte, but that can actually store all Haskell programs without performance penalties? I'll try to add self types to morte. Possibly above my skill ceiling, but it is worth a shot...
Actually, it would also be a JSON for code, just like morte. I'm using morte to flush out all the issues associated with distributing code securely and then once I understand those better I'll use the experience I gained with morte to inform "version 2" of the architecture.
Okay, so let me see if I understand what you are saying: you're planning on eventually making a 2.0 version of morte, which will be very similar but also able to encode O(n) min? How will such a language do so?
I don't know yet. All I know is what you've already told me about self types, so I'd study that more closely and try to implement it since it seems like it would be a decent solution to this problem.
Ah I see, makes sense :)
Could you detect the n^2 case when desugaring from Morte to something else?
I'm not sure, but there are lots of issues with forbidding zip anyway. For example, map h (zip (map f x) (map g y)) fuses to a single pass through both lists... which for me is one of the coolest things about normalizing terms the way Morte does, but plain CC just can't have it.
So ideally you would compute the min function at compile time in the more efficient language so that you don't need to do the O(n^2) computation in morte. Really, the only time this would hurt is if you relied on morte to do the computation, which you don't have to do. There's nothing that even requires you to normalize morte expressions if you don't want to, since you can pass them straight through to the backend.
I have a feeling that you're minimizing the issue, though... you can't just reduce everything at compile time. Suppose you write a game in a functional language. That game has a feature of finding the least expensive item in an NPC's list of items. You can't just do that at compile time. If you store and transport that game's source as a Morte expression, that means this feature of your game is necessarily downgraded to quadratic.
You only pay the price if you normalize the expression while it is still a morte expression. If you translate back to a more efficient language and do the normalization there then you don't pay the quadratic price.
How not? Converting back to the efficient language won't change your algorithm into the linear coroutine-style zipWith... you can't store the linear zipWith in Morte, so you have to use the slower version to begin with...
I'm sorry if I'm being slow, but I really can't visualize what you are thinking.
You can change the algorithm if you specifically detect the more inefficient min or zipWith and resugar them to more efficient versions.
However, the goal is that you would eventually never convert to morte except to interface with older code for legacy reasons.
One way to do it is to pass in both the type and min as arguments. You could have a Morte prelude that has both a morte version and a C version. The morte version is for educational purposes, but the C version is what is actually used.
I really feel, through my extensive exploration of the λ-calculus, that there is a whole class of very important terms that can't be expressed without Scott-encoded values. There is only one efficient and elegant way to implement any function that somehow "compares" two structures (such as (==), zipWith, and so on), and it needs Scott. Without Scott, you can't really use the lambda calculus to its full potential. I think you underestimate how much expressive power you lose without it.
Functions such as tail operating on Church-encoded values are an aberration. They are not natural: they are huge, ugly, slow, don't fuse, cause code explosion, and make for terrible normal forms. Without fusion, any optimization opportunity that would come after the fusion is lost; it creates "islands of code that don't touch", defeating the whole optimization-by-normalization ideal. When you want the tail of a Church-encoded structure, what you actually want is to convert it to a Scott-encoded structure and pattern match on it. That leaves you with a Scott-encoded tail, but that is fine, because you never need the tail of a Church-encoded value. If you need tail, you are probably doing step-by-step matching. Scott is for that, when you already have the recursive fuel; Church is for the recursive fuel itself. That is how you get beautiful normal forms. λ-calculus without Scott is crappy.
Also, I'm not sure I like the idea of optimizing those things as an extra step when compiling to the target language. For one, that defeats the purpose of Morte, which is doing all the optimizations itself, by beta-reduction. Second, I really doubt you'd be able to detect those things. In the λ-calculus, you can fuse a gigantic zip-filter-map-eq-zip-again-etc composition chain into a single tight loop. How would you even approach detecting that in Morte? The normal form of such a chain would be huge without fusion. And you simply can't have zip fusion in pure CoC.
I agree it is, in a way, added complexity. But it is such a minor addition (what, 5 lines of code?) for such a ridiculous leap in expressive power that I can't see how it is not worth it. It is the missing piece of the puzzle. I can see so many applications for something like Morte+Self, and it would be really fun to explore that language. But, as it is, there isn't really much I can do with Morte. I hope I'll eventually be able to convince you to reconsider the addition of Self!
If it's only 5 lines of code, perhaps you could submit a PR and move the discussion there? For me it's not exactly clear what changes are proposed and how they will affect soundness and complexity of Morte.
I don't have the time/knowledge right now to do it myself, but I guessed it would be simple because the paper states so, as in, it adds just a single constructor to CoC's ADT afaik. If I'm underestimating the complexity let me know!
@MaiaVictor If I understand correctly, Morte's foreign function interface is strong enough that you could implement Scott types in a foreign library and then import them into morte without any modifications to morte whatsoever. Basically you have the program take an argument of the form Scott : (Type -> Type) -> Type and an isomorphism from f (Scott f) to Scott f.
@ChristopherKing42 I can't visualize how that would work. Can we try it with a concrete example?
-- get_at : ∀ a . ChurchNat -> a -> ChurchList a -> a
get_at index default list = list cons nil scott_nat
  where
    -- cons : a -> (ScottNat -> a) -> ScottNat -> a
    cons head tail nat = nat succCase zeroCase
      where
        succCase pred = tail pred
        zeroCase      = head
    -- nil : ScottNat -> a
    nil nat = default
    -- scott_nat : ScottNat
    scott_nat = index succ zero
      where
        succ pred = \s z -> s pred
        zero      = \s z -> z
This function takes a Church-encoded nat (index), a default value, and a Church-encoded list, and returns the element at that index, or default if the index is out of bounds. Since this function involves inspecting two recursive datatypes (the list and the nat), I needed to convert one of them (the nat) to a Scott encoding in order to avoid O(N^2) complexity.
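For what it is worth, the same idea can be sketched with Python lambdas (again purely illustrative, not Morte): the Church list fold builds the cons/nil machine, the Church index is converted once into a Scott nat, and each list step then pattern-matches that nat in O(1), giving a linear algorithm overall.

```python
def church(k):
    # Church nat: apply f k times (built with a loop for convenience).
    def n(f):
        def go(x):
            for _ in range(k):
                x = f(x)
            return x
        return go
    return n

def church_list(xs):
    # Church list: its own right fold over cons/nil.
    def lst(c):
        def go(n):
            acc = n
            for x in reversed(xs):
                acc = c(x)(acc)
            return acc
        return go
    return lst

def get_at(index, default, lst):
    # Convert the Church index into a Scott nat once -- O(index).
    scott_zero = lambda s: lambda z: z
    scott_succ = lambda p: lambda s: lambda z: s(p)
    scott_nat = index(scott_succ)(scott_zero)
    # Each cons cell pattern-matches the Scott nat in O(1).
    cons = lambda head: lambda tail: lambda n: n(lambda p: tail(p))(head)
    nil = lambda n: default
    return lst(cons)(nil)(scott_nat)
```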
Could you show me how that would work in Morte?
I don't know much Morte, but I think I can give a basic outline. This shows how to make a Natural number with O(1) predecessor.
( \(Scott : (* -> *) -> *)
-> \(unwrap : forall (f : * -> *) -> Scott f -> f (Scott f))
-> \(wrap : forall (f : * -> *) -> f (Scott f) -> Scott f)
-> ( \(NatF : * -> *)
-> \(Succ : forall (a : *) -> a -> NatF a)
-> \(Zero : forall (a : *) -> NatF a)
-> \(foldNatF : forall (a : *) -> forall (r : *) -> (NatF a -> r) -> r -> r )
-> result -- `Scott NatF` now has efficient successor and predecessor function in the form of `wrap` and `unwrap`
)
-- NatF a
( \(a : *)
-> forall (r : *)
-> (a -> r) -- Succ
-> r -- Zero
-> r
)
-- Succ
( \(a : *)
-> \(va : a)
-> \(vas : forall (r : *) -> (a -> r) -> r -> r)
-> \(r: *)
-> \(Succ : a -> r)
-> \(Zero : r)
-> Succ va
)
-- Zero
( \(a : *)
-> \(vas : forall (r : *) -> (a -> r) -> r -> r)
-> \(r: *)
-> \(Succ : a -> r)
-> \(Zero : r)
-> Zero
)
-- foldNatF
( \(a : *)
-> \(vas : forall (r : *) -> (a -> r) -> r -> r)
-> vas
)
)
(My goodness is Morte hard to write by hand.)
The interpreter has to supply Scott, wrap, and unwrap. If the interpreter is written in Haskell, this amounts to plugging in Fix.
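A rough Python analogue of that outline (illustrative only; in an untyped runtime, `wrap` and `unwrap` for the fixed point are simply identities, which is what Haskell's `Fix`/`unFix` erase to at runtime):

```python
# NatF a, Scott-style: a value is a chooser between a Succ case and a Zero case.
succF = lambda a: lambda s: lambda z: s(a)   # Succ : a -> NatF a
zeroF = lambda s: lambda z: z                # Zero : NatF a

# The interpreter-supplied isomorphism between f (Scott f) and Scott f:
# at runtime both directions are the identity.
wrap = lambda x: x
unwrap = lambda x: x

# With that, `Scott NatF` gets O(1) successor and predecessor:
succ = lambda n: wrap(succF(n))
pred = lambda n: unwrap(n)(lambda p: p)(None)   # None marks the Zero case

def to_int(n):
    # Observe all layers (testing only).
    p = pred(n)
    return 0 if p is None else 1 + to_int(p)
```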
And even harder for me to read. Suppose you took that and wrote: pred (Succ Zero), would that compile to Zero?
@MaiaVictor I don't think so (at least Morte wouldn't). We didn't even require wrap and unwrap to be inverses.
So, the compiler doesn't actually understand the structure of your scott encoded values... that's hiding them from it. It isn't able to optimize anything that depends on it. You'll end up with 25% of your code being foreign function calls.
@ChristopherKing42 Just to let you know, I have a front-end for morte named annah which simplifies writing morte code a little bit:
https://github.com/Gabriel439/Haskell-Annah-Library
It's undocumented at the moment, but the easiest way to show how it works is to just give an example of me trying to implement CoNat using annah.
First, I implemented something analogous to Haskell's Unit type by creating the following annah files:
$ cat U/@.annah
type U
data Unit
in U
$ cat U/Unit.annah
type U
data Unit
in Unit
... and then compiled them using annah to generate the equivalent morte code (sans the .annah suffix):
$ annah < U/@.annah > U/@
$ annah < U/Unit.annah > U/Unit
I then created a Maybe folder, too:
$ cat Maybe/@.annah
\(a : *)
-> type Maybe
data Just a
data Nothing
in Maybe
$ cat Maybe/Just.annah
\(a : *)
-> type Maybe
data Just a
data Nothing
in Just
$ cat Maybe/Nothing.annah
\(a : *)
-> type Maybe
data Just a
data Nothing
in Nothing
... and then a NatF type:
$ cat NatF/@.annah
\(a : *)
-> type NatF
data Succ a
data Zero
in NatF
$ cat NatF/Succ.annah
\(a : *)
-> type NatF
data Succ a
data Zero
in Succ
$ cat NatF/Zero.annah
\(a : *)
-> type NatF
data Succ a
data Zero
in Zero
... and then I compiled all of those to morte. Once I had those I just defined the remaining code for CoNat in a single file:
$ cat test.annah
type CoNat
data Make (s : *) (seed : s) (step : s -> #NatF s)
fold foldCoNat
in
let Succ (n : CoNat) : CoNat =
foldCoNat
n
CoNat
(\(s : *) -> \(seed : s) -> \(step : s -> #NatF s) ->
Make
(#Maybe s)
(#Maybe/Nothing s)
(\(x : #Maybe s) ->
x
(#NatF (#Maybe s))
(\(y : s) ->
step y
(#NatF (#Maybe s))
(\(z : s) -> #NatF/Succ (#Maybe s) (#Maybe/Just s z))
(#NatF/Zero (#Maybe s) )
)
(#NatF/Succ (#Maybe s) (#Maybe/Just s seed))
)
)
let Zero : CoNat = Make #U #U/Unit (\(_ : #U ) -> #NatF/Zero #U )
let pred (n : CoNat) : #NatF CoNat =
foldCoNat
n
(#NatF CoNat)
(\(s : *) -> \(seed : s) -> \(step : s -> #NatF s) ->
step seed
(#NatF CoNat)
(\(x : s) -> #NatF/Succ CoNat (Make s x step))
(#NatF/Zero CoNat)
)
in {{result}}
Then I could begin to answer @MaiaVictor's question of whether or not the coinductive encoding automatically detects the correct equalities. The answer I got is "sort of".
For example, it does correctly detect that pred Zero is identical to #NatF/Zero CoNat. They both normalize to the exact same expression.
However, it does not correctly detect that pred (Succ Zero) is identical to #NatF/Succ CoNat Zero. BUT it does correctly detect that it is #NatF/Succ CoNat {something equivalent to Zero}. So the part where equivalence is lost is in the existentially quantified contents of the Make constructor.
So the answer is that morte does detect observational equivalence of coinductively defined data types: morte can't prove that two coinductive values are the same as a whole, but it can prove that each layer that you "pop" out of them is the same. Or, in more formal terms, morte can prove equivalence via bisimulation.
Still thinking a lot about this issue. The fact that the alternating zipWith implementation is the only way to get complete fusion on List functions makes the calculus feel incomplete without Self :/ Here is a zipWith implementation in the syntax I proposed, with x @ term for self, which seemingly typechecks (the one I posted on SO doesn't):
u: *
f: -u -u u
a: (List u)
b: (List u)
List: *
cons: -u -List List
nil: List
(T:*
(a (-T List) (h:u t:(-T List) k:T (k t h)) (k:T nil)
(b T (x:u t:T k:(-T List) y:u (cons (f x y) (k t))) (k:(-T List) x:u nil)))
r@(-(-r List) (-u List)))
One thing to note is that the Church encoding does not play well with laziness. By contrast, the Scott encoding does not hamper lazy evaluation in any way.
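A tiny illustration of that last point in Python (thunks standing in for laziness; names are illustrative): a Scott list can be infinite, and taking its head forces nothing else, whereas a Church list is its own complete fold and so cannot even represent an infinite list.

```python
# Scott lists: a value is its own one-step pattern match.
nil = lambda c: lambda n: n
cons = lambda h: lambda t: lambda c: lambda n: c(h)(t)   # t is a thunk

def ones():
    # An infinite Scott list of 1s; the tail is the zero-argument `ones` itself.
    return cons(1)(ones)

# Taking the head inspects one constructor and forces nothing else.
head = lambda l: l(lambda h: lambda t: h)(None)

def take(k, l):
    # Force only the first k cells.
    if k == 0:
        return []
    return l(lambda h: lambda t: [h] + take(k - 1, t()))([])
```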