lambda-8cc icon indicating copy to clipboard operation
lambda-8cc copied to clipboard

lists are not Scott encoded

Open tromp opened this issue 5 months ago • 2 comments

The encoding of lists in BLC is not a Scott encoding. Assuming that our list data type is data List a = Cons a (List a) | Nil then a Scott encoding would have the same nil = \c\n. n that BLC uses (i.e. False), but its cons would be cons = \a\b. \c\n. c a b which is clearly different. The cons that BLC uses actually corresponds to the stream data type data Stream a = Cons a (Stream a) which corresponds to an infinite list. But by some trick, BLC still makes it work for nil terminated lists as well. The trick is that the cons destructor has to accept an extra argument that is the nil deconstructor.

tromp avatar Jan 09 '24 21:01 tromp

Thanks for your comment. Please let me confirm my understanding:

In BLC's "list", utilities are defined as:

nil = \a. \b. b
cons = \a. \b. \f. f a b
car = \a. \b. a
cdr = \a. \b. b

We use the following pattern to differentiate between nil and cons, obtaining result1 and result2 for each case, respectively (c.f. YCombinator post):

list (\head \tail \_. result2) result1

Here, result2 can include head and tail.

On the other hand, for Scott Encoding, utilities are:

nil = \a \b. b
cons = \a \b \c \n. c a b

List elements are processed as:

list (\head \tail. result2) result1

Similar to BLC's pattern, result2 may use head and tail. This expression works for both nil and cons cases.

My understanding is that the cons deconstructor in BLC's list you mentioned (\head. \tail. \_. result2) deconstructs a cons into its head and tail. The extra argument \_ is ignored in the expression, consuming result1 only when list is a cons. Since result1 is used in the nil case, this seems to align with what you described as the nil deconstructor.

Regarding Scott Encoding, it provides a direct transformation of recursive data types into lambda calculus terms. Functional programming's typical lists and streams correspond to data List a = Cons a (List a) | Nil and data Stream a = Cons a (Stream a). Applying Scott Encoding, the former doesn't derive BLC's "list" type, but the latter does. Therefore, BLC's lists are not "Scott-encoded lists" but "Scott-encoded streams".

The above is my current understanding.

Now, I checked the repo and the Readme mentions the list data types:

lists are encoded in the Scott encoding with ${\rm cons} = \lambda x.\lambda y.\lambda f.(f x y)$, ${\rm nil} = \lambda x.\lambda y.y$.

I'm thinking of a good replacement for this description of BLC's implementation of lists. I first thought of explaining it as:

lists are encoded as streams in the Scott encoding with ${\rm cons} = \lambda x.\lambda y.\lambda f.(f x y)$, ${\rm nil} = \lambda x.\lambda y.y$, corresponding to the Scott encoding of data Stream a = Cons a (Stream a) in Haskell notation.

However, since we also use nil in place of Stream a, it's inaccurate to state that our data type is precisely equal to data Stream as shown here. I then finally understood what you've mentioned in this post, that this data type is a hybrid of lists and streams, when seen as being Scott-encoded.

Looking at the Wikipedia article for Church Encoding, it seems that the set of utilities used in BLC and lambda-8cc is actually the Church Encoding for lists. So it may be accurate and the simplest to state that the lists are Church-Encoded.

Regarding this, I propose the following description:

lists are encoded in the Church Encoding, with ${\rm cons} = \lambda x.\lambda y.\lambda f.(f \ x \ y)$, ${\rm nil} = \lambda x.\lambda y.y$. Lists are constructed similarly to Lisp as ${\rm list} = {\rm cons}\ A \ ({\rm cons}\ B \ ({\rm cons}\ C \ {\rm nil}))$, with its elements accessed as ${\rm list}~ (\lambda {\rm head}. \lambda {\rm tail}. \lambda {\rm x}. {\rm result1})~{\rm result2}$, where ${\rm result1}$ and ${\rm result2}$ are functions that handle the cases where the list is a cons cell or is nil.

What do you think?

woodrush avatar Jan 12 '24 12:01 woodrush

In BLC's "list", utilities are defined as: nil = \a. \b. b cons = \a. \b. \f. f a b car = \a. \b. a cdr = \a. \b. b

Disagree on the last two; those are what i call true and false. car and cdr, when applied to a list, return the head or tail, so car = \list. list true = cdr = \list. list false = (using <> to denote tuples as in my paper).

We use the following pattern to differentiate between nil and cons, obtaining result1 and result2 for each case, respectively (c.f. YCombinator post): list (\head \tail _. result2) result1 Here, result2 can include head and tail. On the other hand, for Scott Encoding, utilities are: nil = \a \b. b cons = \a \b \c \n. c a b

Correct. A more consistent naming scheme would set nil = \c \n. n (alpha equivalent to the above of course)

List elements are processed as: list (\head \tail. result2) result1> Similar to BLC's pattern, result2 may use head and tail. This expression works for both nil and cons cases.

My understanding is that the cons deconstructor in BLC's list you mentioned (\head. \tail. _. result2) deconstructs a cons into its head and tail. The extra argument _ is ignored in the expression, consuming result1 only when list is a cons. Since result1 is used in the nil case, this seems to align with what you described as the nil deconstructor.

I sometimes find a use for the result1 value within result as well, so then I don't use _ . For example see https://github.com/tromp/AIT/blob/master/fast_growing_and_conjectures/patcail.lam#L6-L7 Re-binding the t of the nil case make the de Bruijn index of t in the non-null case 2 bits shorter. The same line repeats that trick with s as well.

Regarding Scott Encoding, it provides a direct transformation of recursive data types into lambda calculus terms. Functional programming's typical lists and streams correspond to data List a = Cons a (List a) | Nil and data Stream a = Cons a (Stream a). Applying Scott Encoding, the former doesn't derive BLC's "list" type, but the latter does. Therefore, BLC's lists are not "Scott-encoded lists" but "Scott-encoded streams".

Except that Scott-encoded streams do not have a Nil constructor and no way of terminating a stream. That's why BLC lists are a hybrid of Scott encoded Lists and Streams. It uses the cons of Streams with the Nil from Lists. The result is not the Scott encoding of any single data type.

Now, I checked the repo and the Readme mentions the list data types:

which repo? README files are normally in all caps.

However, since we also use nil in place of Stream a, it's inaccurate to state that our data type is precisely equal to data Stream as shown here. I then finally understood what you've mentioned in this post, that this data type is a hybrid of lists and streams, when seen as being Scott-encoded.

Exactly.

Looking at the Wikipedia article for Church Encoding, it seems that the set of utilities used in BLC and lambda-8cc is actually the Church Encoding for lists. So it may be accurate and the simplest to state that the lists are Church-Encoded.

No, the lists are not Church encoded. See https://en.wikipedia.org/wiki/Church_encoding which actually shows the Church list encoding after mine under the heading "Represent the list using right fold".

-John

tromp avatar Jan 13 '24 20:01 tromp