Gauche-lisp15
Gauche-lisp15 copied to clipboard
LISP1.5 implemened on top of Gauche
// -- coding: utf-8 -- = LISP1.5 implementation on Gauche :sectnums: :toc: :toc-placement!: ifdef::env-github[] :tip-caption: :bulb: :note-caption: :information_source: :important-caption: :heavy_exclamation_mark: :caution-caption: :fire: :warning-caption: :warning: endif::[]
This is a fun project to implement LISP 1.5 as described in "LISP 1.5 Programmer's Manual". It's not a complete LISP 1.5 environment emulator; instead, it is to get a glimpse of how it is constructed.
toc::[]
== M-expressions
When LISP was in its infancy, it wasn't wrapped with so many parentheses which would later become its most memorable characteristic. At least in front of people's eyes.
An early LISP program looked like this:
member[a;l] = [null[l] → NIL; eq[a;car[l]] → l; T → member[a;cdr[l]]]
S-expressions were also used, but only to denote data.
member[C;(A B C D E)]
LISP programmers back then read and wrote programs in this form, on paper. Yes, program sources were meant to be read by humans. In order for computers to understand programs, one must have prepared a deck of cards with holes punched according to link:https://en.wikipedia.org/wiki/BCD_(character_encoding)[BCD character code]. The character set didn't have many punctuations and it was impossible to punch M-expressions directly. Instead, people translated M-expressions into S-expressions, which was already used to describe nested list structures. In other words, people fed the machine the internal representation of syntax tree directly.
Eventually, people got used to write programs in S-expresions directly, and M-expressions were faded into oblivion.
However, when we read papers of that era, we see M-expressions everywhere. Now that we can write M-expressions in a text file, let's make the computer understand it directly!
=== M-expression syntax
Here's a summary of M-expression sytnax:
- Identifiers are in all lowercase. Literal symbols are in all uppercase.
No
quotein M-expression. Literal list can be written using parentheses, e.g.(A B (C D . E)), without a quote. - Function call is written as
fn[arg1;arg2;...] - Conditional is
[test1 \-> expr1; test2 \-> expr2;...]. It works just likecond. Unicode arrow character->(U+2192) can be used in place of\->. - Symbol
NILis used for list terminator. Hencecons[A;NIL]yields(A). - Literal function is written as
lambda[[arg;..];expr]. This lambda form itself doesn't have a value -- it must be called with arguments to take effect. - Local recursive function can be written as
label[name;lambda[[arg;...];expr]], where you can usenamein theexprto refer to itself. - In toplevel, you can define a function by
name[arg;...] = expr.
An M-expression can be translated into an S-expression as follows:
- Literal data (symbols and lists) are embedded in the output
by
(QUOTE <literal>). - Function call
fn[arg1;arg2;...]becomes (FN ARG1 ARG2 ...)`. We didn't have lower case characters on computers back then. - Conditional
[test1 \-> expr1; test2 \-> expr2;...]becomes(COND (TEST1 EXPR1) (TEST2 EXPR2) ...). - Literal function
lambda[[arg;..];expr]becomes(LAMBDA (ARG ...) EXPR). - Local recursive function
label[name;lambda[[arg;...];expr]]becomes(LABEL NAME (LAMBDA (ARG ...) EXPR)).
=== Parsing a single M-expression
Module LISP1.5.mexpr implements parsers. A procedure
parse-mexpr takes a string or an input port, and parses one M-expression
and returns an S-expression.
gosh> ,u LISP1.5.mexpr gosh> (parse-mexpr "cons[(A . B);C]") (CONS (QUOTE (A . B)) (QUOTE C)) gosh> (parse-mexpr "lambda[[x];[eq[NIL;x]→T; T→F]]") (LAMBDA (X) (COND ((EQ (QUOTE NIL) X) (QUOTE T)) ((QUOTE T) (QUOTE F))))
See how liteals (upper-case names and lists) are QUOTE{nbsp}-d
once parsed.
[TIP]
If you try the above example on your REPL, make sure your terminal character encoding matches your Gauche setting (it's utf-8 by default).
Interestingly, there seemed no direct translation of definition
syntax name[arg;...] = expr into an S-expression.
When a user translates a prorgam in M-expressions into S-expressions
to punch the cards,
she gathers all the definitions into a call of DEFINE pseudo function
(see the note below).
For our purpose, we want the parser to yield a single S-expression
from one M-expression. So, we return ($= ...) form
as the result of parsing a defintion:
gosh> (parse-mexpr "null[x] = [eq[NIL;x]→T; T→F]]") ($= (NULL X) (COND ((EQ (QUOTE NIL) X) (QUOTE T)) ((QUOTE T) (QUOTE F))))
(The preceding $ of $= indicates it is an internal or
implementation-dependent feature.)
The parse-mexpr function only parses one M-expression.
If the input may contain more than one M-expression, use parse-mexprs
instead, which reads input up to EOF and returns an lseq of result
S-expressions.
=== Comments and other exhancements
We also add these extensions, for the convenience.
==== Comments
There's no comment syntax defined in M-expressions. Since
it's for humans to read, you could freely intermix code
and natural language descriptions. For our purpose,
we make a hash sign # to the end of line is a comment.
(We avoid ;, for it is used as a separator.)
==== Extended COND form
In Appendix B of LISP 1.5 Programmer's Manual, they use a pseudo extension of conditional expression for concise explanation, in which you can access the result of test expression from the expression in that branch. Such extension wasn't formalized and the actual code is written in assembly language instead of M-expressions. But for our purpose it'll be convnient to support such extension.
It is to allow a conditional expression to have the following clause:
test => fun
Here, fun must be a LAMBDA form that takes one argument,
or an expression that yield a function.
First, test is evaluated, and if it yiels a true value
(a value neither NIL nor F), the value is passed
to the function. It's the same as Scheme's cond feature with \=>.
We'll explain the actual use case and implementation of this extension when we get to the full toplevel environment support.
=== Writing source in M-expression:
With Gauche's reader directive feature, you can write source in M-expressions, as follows:
;; ;; Scheme comments ;; (use LISP1.5.mexpr) #!m-expr
M-expression function definitions
function1[arg;...] = expression1 function2[arg;...] = expression2 ...
For our purpose, we want to treat M-expressions as our source code, and the parser returns a single S-expression as a result. So we introduce our own extension.
($TOPLEVEL
: ($= [])
|
When read, the entire source is wrapped in $TOPLEVEL form.
Inside it, each toplevel form becomes either
($= <expr> <expr>) (in case of definition) or just an <expr>
in case of toplevel function call. This $TOPLEVEL form is
merely our parser's way to wrap the result, and its interpretation
depends on the caller of the parser; it doesn't mean we'll have
a special form called $TOPLEVEL.
[NOTE]
In the actual use case, all definitions in a program were gathered and translated into the following form to be punched:
DEFINE (( (NAME (LAMBDA (ARG ...) EXPR)) (NAME (LAMBDA (ARG ...) EXPR)) ... ))
This is actially a special syntax to execute a function call on toplevel.
It takes a form FUNC (ARG ...), where ARG{nbsp}s are implicitly
quoted. The function DEFINE takes one argument, which is
a form of ((NAME LAMBDA-EXPR) ...).
If you want to perform some calculation, you list the call of
the function after the DEFINE form, as follows:
DEFINE (( ... definitions .. (DOSOMETHING (LABMDA (ARG ...) EXPR)) ))
DOSOMETHING (ARG ...)
Examples are shown in p.15 and pp.48-51 of LISP1.5 Programmer's Manual.
The #!m-expr directive translates those M-expressions into
a LISP1.5 DEFINE form:
($TOPLEVEL ($= (FUNCTION1 ARG ...) EXPRESSION1) ($= (FUNCTION2 ARG ...) EXPRESSION2) ...)
Note that you have to have definitions of $TOPLEVEL and other primitive
LISP1.5 forms before loading the source file; The LISP1.5.mexpr module
only handles parsing.
We provide several implementations of those LISP1.5 primitives, which we'll show you in the following chapters.
== A Universal LISP Function
=== Running EVAL with minimal axioms
Section 1.6 of "LISP 1.5 Programmer's Manual" is one of the pinnacles of the document. They show how to implement Lisp interpreter on top of Lisp systems. They call it "a Universal LISP function".
We write out their code in link:mx/eval.mx[].
What's interesting about it is that you only need a handful of
functions and syntaxes to run the interpreter. We define those
minimal set of primitives in link:LISP1/5/axioms.scm[].
It provides the definition of the following primitives:
CAR, CDR, CONS, ATOM, EQ, QUOTE, and COND, as well as
a definition of $TOPLEVEL to handle toplevel forms.
To try the eval function, first use the axioms module, then
load the eval.mx file. Assuming you have
load path set to the top directory of LISP1.5 source,
you can say the following in the gosh REPL:
gosh> ,u LISP1.5.axioms gosh> ,l mx/eval.mx #t
Or, you can start gosh with loading necessary modules (this assumes you're in the top directory of LISP1.5 source):
$ gosh -I. -u LISP1.5.axioms -l mx/eval.mx
On the gosh prompt, you can call EVAL. The first argument
is the S-expression to evaluate, and the second argument
is the environment (assoc list of symbols and values):
gosh> (EVAL '(CONS (CAR (QUOTE (X . Y))) (QUOTE Z)) 'NIL) (X . Z)
Be aware of the difference of ' (quote) and QUOTE.
The former one is recognized by Gauche. The latter one is recognized by
EVAL.
If you prefer, you can write M-expressions using
read-time constructor #,(m-expr "..."):
gosh> (EVAL '#,(m-expr "cons[car[(X . Y)];Z]") 'NIL) (X . Z)
Following is a bit more convoluted example. It defines append
as a recursive funciton using LABEL, and calls it with
two arguments, (A B C) and (X Y Z):
gosh> (EVAL '#,(m-expr "label[append;lambda[[xs;r];
[eq[xs;NIL] -> r;
T -> cons[car[xs];append[cdr[xs];r]]]]]
[(A B C);(X Y Z)]")
'NIL)
(A B C X Y Z)
This interpreter only knows the minimal 7 primitives:
CAR, CDR, CONS, ATOM, EQ, QUOTE, and COND.
To refer to anything other than that,
you have to pass them in the environment argument.
The following example reverses a list, using the
definition of NULL, APPEND and REVERSE given to the environment:
gosh> (EVAL '#,(m-expr "reverse[(A B C D E F G)]")
'((NULL . #,(m-expr "lambda[[x];[eq[x;NIL] -> T; T -> F]]"))
(APPEND . #,(m-expr "lambda[[xs;r];
[eq[xs;NIL] -> r;
T -> cons[car[xs];append[cdr[xs];r]]]]"))
(REVERSE . #,(m-expr "lambda[[xs];
[null[xs] -> NIL;
T -> append[reverse[cdr[xs]];cons[car[xs];NIL]]]]"))
))
(G F D C B A)
[NOTE]
We need to provide the function NULL in the environment,
since the one defined in eval.mx exists in the world of Gauche, and is
not visible from the world of EVAL.
[TIP]
When you refer to an identifier that's neither one of the built-in primitive nor the one given in the environment, you'll get an error like the following:
*** ERROR: pair required, but got NIL Stack Trace:
0 (car x) at "./LISP1/5/axioms.scm":9 1 (CAR X) [unknown location] 2 (CAAR A) [unknown location] 3 (EQUAL (CAAR A) X) [unknown location] 4 (ASSOC E A) [unknown location] 5 (EVAL FN A) [unknown location] ...
The code searches the environment alist by ASSOC, hits the end of
the alist without finding it and complains. Remember, we have minimal
interpreter and there's no fancy error handling mechanism.
=== Going Metacircular
Since the universal LISP function defined in eval.mx understands
the primitives required to interpret functions in eval.mx, you can use
our EVAL to evaluate eval.mx to run EVAL on top of
EVAL -- now you're running a metacircular interpreter!
You might have noticed though, that axioms.scm provides $TOPLEVELS,
which is missing in eval.mx. In our context of discussing
metacircular interpreter, $TOPLEVELS appears as a result of
parsing M-expression definitions, and should be understood
as a meta-language to direct the set-up, rather than an integrated
part of the language (one way to think of it is that if other primitives
are C built-ins then $TOPLEVELS is #pragma or Makefile -- they belong
to a different layer.)
Of course, it is more convenient to have an ability in the core language to add new toplevel definitions, and we'll deal with it later. For now, let's stick to the 7 primitives.
In order to run EVAL inside EVAL, we need to prepare the definitions
in eval.mx as an environment alist passed to outer EVAL.
Run the following command in the toplevel source directory:
$ gosh tools/mexpr-env.scm mx/eval.mx
It reads eval.mx and prints the definitions in an alist. Copy the output,
then start gosh again, read axioms and load eval.mx, and evaluate
the EVAL expression, passing the copied alist as the environment
(don't forget the quote before the alist!):
gosh> ,u LISP1.5.axioms gosh> ,l mx/eval.mx #t gosh> (EVAL '(EVAL (QUOTE (CAR (QUOTE (X . Y)))) (QUOTE NIL)) '...<<here, copy & paste the output of mexpr-env.scm>>) X
The result X is the result of (CAR (QUOTE (X . Y))), computed
by the EVAL function implemented in LISP1.5, not the underlying Gauche.
If cut&pasting the environment alist is too tedious, mexpr-env.scm can
create a definition of an auxiliary function EVAL*, which calls EVAL
with the environment that has all the definitions in the given source file.
Run mexpr-env.scm with -e option, and save the result in lisp/eval.lisp:
$ gosh tools/mexpr-env.scm -e mx/eval.mx > lisp/eval.lisp
[TIP]
Instead of manually executing tools/mexpr-env.scm, you can
run the standard build process (./configur && make) and
all the converted files are placed under lisp/.
We use suffix lisp to indicate it is not a Scheme code (even though
Gauche can understand it after using LISP1.5.axioms).
The created lisp/eval.lisp looks as follows:
($TOPLEVELS ($= (EVAL* X) (EVAL X '...<>...
)))
That is, it defines EVAL* which takes one LISP1.5 expression and
evaluates it under the enviornment where all the definitions in eval.mx
is visible.
The created eval.lisp can be loaded to gosh after using LISP1.5.axioms.
Together with mx/eval.mx, you can run EVAL on top of EVAL:
$ gosh -I. -uLISP1.5.axioms -lmx/eval.mx -leval-star.lisp gosh> (EVAL* '#,(m-expr"eval[(CONS (QUOTE X) (QUOTE Y));NIL]")) (X . Y)
This time we used M-expression in the inner call. It's the same
as writing '(EVAL (QUOTE (CONS (QUOTE X) (QUOTE Y))) (QUOTE NIL)).
Let's recap what's happening. The outer EVAL (via EVAL*) is
executed by Gauche, using the initially loaded eval.mx. The
inner EVAL is interpreted by the outer EVAL, using the
enviornment created by mexpr-env.scm.
And the expression (CONS (QUOTE X) (QUOTE Y)) is interpreted by
the inner EVAL:
+----------------------------+
| (CONS (QUOTE X) (QUOTE Y)) |
+----------------------------+
| EVAL | ; inner EVAL
+----------------------------+
| EVAL | ; outer EVAL
+----------------------------+
| Gauche |
+----------------------------+
If it is not obvious, try it with an altered environment.
For example, edit the eval.lisp created above
to change the inner EVAL recognizes KWOTE instead of QUOTE.
There's only one place to change:
(EVAL LAMBDA (E A) (COND ((ATOM E) (CDR (ASSOC E A))) ((ATOM (CAR E)) (COND ((EQ (CAR E) (QUOTE KWOTE)) (CADR E)) ^^^^^ ((EQ (CAR E) (QUOTE COND)) (EVCON (CDR E) A)) ((QUOTE T) (APPLY (CAR E) (EVLIS (CDR E) A) A)))) ((QUOTE T) (APPLY (CAR E) (EVLIS (CDR E) A) A))))
(Leave other QUOTE intact, for they are recognized by the outer EVAL).
Now, try it:
(EVAL* '(EVAL (QUOTE (CONS (KWOTE X) (KWOTE Y))) (QUOTE NIL))) => (X . Y)
The two QUOTE{nbsp}s are recognized by the outer EVAL, and the two
KWOTE{nbsp}s are recognized by the inner EVAL. Furthermore,
the ' (quote) is recognized by Gauche.
=== Having FUN with ARG
(If you know what we'll talk about from the section title, you can skip this section. Yes, it's just about that.)
One advantage of having a simple language with a concise interpreter is that we can tweak it easily.
In the universal EVAL, a function is represented as a literal list
whose car is LAMBDA. It is a powerful idea--now you can have
a function as a first-class citizen of the language, that you can
construct it, pass it to another function, and return it from another
funciton. However, it has a flaw.
Let's try a failure case and see if we can fix it.
Consider MAPCAR function, which takes a function and a list, and
returns a list of results of the function applied to each element of the
given list (that is, Scheme's map function):
mapcar[fn;x] = [null[x] -> NIL; T -> cons[fn[car[x]];mapcar[fn;cdr[x]]]]
It is in link:mx/mapcar.mx[]. You can't load it directly
into Gauche, however. Treating a list starting with LAMBDA as
a function is a feature of EVAL, not Gauche.
We have to make EVAL understand the above definition.
We can use the same technique we used in the metacircular interpreter --
that is, translate the definition of MAPCAR above into an enviroment
alist. We also need the definition of NULL, so let's combine
eval.mx together with mapcar.mx. It can be done with the following
command line:
$ gosh tools/mexpr-env.scm -e mx/eval.mx mx/mapcar.mx > lisp/mapcar.lisp
Alternatively, run ./configure then make in the toplevel source directory.
Once you have lisp/mapcar.lisp, you can load it (after mx/eval.mx)
and you can call MAPCAR inside EVAL*:
$ gosh -I. -uLISP1.5.axioms gosh> ,l mx/eval.mx #t gosh> ,l lisp/mapcar.lisp #t gosh> (EVAL* '(MAPCAR (QUOTE (LAMBDA (X) (CONS X (QUOTE Y)))) (QUOTE (A B C)))) ((A . Y) (B . Y) (C . Y)) gosh> (EVAL* '#,(m-expr "mapcar[(LAMBDA (X) (CONS X (QUOTE Y)));(A B C)]")) ((A . Y) (B . Y) (C . Y))
So far, so good.
Now, Let's try nesting MAPCAR. We'll do equivalent to the following
Scheme code:
(map (lambda (x) (map (lambda (y) (cons x y)) '(p q r))) '(a b c)) => (((a . p) (a . q) (a . r)) ((b . p) (b . q) (b . r)) ((c . p) (c . q) (c . r)))
Here's LISP1.5 version:
(EVAL* '(MAPCAR (QUOTE (LAMBDA (X) (MAPCAR (QUOTE (LAMBDA (Y) (CONS X Y))) (QUOTE (P Q R))))) (QUOTE (A B C)))) => ((((P Q R) . P) ((Q R) . Q) ((R) . R)) (((P Q R) . P) ((Q R) . Q) ((R) . R)) (((P Q R) . P) ((Q R) . Q) ((R) . R)))
Oops, what happened? Let's examine the details.
Outer MAPCAR receives two actual parameters, (LAMBDA (X) ...) and (A B C)
(QUOTE{nbsp}s are stripped when arguments are evaluated
by evlis before calling the function). They are bound to the
local parameters, FN and X, respectively. In other words,
the body of MAPCAR:
[null[x] -> NIL; T -> cons[fn[car[x]];mapcar[fn;cdr[x]]]]
is evaluated with the following environment:
((FN . (LAMBDA (X) (MAPCAR (QUOTE (LAMBDA (Y) (CONS X Y))) (QUOTE (P Q R))))) (X . (A B C)))
Since X is not NIL, evaluation goes to cons[...] branch.
The first argument is fn[car[x]], so first car[x] is evaluated
and yields A, fn evaluated to the outer LAMBDA form
and we call it with A. The body of inner LAMBDA form, which
is the inner MAPCAR call, is evaluated with the following environment
(Keep in mind that the new local bindings are inserted in front of
outer environment):
((X . A) (FN . (LAMBDA (X) (MAPCAR (QUOTE (LAMBDA (Y) (CONS X Y))) (QUOTE (P Q R))))) (X . (A B C)))
Inner MAPCAR gets (LAMBDA (Y) (CONS X Y)) and (P Q R) as two
actual parameters, which are bound to MAPCAR{nbsp}'s formal paramter
FN and X again, and the environment under which innter MAPCAR{nbsp}'s
body is evaluated looks like this:
((FN . (LAMBDA (Y) (CONS X Y))) (X . (P Q R)) (X . A) (FN . (LAMBDA (X) (MAPCAR (QUOTE (LAMBDA (Y) (CONS X Y))) (QUOTE (P Q R))))) (X . (A B C)))
Finally, innter LAMBDA is called -- first, P as the
actual parameter, which is bound to Y. Hence the body
of the inner LAMBDA, which is (CONS X Y), is evaluated
under the following environment:
((Y . P) (FN . (LAMBDA (Y) (CONS X Y))) (X . (P Q R)) <1> (X . A) <2> (FN . (LAMBDA (X) (MAPCAR (QUOTE (LAMBDA (Y) (CONS X Y))) (QUOTE (P Q R))))) (X . (A B C))) <3>
- Argument for the inner
MAPCAR - Argument for the outer
LAMBDA - Argument for the outer
MAPCAR
Now it is clear why it didn't work. When we write the
initial nested MAPCAR form, we expect that X in the
innermost expression (CONS X Y) refer to the formal parameter of the
outer LAMBDA. But it is shadowed by the formal parameter of the
MAPCAR.
This is a well-known problem, and in lambda calculus it is avoided
by renaming the parameter names to avoid conflict. In our case,
if we rename the formal parameter of inner LAMBDA to something
different from the formal parameter of MAPCAR, it works as expected:
(EVAL* '(MAPCAR (QUOTE (LAMBDA (Z) <1> (MAPCAR (QUOTE (LAMBDA (Y) (CONS Z Y))) (QUOTE (P Q R))))) (QUOTE (A B C)))) => (((A . P) (A . Q) (A . R)) ((B . P) (B . Q) (B . R)) ((C . P) (C . Q) (C . R)))
- We use
Zto avoid confclit withMAPCAR{nbsp}'sX.
However, we can't possibly avoid all potential conflict manually, and renaming all formal parameters programatically to unique ones can be costly.
LISP1.5 employed another way to solve this problem. Instead of passing
LAMBDA form quoted, it introduced another form, called FUNCTION.
The rule is that whenever you pass a function as an argument,
you wrap it with FUNCTION instead of QUOTE. With this rule,
our call of nested MAPCAR would look like this:
(EVAL* '(MAPCAR (FUNCTION (LAMBDA (X) (MAPCAR (FUNCTION (LAMBDA (Y) (CONS X Y))) (QUOTE (P Q R))))) (QUOTE (A B C))))
Now we modify our universal LISP function to deal with FUNCTION.
We only need to change two lines. First, make EVAL understand
(FUNCTION <fn>) form. Whenver it sees the form, it just
returns a list (FUNARG <fn> <env>), where <env> is the evaluation
enviornment:
eval[e;a] = [atom[e] -> cdr[assoc[e;a]]; atom[car[e]] -> [eq[car[e];QUOTE] -> cadr[e]; eq[car[e];FUNCTION] -> cons[FUNARG;cons[cadr[e];cons[a;NIL]]]; <1> eq[car[e];COND] -> evcon[cdr[e];a]; T -> apply[car[e];evlis[cdr[e];a];a]]; T -> apply[car[e];evlis[cdr[e];a];a]]
- If we see
(FUNCTION <fn>)form, wrap the function and the current environment inFUNARGform, as(FUNARG <fn> <env>).
Then, in APPLY, we call <fn> with the rememberd <env> instead of
the passed environment:
apply[fn;x;a] = [atom[fn] -> [eq[fn;CAR] -> caar[x]; eq[fn;CDR] -> cdar[x]; eq[fn;CONS] -> cons[car[x];cadr[x]]; eq[fn;ATOM] -> atom[car[x]]; eq[fn;EQ] -> eq[car[x];cadr[x]]; T -> apply[eval[fn;a];x;a]]; eq[car[fn];FUNARG] -> apply[cadr[fn];x;caddr[fn]]; <1> eq[car[fn];LAMBDA] -> eval[caddr[fn];pairlis[cadr[fn];x;a]]; eq[car[fn];LABEL] -> apply[caddr[fn];x;cons[cons[cadr[fn];caddr[fn]];a]]]
- Apply the wrapped function in the rememberd environment
The changed definitions are in link:mx/funarg.mx[]. You can load it and see it addresses the issue (which has been called FUNARG problem).
$ gosh -I. -u LISP1.5.axioms -l mx/funarg.mx gosh> ,l lisp/mapcar.lisp #t gosh> (EVAL* '(MAPCAR (FUNCTION (LAMBDA (X) (MAPCAR (FUNCTION (LAMBDA (Y) (CONS X Y))) (QUOTE (P Q R))))) (QUOTE (A B C)))) (((A . P) (A . Q) (A . R)) ((B . P) (B . Q) (B . R)) ((C . P) (C . Q) (C . R)))
[NOTE]
Did you notice that you actually did't need FUNCTION? Instead
of introducing another form, you can let EVAL create FUNARG
when it sees a bare LAMBDA form. The definition will look like this:
eval[e;a] = [atom[e] -> cdr[assoc[e;a]]; atom[car[e]] -> [eq[car[e];QUOTE] -> cadr[e]; eq[car[e];LAMBDA] -> cons[FUNARG;cons[e;cons[a;NIL]]]; eq[car[e];COND] -> evcon[cdr[e];a]; T -> apply[car[e];evlis[cdr[e];a];a]]; T -> apply[car[e];evlis[cdr[e];a];a]]
The updated definition is in link:mx/funarg-lambda.mx[]. Using it,
calling MAPCAR becomes quite simpler:
$ gosh -I. -u LISP1.5.axioms -l mx/funarg-lambda.mx gosh> ,l lisp/mapcar.lisp #t gosh> (EVAL* '(MAPCAR (LAMBDA (X) (MAPCAR (LAMBDA (Y) (CONS X Y)) (QUOTE (P Q R)))) (QUOTE (A B C)))) (((A . P) (A . Q) (A . R)) ((B . P) (B . Q) (B . R)) ((C . P) (C . Q) (C . R)))
This idea was realized by Sussman and Steele in 1975, as a dialect Scheme. The first paper of Scheme stated it at the beginning:
[quote, Gerald Jay Sussman and Guy Lewis Steele Jr., 'SCHEME: An Interpreter For Extended Lambda Calculus']
SCHEME is essentially a full-funarg LISP. LAMBDA expressions need not be QUOTEd, FUNCTIONed, or *FUNCTIONed when passed as arguments or returned as values; they will evaluate to closures themselves.
==========================================================
=== Symbols and toplevel environment
So far, our EVAL requires any bindings to be provided
via the environment argument. Preprocessing the source with mexpr-env.scm
was a remedy, but it's still troublesome. So our next step is to
add a toplevel environment, that keeps global bindings of symbols.
The easiest way is to keep a global table, and when we search
a variable binding via ASSOC (in the first branch of EVAL),
we also look up the table when we didn't find any local bindings.
However, LISP1.5 took a bit different approach. Since its symbol had a property list, or plist, which could hold arbitrary key-value pairs, so I suspect it was natural to store the global value of the symbol in its plist. In fact, even the name of a symbol was merely one of its properties. In LISP1.5, a symbol was just another type of list where the car of its head was marked with a special value (-1).
[NOTE]
A property list (plist) associates keys to values, much like
an associative list (alist),
but its structure alternates keys and values. For example, if
key A has value APPLE and key B has a value BANANA, it can
be represented with the following alist and plist, respectively:
;; alist ((A . APPLE) (B . BANANA))
;; plist (A APPLE B BANANA)
The number of cons cells used are the same. We're not sure why LISP1.5
creators used plist for symbol properties, while they used
alist for environment in EVAL.
In our minimal infrastructure (link:LISP1/5/axioms.scm[]) we just used Gauche symbols for LISP symbols. It might be interesting, though, to reproduce what LISP1.5 did -- using a list to implement symbols!
That is, from now on, our LISP symbol is a pair whose car is
a special marker. We use Gauche symbol ATOM. From LISP world,
a LISP symbol is an unbreakable unit (hence it is called atom), so
the marker is never be visible. Under the hood, in Gauche level,
we can break an atom to access its internal structure. It is as
if LISP world deals with chemical reactions and Gauche world deals
with nuclear reactions.
In LISP symbols, its name is stored as a value of the property
PNAME. Since the property list is scanned by LISP function,
we have to use LISP symbols as the property key. For the name itself,
we use a Scheme string; in real LISP1.5, the name is stored
in a special way and treated specially (there wasn't a string type).
Thus, LISP symbol PNAME has the following structure in Gauche:
[source, scheme]
(define PNAME '#0=(ATOM #0# "PNAME"))
The #0= notation is a Scheme way to write a circular structure.
The symbol PNAME has a propoerty list, in which the key PNAME
is associated to the name "PNAME". Note that they LISP symbol
PNAME itself doesn't have a global value.
The global value of symbols is stored as a propery value with
the key APVAL. So we need the LISP symbol APVAL, which looks
like the following in Gauche. APVAL itself doesn't have a global
value either:
[source, scheme]
(define APVAL `(ATOM ,PNAME "APVAL"))
Once we have PNAME and APVAL, we can define NIL, whose name
is "NIL" and value is itself. We can't use #0= notation this time,
since we have to construct the list using values of \*PNAME\* etc.
[source, scheme]
(define NIL (rlet1 nil (list 'ATOM PNAME "NIL" APVAL) (set! (cddddr nil) (list nil))))
Here's how \*NIL* looks like in Gauche world.
#1=(ATOM #1# "PNAME") is LISP symbol PNAME, and
(ATOM #1# "APVAL") is LISP symbol APVAL. Remember we're looking
at the internal of atoms -- from LISP world, this is just a symbol
NIL.
gosh> NIL #0=(ATOM #1=(ATOM #1# "PNAME") "NIL" (ATOM #1# "APVAL") #0#)
We can define several symbols in this way. See link:LISP1/5/runtime.scm[] for all the predefined symbols.
Let's start building infrastructure. Our LISP world only have symbols
and cons cells so far (we'll add numbers later). We can define $atom?
and $cons? as follows (The $ indicates it deals with LISP objects):
[source, scheme]
(define ($atom? obj) (and (pair? obj) (eq? (car obj) 'ATOM))) (define ($cons? obj) (and (pair? obj) (not (eq? (car obj) 'ATOM))))
Then we can define $lisp\->scheme, which converts LISP data structure
into Scheme data structure, handy for debugging.
We map NIL inside the structure into Scheme empty list, so that
list structure can be printed naturally (instead of having . NIL)
at the end.) We also convert non-LISP object into a string #[...].
[source, scheme]
(define ($lisp->scheme obj) (define (rec obj) (cond [(eq? obj NIL) '()] [($atom? obj) (string->symbol (cadr (member PNAME (cdr obj))))] [(pair? obj) (cons (rec (car obj)) (rec (cdr obj)))] [else (format "#[~s]" obj)])) (if (eq? obj NIL) 'NIL (rec obj)))
It's also handy to have $scheme\->lisp, which converts Scheme
structure into LISP structure. One important point: We want to keep
symbol's eq{nbsp}-ness, that is, LISP symbols with the same name
can be compared with eq. So we keep a hashtable to map Scheme
symbol to LISP symbols.
[source, scheme]
(define obtable (hash-table-r7 eq-comparator 'NIL NIL 'PNAME PNAME 'APVAL APVAL))
(define ($scheme->lisp obj) (cond [(null? obj) NIL] [(symbol? obj) (or (hash-table-get obtable obj #f) (rlet1 s (list 'ATOM PNAME (symbol->string obj)) (hash-table-put! obtable obj s)))] [(pair? obj) (cons ($scheme->lisp (car obj)) ($scheme->lisp (cdr obj)))] [else (errorf "Cannot convert ~s to LISP" obj)]))
Let's try them. Converting Scheme (A B C D E) into LISP results
somewhat scary structure, but converting it back shows it's nothing
to be afraid of:
gosh> ($scheme->lisp '(A B C D E)) ((ATOM #0=(ATOM #0# "PNAME") "A") (ATOM #0# "B") (ATOM #0# "C") (ATOM #0# "D") (ATOM #0# "E") . #1=(ATOM #0# "NIL" (ATOM #0# "APVAL") #1#)) gosh> ($lisp->scheme *1) (A B C D E)
Not all global values are stored in APVAL property. LISP1.5 uses
several different keys, depending on the type of the value. APVAL
is used when a symbol is used as a variable, and other keys are
used when a symbol is used in the function position of the function call.
[%header,cols=2*] |=== | Key | Value
|APVAL
|The value is a LISP object.
|EXPR
|The value is a LISP-defined function (LAMBDA or FUNARG form). The arguments
are evaluated before passed to it.
|FEXPR
|The value is a LISP-defined function (LAMBDA or FUNARG form). The arguments
are not evaluated, and passed as a single list.
|SUBR
|The value is a native function (written in assembly in the acutal LISP1.5,
written in Gauche in our case). The arguments are evaluated before
passed it.
|FSUBR
|The value is a native function (written in assembly in the acutal LISP1.5,
written in Gauche in our case). The arguments
are not evaluated, and passed as a single list.
|===
It is worth to mention that EXPR form receives fixed-number of arguments. If you want to write a function in LISP that takes variable number of arguments, you have to make it FEXPR, and evaluate the given list of arguments by yourself.
[NOTE]
Lisp dialects can be categorized to either Lisp-1 or Lisp-2. They are not versions, but about namespaces.
Lisp-1 unifies function and variable namespaces, so in the function call syntax, the function name is looked up the same way as variable look-up. Scheme is Lisp-1.
Lisp-2 have separate namespaces for functions and variables.
You can use the argument named list, and it is treated separately
from the function list. When you need to call a function stored
in a variable, you need to use an extra function, funcall.
Common Lisp is Lisp-2.
This design of having different keys for function call and
variable makes LISP1.5 a Lisp-2. However, interestingly,
to call a function stored in a variable you can place the variable
in the function position, without funcall, just like Scheme.
So, coincidentally, we can say LISP1.5 is somewhat between Lisp-1 and Lisp-2.
=== Enhancement of M-expressions
The EVAL procedure that uses symbol's property lists are
shown in Appending B of "LISP1.5 Programmer's Manual". However,
it contains some pseudo code which were actually implemented
in the assembly. Although we can rewrite the code in pure
LISP1.5, it would be pretty verbose; instead, we enhance our M-expressions
a bit so that the pseudo code in Appendix B can be written naturally
in our implementation.
Specifically, we allow this clause in the cond form:
test => proc
The test expression is evaluated, and if it yields neither NIL nor F,
the procedure proc is called with the result of test as the sole
argument. It is the same as Scheme's cond. You can also
use Unicode character ⇒ (U+21d2) in place of =>.
We also allow λ (U+03bb) in place of lambda, for conciseness
and similarity to the listing in "LISP1.5 Programmer's Manual".
Here's a Scheme's filter-map written in our M-expression:
filtermap[pred;lis] = [null[lis] → NIL; pred[car[lis]] ⇒ λ[[x];cons[x;filtermap[pred;cdr[lis]]]]; T → filtermap[pred;cdr[lis]]]
It works as follows:
filtermap[atom;(A (B) C (D) E)] ⇒ (A C E)
=== Bridging worlds
As we did in our first version with link:LISP1/5/axioms.scm[axioms.scm] and link:mx/eval.mx[eval.mx], we want to keep Scheme code minimal and write the rest of the system in LISP itself. We also want to write so-called standard libraries in LISP, too.
When you write language X in the language X itself, you have to be epecially careful which world you're dealing with. Before proceeding, let's recap the layered structure we saw in the previous sections.
-
In
axioms.scm, we defined minimal operators in Scheme to run LISP 1.5. It is the bottom world, or the Basement. We can see all the mechanics that runs the LISP system from the Basenment. -
Then we loaded
eval.mx, which is written in LISP 1.5 itself. At this time though, the functions ineval.mx, such asNULL,ASSOCorEVAL, are actually Gauche variables, bound to Gauche procedures; TheDEFINEmacro inaxioms.scmtranslates LISP 1.5 definitions into Gauche definitons. The functions ineval.mxdoesn't know about Gauche, even though they themselves are running as Gauche procedures. We're in the Ground Floor. -
Then we processed
eval.mxwithmexpr-env.scmto produceeval.lisp. It hasEVAL*, which is still Ground Floor function. It takes a LISP1.5 expression and evaluates it. The expression passed toEVAL*lives in the First Floor, above the Ground Floor. As we've seen, the habitants in the First Floor knows nothing about the Ground Floor or the Basement, except the bindings passed as the environment.
Now, in our revised runtime, difference between the Basement and the Ground Floor becomes wider: A LISP symbol is an unbreakable atom in the Ground Floor, but it's just a pair in the Basement.
We do need a few conduits between the floors, so that the upper floor can access the functionality of lower floor:
-
SUBRandFSUBRare functions implemented in the basement. In the original LISP1.5, they were implemented in assembly language. In our case, they are written in Gauche. In order to invoke those functions from the ground floor, we need an additional primitive,CALLSUBR, which takes the instance ofSUBRorFSUBRand arguments, to call it. -
Atoms in LISP world now has structure---somewhat like that an atom in original sense was the minimal unbreakable building block of the universe, then mankind found it has electrons and nucleus in it. We do want to treat LISP atoms as unbreakable entity for most of the time, except when we want to access its property list.
-
The previous version of
EVALdoesn't have error handling mechanism. For usability, we need some minimal mechanism to signal an error. Theoretically, a sophisticated error handling mechanism can be implemented fully in LISP layer---e.g. we can define a special "error" value, and whenever something yields an error value, we let all the expressions that got the error value just returns it, so that the error value propagates to the final result. However, it is convenient to stop and examine the evaluator at the moment when error occurs, and such a mechanism needs access to the basement. For now, we provideERRORprocedure as another primitive. -
The previous version of
EVALdidn't have literal fuction (lambda form) inEVALcode itself---the lambda form is dealt byEVAL, but the Basement that executesEVALdoesn't need to know that. Now that, for our convenience, we use several lambda forms in the definition ofEVAL, so the Basement needs to deal with them, too.
The revised runtime Basement routines---replacing axioms.scm is
in LISP1/5/runtime.scm:
(define (CAR x) (if (null? (car x)) NIL (car x))) (define (CDR x) (if (null? (cdr x)) NIL (cdr x))) (define (CONS x y) (cons x (if (eq? y NIL) '() y))) (define (ATOM x) (if ($atom? x) T F)) (define (EQ x y) (if (eq? x y) T F)) (define (CALLSUBR subr args) (apply subr args)) (define (ERROR obj) (error "Meta*LISP Error:" ($lisp->scheme obj))) (define T T) (define F NIL) (define NIL NIL)
(define-syntax LAMBDA lambda) (define-syntax QUOTE (syntax-rules () [(_ x) ($scheme->lisp 'x)])) (define-syntax COND (syntax-rules (=>) [() NIL] [( (test expr) . more) (let ([t test]) (if (or (eq? t NIL) (eq? t F)) (COND . more) expr))] [(_ (test => proc) . more) ; extension (let ([t test]) (if (or (eq? t NIL) (eq? t F)) (COND . more) (proc t)))]))
(define-syntax $TOPLEVELS (syntax-rules ($=) [(_ ($= (name args ...) expr) ...) (begin (define name (let ([lsym ($scheme->lisp 'name)] [lfn ($scheme->lisp '(LAMBDA (args ...) expr))]) (set! (cdr lsym) `(,($scheme->lisp 'EXPR) ,lfn ,@(cdr lsym))) (lambda (args ...) expr))) ...)]))
When we read the definition of new EVAL in Gauche, the literals
are passed as Gauche's liteals. We treat it as LISP1.5 literals,
hence our QUOTE form in the basement translates Gauche literals
to LISP1.5 literals by $scheme->lisp.
The COND is also enhanced to handle our extension.
The $TOPLEVELS now not only defines Gauche procedures,
but also registers the defined form to the LISP1.5 symbol's EXPR
property. That is, if we load this M-expression on top of the
Basement:
caar[x] = car[car[x]]
It defines (1) a Gauche procedure CAAR, and (2) it registers EXPR
property of LISP1.5 symbol CAAR with S-expression
(LAMBDA (X) (CAR (CAR X))).
The runtime.scm also defines SUBR property of
several symbols that are used as primitives:
(define-syntax defglobal (syntax-rules () [(_ var key val) (let1 lsym ($scheme->lisp 'var) (set! (cdr lsym) `(,($scheme->lisp key) ,val ,@(cdr lsym))))]))
(defglobal CAR 'SUBR CAR) (defglobal CDR 'SUBR CDR) (defglobal CONS 'SUBR CONS) (defglobal ATOM 'SUBR ATOM) (defglobal EQ 'SUBR EQ) (defglobal ERROR 'SUBR ERROR) (defglobal CALLSUBR 'SUBR CALLSUBR)
=== Revised EVAL with global environment:
Now we write EVAL that understands the global environment.
First we need a couple of auxiliary procedures.
Previously, we used assoc to search local bindings in the
local environment. We didn't consider an error there, so if you
use undefined variable it yielded Gauche error. The following sassoc
takes an extra thunk (a procedure with no arguments) and invokes it
when the item x isn't found in an associative list a:
sassoc[x;a;thunk] = [null[a] → thunk[]; equal[caar[a];x] → car[a]; T → sassoc[x;cdr[a];thunk]]
Another function is to get the property value with the key y in
the symbol x. As we explained above, we access symbol's property list
just using car and cdr:
get[x;y] = [null[x] → NIL; eq[car[x];y] → cadr[x]; T → get[cdr[x];y]]
Note: Since a property list alternates keys and values, it must loop
with cddr---skipping a value. However, LISP1.5 Programmer's Manual
lists the above code, just looping with cdr. I don't know if it
was just an overlook or they just reused exisitng get procedure.
One consequence of that choice is that we can't simply store the
symbol's global value as APVAL's value, since if that value happens
to be a symbol such as EXPR, it'll confuse get procedure.
So the APVAL's value is wrapped with a list.
Now, we can write revised APPLY and EVAL that maintaines
the global environment in symbols' propetry lists:
apply[fn;args;a] = [null[fn] → NIL; atom[fn] → [get[fn;EXPR] ⇒ λ[[e];apply[e;args;a]]; get[fn;SUBR] ⇒ λ[[s];callsubr[s;args]]; T → apply[cdr[sassoc[fn;a;λ[[];error[A2]]]];args;a]]; eq[car[fn];LABEL] → apply[caddr[fn];args;cons[cons[cadr[fn];caddr[fn]];a]]; eq[car[fn];FUNARG] → apply[cadr[fn];args;caddr[fn]]; eq[car[fn];LAMBDA] → eval[caddr[fn];pairlis[cadr[fn];args;a]]; T → apply[eval[fn;a];args;a]]
eval[form;a] = [null[form] → NIL; atom[form] → [get[form;APVAL] ⇒ λ[[v];car[v]]; T → cdr[sassoc[form;a;λ[[];error[A8]]]]]; eq[car[form];QUOTE] → cadr[form]; eq[car[form];FUNCTION] → cons[FUNARG;cons[cadr[form];cons[a;NIL]]]; eq[car[form];COND] → evcon[cdr[form];a]; atom[car[form]] → [get[car[form];EXPR] ⇒ λ[[e];apply[e;evlis[cdr[form];a];a]]; get[car[form];FEXPR] ⇒ λ[[f];apply[f;cons[cdr[form];cons[a;NIL]];a]]; get[car[form];SUBR] ⇒ λ[[s];callsubr[s;evlis[cdr[form];a]]]; get[car[form];FSUBR] ⇒ λ[[f];callsubr[f;cons[cdr[form];cons[a;NIL]]]]; T → eval[cons[cdr[sassoc[car[form];a;λ[[];error[A9]]]]; cdr[form]]; a]]; T → apply[car[form];evlis[cdr[form];a];a]]
This is pretty close to what is shown in the Appendix B of "LISP1.5 Programmer's Manual".
Note that We have a couple of calls of ERROR. The argument is an
error code.
error[A2]: A symbol is used as a procedure but doesn't have a binding.error[A8]: A symbol is used as a variable but doesn't have a binding.error[A9]: A symbol is used as a procedure or syntax but doesn't have a binding.
Another curious point. Check the atom branch of eval:
atom[form] → [get[form;APVAL] ⇒ λ[[v];car[v]]; T → cdr[sassoc[form;a;λ[[];error[A8]]]]];
It first accesses symbol's APVAL property, then
searches the environment. That is, symbol's global value takes
precedence from local values.
You can run this version of EVAL by loading
mx/genv.mx into Gauche.
Note that EVAL now accepts LISP1.5 data, and returns
LISP1.5 data. You have to convert them to Gauche's data back and forth.
Here, we just evaluate the global variable F, whose value is
NIL:
gosh> ($lisp->scheme (EVAL ($scheme->lisp 'F) ($scheme->lisp '()))) NIL
The following code defines REVERSE function locally and calls it.
Note that null and append are already stored as EXPR property
of those symbols when we loaded genv.mx, so we don't need
to provide them in the environment:
gosh> ($lisp->scheme
(EVAL ($scheme->lisp '#,(m-expr "reverse[(A B C D E F G)]"))
($scheme->lisp
'((REVERSE . #,(m-expr "lambda[[xs];
[null[xs] -> NIL;
T -> append[reverse[cdr[xs]];cons[car[xs];NIL]]]]"))))))