LITE icon indicating copy to clipboard operation
LITE copied to clipboard

LISP INTERPRETER and TEXT EDITOR

#+title: LISP INTERPRETER AND TEXT EDITOR #+author: Rylan Lens Kellogg #+description: LITE is a lisp interpreter and text editor built in C. #+created: <2022-05-26 Thu> #+options: toc:nil

** LITE

LITE is an extensible text editor with a built-in, dynamic programming language to alter the editor's functionality: LITE LISP.

LITE may mean anything that abbreviates the letters; listed here are a few that are especially significant.

  • LISP INTERPRETER and TEXT EDITOR
  • LITE IS TOTALLY EMACS
  • LISP IMBUED with TONS of ECCENTRICITIES

The goal of LITE is to create a modern, cross-platform tool for text editing and general scripting that is easy to port.

Initially, development followed [[https://www.lwh.jp/lisp/][this LISP interpreter tutorial]] ([[https://web.archive.org/web/20220617192957/https://www.lwh.jp/lisp][archive]]).

One glaring difference between Emacs and LITE is that LITE does not use a [[https://en.wikipedia.org/wiki/Gap_buffer][Gap Buffer]] data structure to edit text; rather, it uses a [[https://en.wikipedia.org/wiki/Rope_(data_structure)][Rope]].

NOTE: Any and all shell commands assume a working directory of the base of this repository.

*** Usage

If you have the LITE executable, run it with the ~--help~ argument for usage information.

**** LITE GFX

By default, LITE is compiled with a GFX backend, resulting in an executable GUI application.

Run it the way that you would normally run a graphical application on your OS. Most of the time this means double-clicking the executable from a file explorer, or launching the executable from a shell.

NOTE: For ongoing development, it is best to run builds of LITE with a working directory of this repository, so it can find the most up-to-date LISP sources, fonts, etc.

The executable may be obtained by building from local source.

or [[https://github.com/LensPlaysGames/LITE/releases/latest][downloading]] the latest pre-built release.

**** LITE REPL

LITE may be built as a terminal-only "read, evaluate, print, loop" program (a /REPL/).

To enter the REPL, run the following: #+begin_src shell ./bin/LITE #+end_src

On Windows: #+begin_src powershell .\bin\LITE #+end_src

Any and all arguments following a ~--~ argument are treated as file paths and are attempted to be loaded as LITE LISP source files.

*** Building

[[https://cmake.org/][CMake]] is used as the cross-platform build system.

To see a list of available build systems, use the following command: #+begin_src shell cmake -G #+end_src

I recommend [[https://www.ninja-build.org][Ninja]], it is modern, fast, and designed for use in conjunction with CMake.

First, generate an out-of-source build tree: #+begin_src sh cmake -G -B bld -DCMAKE_BUILD_TYPE=Release #+end_src Replace ~~ with one of the available build systems

Optionally, build LITE with graphical user interface capabilities by adding the flag ~-DLITE_GFX=ON~ when generating the build tree. Ensure to read the README in the ~gfx~ subdirectory, if doing so.

Once the build tree is generated, invoke it to generate an executable. #+begin_src sh cmake --build bld #+end_src

If no fatal errors occur, the ~bin~ subdirectory of the repository will be populated with the LITE executable.

In order for LITE to find the standard library at runtime, it is necessary to install LITE. This creates a directory in which LITE will look during runtime for certain resources like fonts or lisp. On Linux, this is located at ~$HOME/lite~. On Windows, it's at ~%APPDATA%/lite~.

#+begin_src sh cmake --install bld #+end_src

** LITE LISP

LITE LISP is the language that LITE interprets.

*** Basic Syntax

Every LISP program is made up of /atoms/. An atom is nearly any sequence of bytes, except for whitespace, commas, or backslashes. Atoms are NOT case sensitive.

Here are some valid LITE LISP atoms. #+begin_example foo-bar foo/bar foobar <-/im-an-atom/-> -420 69 interrupt80 #+end_example

A /list/ is a sequence of atoms and/or other lists surrounded by parentheses. #+begin_example (foo-bar foo/bar foobar) ( im a list (in a list )) (sun mon tue wed thu fri sat) #+end_example

Strings are any sequence of bytes surrounded by double quotes. #+begin_example "I am a string." " al;sdn (!^(%)#!^) \033 \n lasnk \ a \a \t \f " "Information is easy to fake but hard to smell." #+end_example

Comments begin with a semi-colon and stop at the first newline. #+begin_example ; I'm a comment ;;;; And I as well!

(print "hello, friends!") ; print to stdout #+end_example

Function calls are represented as a list with a symbol as the first element, and any arguments passed are subsequent elements. #+begin_example (print "hello friends!") (abs -69420) (define foo 42) #+end_example

The first element in a list that is to be evaluated is referred to as the ~operator~.

*** Atoms

Every object in LISP is called an ~Atom~. Every Atom has a type, a value, a docstring, and a generic allocation pointer associated with it.

The value is a union with multiple value types, and the type field designates which value within the union to use, and how to treat it.

The docstring is a string containing information about the atom, i.e. /documenting/ it. \ This could range from a function's usage to a variables meaning. \ Access docstrings using the docstring special form: ~(docstring )~.

The generic allocation pointer is a linked list of allocated memory that may be freed when the atom is garbage collected. This allows the LITE interpreter to allocate memory as needed and ensure it is freed /after/ using it.

*** Types

Here are the different types an Atom may have in LITE LISP:

  • Nil :: This is the definition of false, nothing, etc.

  • Pair :: A recursive pair, containing a left-hand Atom and a right-hand Atom.

    A pair has special terminology for the two sides; the left is referred to as ~car~, while the right is referred to as ~cdr~.

    A list is a pair with a value on the left, and another pair, or nil, on the right.

  • Symbol :: A sequence of bytes that may be bound in the environment.

    All symbols are located in the /symbol table/ with no duplicates.

  • String :: A sequence of bytes, usually denoting human readable text.

  • Integer :: An integer number, like ~1~, ~-420~, or ~69~.

  • BuiltIn :: A function implemented in LITE source code that is able to be called from LITE LISP.

  • Closure :: A function implemented in LITE LISP; a lambda.

  • Macro :: A closure with unevaluated arguments that creates an expression that is then evaluated.

  • Buffer :: An opened file that may be edited in LITE.

*** Environment, Variables, and QUOTE

Variables are stored in an /environment/. An environment is a key/value dictionary, where the keys are a symbol, and the values are atomic LISP objects. When evaluating a symbol, it is first checked if there is a binding in any accessible environment. If so, that value is used in place of the symbol, when evaluated.

To bind a symbol to a value in the local environment, use the ~DEFINE~ special form. #+begin_src lisp (define new-variable 42) #+end_src

NOTE: ~DEFINE~ will first attempt to find the symbol in any parent environment; if found, it will override that binding's value instead of creating a new one in the immediate environment. This allows for ~DEFINE~ to set the value of parameters, ~LET~ arguments, etc.

To bind a symbol to a value in the global environment, use the ~SET~ special form. #+begin_src lisp (set new-variable 42) #+end_src

~new-variable~ is now a symbol bound in the environment. Following occurences of the bound symbol will be evaluated to the defined value, ~42~.

Sometimes, it is useful to not evaluate a variable. This can be done using the ~QUOTE~ operator. #+begin_src lisp (quote new-variable) ; returns the symbol "new-variable" #+end_src

As quoting is a very common necessity in LISP, there is a special short-hand for it: a preceding single-quote. This short-hand means the following to be equivalent to the ~QUOTE~ just above. #+begin_src lisp 'new-variable ; returns the symbol "new-variable" #+end_src

When defining any variable, it is possible to define a docstring for it by specifying it as a third argument: #+begin_src lisp (define new-variable 42 "The meaning of life, the universe, and everything.") #+end_src

The docstring may be accessed with a builtin, like so: #+begin_src lisp (docstring new-variable) #+end_src

The standard library includes a macro to help re-define a docstring: #+begin_src lisp (set-docstring new-variable "The meaning of your mom.") #+end_src

This allows for everything in LITE LISP to self-document it's use.

*** Functions

The standard library includes the ~DEFUN~ macro to help define named functions. #+begin_src lisp (defun NAME ARGUMENT DOCSTRING BODY-EXPRESSION(S)) #+end_src

Here is a simple factorial implementation that works for small, positive numbers: #+begin_src lisp (defun fact (x) "Get the factorial of integer X." (if (= x 0) 1 (* x (fact (- x 1))))) #+end_src

To call a named function, put the name of the function in the operator position, and any arguments following. Arguments are evaluated before being bound and the body being executed. #+begin_src lisp (fact 6) #+end_src

Assuming ~FACT~ refers to the function defined just above, this would result in the integer ~720~, as ~6~ was bound to the symbol ~X~ during the execution of the functions body.

As arguments are evaluated before being bound, we can also pass expressions. The result of the expression will be bound to the argument symbol. #+begin_src lisp (fact (fact 3)) #+end_src

In this case, =(fact 3)= will be evaluated before the outer ~FACT~ call, so that we can bind the result of it to ~X~. Once evaluating, we will get the integer result ~6~, which will then be bound to ~X~ in the outer (left-most) ~FACT~ call, resulting in ~720~.

**** Lambda/Closure

A lambda is a function with no name.

Currently, lambdas may be defined with the following special form: #+begin_src lisp (lambda ARGUMENT BODY-EXPRESSION(S)) #+end_src

ARGUMENT is a symbol or a list of symbols denoting arguments to be bound when the function is called.

BODY-EXPRESSION(S) is a sequence of expressions that will be executed with arguments bound when the lambda is called. The result of the last expression in the body is the return value of the lambda.

This means the identity lambda may be written like so: #+begin_src lisp (lambda (x) x) #+end_src

As a real world example, here is the factorial implementation from above written as a lambda: #+begin_src lisp (lambda (x) (if (= x 0) 1 (* x (fact (- x 1))))) #+end_src

To call a lambda, put it in the operator position just like the name of a named function. Pass any arguments as subsequent values in the list, just as you would a named function. #+begin_src lisp ((lambda (x) (if (= x 0) 1 (* x (fact - x 1)))) 6) #+end_src

Evaluating the above would result in the integer value ~720~, as ~6~ was bound to ~X~ and the lambda body was executed.

**** Variadic Arguments

There is also support for variadic arguments using an /improper list/. The syntax for an improper list is as follows: : (1 2 3 . 4)

In the context of a lambda, here is how to define a function with two positional arguments followed by a varying number of arguments. #+begin_src lisp (lambda (argument1 argument2 . the-rest) BODY-EXPRESSION(S)) #+end_src After all fixed arguments are given, the rest are passed as a list to the function. If no variadic arguments are given, nil is passed.

To create a function that may take any amount of arguments, put a symbol in the ARGUMENT position, as seen in this re-definition of the ~+~ operator in the standard library: #+begin_src lisp (let ((old+ +)) (lambda ints (foldl old+ 0 ints))) #+end_src

*** Macros

A macro may be created with the ~MACRO~ operator. A macro is like a lambda, except it will return the result of evaluating it's return value, rather than it's return value being the result. This allows for commands and arguments to be built programatically in LISP.

In order to ease the making of macros, there is /quasiquotation/. It is similar to regular quotation, but it is possible to unquote specific atoms so as to evaluate them before calling the returned expression.

While it is possible to call the quasiquotation operators manually, there are short-hand special forms built in to the parser.

  • '`' -- QUASIQUOTE
  • ',' -- UNQUOTE
  • ',@' -- UNQUOTE-SPLICING

These special forms allow macro definitions to look more like the expressions they produce.

A simple example that mimics the ~QUOTE~ operator: #+begin_src lisp (macro my-quote (x) "Mimics the 'QUOTE' operator." `(quote ,x)) #+end_src

The QUASIQUOTE special-form at the beginning will cause the QUOTE symbol to pass through without being evaluated. The UNQUOTE special-form before the ~X~ symbol will cause it to be evaluated, replacing ~,x~ with the passed argument.

For example, calling ~(my-quote a)~ will eventually expand to ~(QUOTE A)~, which will result in the symbol ~A~ being returned upon evaluation.

For a more real-world example that is actually useful, let's take a look at ~DEFUN~ from the standard library. #+begin_src lisp (macro defun (name args docstring . body) "Define a named lambda function with a given docstring." `(define ,name (lambda ,args ,@body) ,docstring)) #+end_src

As you can see, this macro takes 3 fixed arguments followed by any number of arguments following passed as a list bound to ~BODY~. The first argument, name, is within a quasiquoted expression, but contains an unquote special-form operator. This causes it to be evaluated during macro expansion, resulting in the passed argument. The same thing happens with ~ARGS~ and ~DOCSTRING~. When it comes to ~BODY~, though, things change. As ~BODY~ is a list, and a function body is not a list, but a sequence, we must transform it somehow. This is where the ~UNQUOTE-SPLICING~ operator comes into play, as it will take each element of a given list and splice it into a sequence. #+begin_example ,BODY = ((print a) (print b) (print c)) ,@BODY = (print a) (print b) (print c) #+end_example

This allows the ~LAMBDA~ body argument to be a valid sequence of expressions that can be evaluated properly.

When including the standard library, ~DEFMACRO~ operates exactly the same as ~MACRO~.

When the environment variable ~DEBUG/MACRO~ is non-nil, extra output concerning macros is produced.

*** Special Forms

Special forms are hard-coded symbols that go in the operator position. They are the most fundamental building blocks of how LITE LISP operates.

Here is a list of all of the special forms currently in LITE LISP.

  • QUOTE :: Pass one and only argument through without evaluating it.

    There is also a short-form built in to the parser: ~'~ (single quote). This allows code to be written much faster, as quoting is something that happens quite often in the land of LISP. : 'X == (QUOTE X)

  • DEFINE and SET :: Bind a symbol to a given atomic value within the LISP environment.

    ~(DEFINE SYMBOL VALUE [DOCSTRING])~

    ~(SET SYMBOL VALUE [DOCSTRING])~

    ~DEFINE~ first checks all parent environments for a binding of the symbol, and will override that one if it finds it. If ~SYMBOL~ is not bound in any parent environment, ~DEFINE~ binds it in the local environment. That is, the environment ~DEFINE~ was called from.

    ~SET~ only operates on the global environment. This environment is the top level environment that is carried between evaluations, whereas local environments tend to go away after evaluation completes.

  • LAMBDA :: Create a closure from the given expected arguments and body.

    ~(LAMBDA ARGS BODY)~

    This is an expression which returns a closure. A closure is just like a function, except that it retains a pointer to the environment that it was created within, allowing any variable accesses to be resolved as expected.

    This closure can be placed directly in the operator position and called. Any arguments following the operator position are arguments to the given operator. An error will be reported if the number of arguments does not match, unless making use of an improper list to gather all remaining arguments into one.

    ~((identity (x) x) 42)~

  • IF :: A conditional expression.

    ~(IF CONDITION THEN OTHERWISE)~

    Evaluate the given condition. If result is non-nil, evaluate the second argument given. Otherwise, evaluate the third argument.

  • WHILE :: A conditional loop.

    ~(WHILE CONDITION BODY)~

    Evaluate condition. If result is non-nil, evaluate BODY one time. Repeat each time body is evaluated.

    Extra information regarding ~WHILE~ loops is output when the ~DEBUG/WHILE~ debug flag is set to a non-nil value.

  • PROGN :: Evaluate sequence of expressions, returning result of last expression.

    This is mainly used within ~IF~ to be able to evaluate multiple expressions within the ~THEN~ or ~OTHERWISE~ singular expression argument.

  • MACRO :: Create a closure, except the passed arguments are not evaluated, and the value returned from the macro is evaluated, then that return value is the result.

    ~(MACRO SYMBOL ARGS DOCSTRING BODY)~

    One of the most useful features of macros is quasiquoting, which is just a fancy word meaning evaluating only some arguments while passing others through quoted. See the section on macros for more details.

  • EVALUATE :: Return the result of the given argument after evaluating it.

    This is mostly used in macros to evaluate certain arguments more than once.

  • ENV :: Return the current environment.

    The first element (car (env)) is the parent environment. If the parent environment is ~nil~, that indicates the environment is the global environment.

    NOTE: This is a copy of the environment. Changes to it are not reflected in the environment itself.

  • ERROR :: Print the given message to standard out after an error indicator. Returns the given message. Halts evaluation.

  • QUIT-COMPLETELY :: Exit LITE entirely. Shut down the program.

    Return STATUS code from program, or ~0~ if one isn't given.

    ~(QUIT-COMPLETELY [STATUS])~

    Default binding: ~CTRL-ALT-Q~

  • AND :: Return nil as soon as one of the arguments evaluates to nil. Otherwise return ~T~.

  • OR :: Return ~T~ as soon as one of the arguments evaluates non-nil. Otherwise return nil.

*** Structures

Structures are defined in the standard library, and can not be used unless it is included.

In LITE LISP, structures are basically an associative list with stricter rules.

Each association within the structure is referred to as a /member/.

Each member must be a pair with a symbol on the left side. This symbol is the member's /identifier/, or ID.

Let's look at how to define a new structure: #+begin_src lisp (defstruct my-struct "my docstring" ((my-member 0))) #+end_src

Here, we have a structure, ~my-struct~, with a single member, ~my-member~.

It should be noted that the syntax for defining members matches ~let~ exactly, at least on the surface. One important thing to note is that initial values given to members are not evaluated, and so must be a self-evaluating value (a literal). For example, attempting to put the name of a function as an initial value does not work (at least not as expected). The member will be bound to the symbol that matches the name of the function, not the function itself.

To access the value of any given member within a structure, use ~get-member~: #+begin_src lisp (get-member my-struct my-member) #+end_src

This will return the value of the member with an ID of ~my-member~ within ~my-struct~. If one does not exist, it will return nil. Because we gave the member an initial value of zero, that is what is returned.

~set-member~ can be used to update a member's value. #+begin_src lisp (set-member my-struct 'my-member 42) #+end_src

To define a member to a function, you must first define the structure. Afterwards, use ~set-member~, which evaluates the value argument: #+begin_src lisp (set-member my-struct 'my-member +) #+end_src

At this point, ~my-member~ of ~my-struct~ has a value of the closure which was bound to the symbol ~+~.

We can now call this member function using the ~call-member~ macro: #+begin_src lisp (call-member my-struct my-member 34 35) #+end_src

Any arguments after the structure symbol and member ID are passed through to the called function.

As you may already be thinking, you don't always want to use structures in the way shown above, where the actual structure definition is the mutable data. In most cases, it is preferable to define a structure once, and have multiple instances of the defined. This is possible with the ~make~ macro: #+begin_src lisp (defstruct vector3 "A vector of three integers, X, Y, and Z." ((x 0) (y 0) (z 0)))

;; Create an instance of a defined structure. (set my-coordinates (make vector3)) ;; Setting member values. (set-member my-coordinates 'x 24) (set-member my-coordinates 'y 34) (set-member my-coordinates 'z 11) ;; Print the instance of the structure to standard out. (print my-coordinates) ;; Access all the members of a struct using the ACCESS macro. ;; It is like LET, except it binds all of a structure's arguments ;; to their values, then evaluates the given body. (access my-coordinates (print x) (print y) (print z)) ;; Accessing member IDs and values as separate lists. (let ((coordinate-members (map car my-coordinates)) (coordinate-values (map cadr my-coordinates))) (print coordinate-members) (print coordinate-values)) ;; Print the sum of all of the values in the structure. (print (foldl + 0 (map cadr my-coordinates))) #+end_src

*** Misc

  • Buffer Table

    Get the current buffer table with the ~BUF~ operator.

  • Symbol Table

    Get the current symbol table with the ~SYM~ operator.

    Alternatively, visualize the environment by setting ~DEBUG/ENVIRONMENT~ to any non-nil value.

  • Closure Environment Syntax

    Currently, closures are stored in the environment with the following syntax: : (ENVIRONMENT (ARGUMENT ...) BODY-EXPRESSION)

  • Escape Sequences within Strings

    Currently, strings have a double-backslash escape sequence.

    The following escape sequences are recognized within strings:

    • ~\_~ -> nothing
    • ~\r~ -> ~\r~ (0xd)
    • ~\n~ -> ~\n~ (0xa)
    • ~\"~ -> ~"~
  • Debug Environment Variables

    There are environment variables that cause LITE to report output extra information regarding the topic the variable pertains to when non-nil.

    For a list of all debug variables that LITE internally responds to, see the file that enables all of them at once, ~lisp/dbg.lt~.