cf
cf copied to clipboard
a simple concatenative programming language written in C++
cf programming language
cf is a simple concatenative programming language written in C++. It is based on the languages XY [1] and F [2] written by Stevan Apter.
It originally started out as an implementation of F, written in C, hence the name 'cf'. I later moved to C++ and changed things a bit to try out some ideas.
This isn't really intended to be a usable language in itself. The code is meant to be hacked and modified to add features and make it easy to try out concatenative language ideas.
[1] http://nsl.com/k/xy/xy.htm [2] http://nsl.com/k/f/f.htm
License
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
-
Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
-
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE DEVELOPERS AND CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Building
You'll need the following:
- A C++ compiler. I used gccc 4.3.3 to build it and it worked fine.
- libgmp [3] for 'big numbers'
- Boost [4] for various library features I use.
- gc C++ garbage collector [5]
'cf' uses 'gc', a C++ garbage collector. The git repository for 'cf' has 'gc' included as a submodule. When first getting the 'cf' source code via git you should do the following:
$ git submodule init $ git submodule update
To build 'cf' run 'make'. This produces 'cf', 'testcf' and 'leakcf' executables. Run 'cf' to get a simple command line repl and 'testcf' to run some tests to see if things are working. 'leakcf' will run, load some test files, execute a garbage collection and exit. It's useful for testing for leaks under valgrind.
[3] http://gmplib.org/ [4] http://www.boost.org/ [5] http://github.com/doublec/gc
Running
Run the 'cf' command. It optionally taks a list of filenames as arguments. These files are loaded, in order, and evaluated. For example, 'prelude.cf' provides some standard library functions. I recommend using rlwrap, or similar, to get command line editing:
$ rlwrap ./cf prelude.cf Loading prelude.cf -> ok
Data entered at the 'ok' prompt is pushed onto the stack, and pressing enter evaluates the program:
ok 25 5 + fac. 265252859812191058636308480000000 -> ok
'fac' is the factorial function implemented in prelude.cf.
Quick Overview
I'll add more detailed information here later, for now reading the information in [1] and [2] should enable a quick overview.
Basically the running system is composed of an environment, E, a stack X, and a queue Y. A 'program' is a series of functions and literals entered at the prompt as above.
When a program is evaluated, each item is prepended to the queue Y. The system then runs, taking the item at the front of the queue and evaluating it.
Literals (Numbers, Strings, Symbols) are immediately pushed onto the stack X. Primitives and shuffle patterns are executed immediately and the results can affect E, X or Y.
Primitives are the only symbols that are immediately executed. Symbols can be set to values and the value is obtained using the ';' primitive. If this value is a list it can then be further evaluated as a program using the '.' primitive. If '.' is used on a symbol then it's value will be obtained (if it has one) and '.' called again on that value. This makes the use of ';' optional in a lot of common cases.
Comments are anything between ** and **. For example:
ok [1 2 +] myfunc set ** set the 'myfunc' symbol to the list 1,2 and + ** ok myfunc ** push the 'myfunc' symbol on the stack ** ok ; ** get the value of it ([1 2 +]) ** ok . ** execute the list as a program ** ok println ** print the result ** => 3
Literate Files
A form of literate programming is supported, based on that of Literate Haskell: http://haskell.org/haskellwiki/Literate_programming
Any file that ends with the extension .lcf is a literate cf file. All lines starting with '>' are treated as code, everything else is a comment. For example:
"The sum of 1 + 2 is " write 1 2 + .
To facilitate use in LaTeX documents, all code between \begin{code} and \end{code} are also treated as code:
\begin{code} "This is evaluated as code" . \end{code}
See the examples in literate.lcf. They can be run with:
$ ./cf prelude.cf literate.lcf
Primitives
Look up cf.cpp in the XY() constructor for the list of primitives. Currently they are:
-
- Addition
-
- Subtraction
-
- Multiplication
% - Division ^ - Power _ - Floor set - set a symbol to a value ; - get the value of a symbol . - Execute a program on the stack ) - Process a pattern and store the result on the stack X ( - Process a pattern and prepend the result to the queue Y ` - dip operator | - Reverse a list ' - quote $ - stack operator $$ - stackqueue operator = - equals < - less than <= - less than or equal to
- greater than
= - greater than or equal to not - if true, return false and vice versa. @ - retrieve nth item from a list (2 [1 2 3 4] @ => 3) . - print the topmost item on the stack print - as '.' but don't do a newline write - write the topmost item on the stack count - return the length of the list tokenize - given a string, return a list of the cf tokens. parse - given a list of tokens, return a list of cf objects getline - get a line of input from the user millis - returns number of milliseconds since 1970/1/1. enum - given a number, returns a list of elements from 0 to n-1. clone - creates a copy of the object on the stack to-string - leaves a string representation of the object on the stack split - ( string seperators -- seq ) splits a string sdrop - ( seq n -- seq ) removes n items from the sequence stake - ( seq n -- seq ) returns a sequence with the first n elements foldl - ( seq seed q -- seq ) left fold foldr - ( seq seed q -- seq ) right fold if - ( bool then else -- ) ? - ( seq elt -- index ) find gc - ( -- ) Perform garbage collection
Numbers can be floats or integers. Integers can be of any length. For example:
ok 1000 fac. println ...a really big number...
Shuffle patterns can be used to move things around on the stack. These are templates for stack operations:
ok 1 2 3 abc-cba => 3 2 1
There is Pattern support as per F [2]. Patterns, ( and ) operate the same as in F as far as I know:
ok [1 2 [3 4 5]] [ [a b [c C]] c C [a b] ] ) => 3 [4 5] [1 2]
$ works the same as in F [2]. It takes a program on the stack. That program will be evaluated with X and Y on the stack. It should return a new X and Y. This powerful operator can be used to do anything, including modify the future of the computation - it can implement partial and full continuations.
$$ is a helper function for $. It takes and X and Y on the stack and replaces X and Y with these.
` is dip. For example:
Joy/Factor: 1 2 3 [ - ] dip => -1 3 cf: 1 2 3 [-]`
' is the quote operator. It takes the next item on the Y queue, puts it into a one elements list and pushes it on the X stack.
1 2 '+ => 1 2 [+]
, is the join operator. It appends things to lists. Given two lists they are appended. Given two non-lists a two element list is made containing them. Given a list and a non-list, the non-list is added to the list. eg:
[1 2] [3 4], => [1 2 3 4] 1 [2 3 4], => [1 2 3 4] [1 2 3] 4, => [1 2 3 4] 1 2, => [1 2]
Non-Primitives
See 'prelude.cf' for the current non-primitives. Some of these include:
dup - 1 dup. => 1 1 drop - 1 2 drop. => 1 swap - 1 2 swap. => 2 1 uncons - [1 2 3] uncons. => 1 [2 3] cons - 1 [2 3] cons. => [1 2 3] first - [1 2 3] first. => 1 rest - [1 2 3] rest. => [2 3] clear - Clears the X and Y stacks. Useful at the REPL. queue - Move the top stack item to the tail of the queue. >r in forth/factor unqueue - Move the tail of the queue to the top of the stack. r> in forth/factor stack - push a list cotaining stack contents on the stack unstack - replace the stack with the contents of the list on the top of the stack cond - conditional: 5 [0 >] [ drop. "is true" ] [drop. "is false"] cond. while - 1 [10 <] [1 + a-aa .] while. do - 10 [ "hello" println ] do. map - [1 2 3] [1 +] map. => [2 3 4] each - [1 2 3] [println] each. fold - [1 2 3] 0 [+] fold. => 6 unfold - 1 [10 >] [] [1 +] unfold => [1 2 3 4 5 6 7 8 9 10] cleave - 1 [1 +] [2 +] cleave => 2 3 time - [100 fac.drop.] time. => 7 fac - 5 fac. => 120 fac2 - same as 'fac' but uses a fold/unfold and is much slower. match - ( regexp string -- seq ) matches string against the regular expression, returning a sequence of matches. REPL
repl.cf contains a tiny repl implemented in cf. It calls the 'tokenize' and 'parse' primitives to evaluate the user input and prints the X and Y stack/queue after each line. Note that the Y queue will show the 'remaining computation' of the repl itself. Run with:
$ ./cf prelude.cf repl.cf ok repl. Y: prompt . getline [ "bye" = ] [ drop . ] [ tokenize parse . repl . ] cond . X: repl>
Exit using 'bye'.
Other Stuff
A bunch of useful primitives remain to be written. I want to implement many of them in the language itself.
I find needing to do the ';' after functions to get their value a bit annoying but I'm interested in seeing what can be done with this. This idea comes from the False language mentioned in the F documentation.
Combinators are generally written by building lists of programs to do the work. For example, the definition of 'each' from the prelude:
[[[]]swap;unit.
,uncons;unit.`,while.drop.] each set
This builds a while expression by taking the definition of 'uncons' and 'swap' and prefixing them to the program given by the user. So the translation looks like:
[1 2 3] [println] each. => [1 2 3] [] [uncons.swap.] while.
'fold' works similar it translates like this:
[1 2 3] 0 [+] fold => [0 1 2 3 + + + ].
Experimental Object System
cf includes an experimental protoype based object system. This is still being designed and written so it is likely to change quite a bit. The object system is based on that in the Self programming language.
cf objects can have named slots. Each slot refernces another cf object. Messages are symbols. The value of a slot is obtained using the 'lookup' primitive:
lookup ( object message -- object list )
The result of lookup is the original object and a list containing code that when unquoted will send the message to the object. The list can be unquoted with '.' as normal. So 'lookup .' is to objects as 'foo ;.' is to standard cf lookup and execution of functions.
Slots can be data or method slots. A data slot holds an initial value and when a message is sent to it retrieves the value. Data slots are added with 'add-slot':
add-slot ( object value name -- object )
object copy 42 foo add-data foo lookup . => 42 object copy 42 foo add-data 25 over. foo: lookup . foo lookup . => 25
There is also an 'add-ro-slot' which adds a read only slot. No setter will be added for this type of slot.
Parent slots have a name ending in '*'. If a message is not found when looked up in an object then all that objects parent slots are also searched for the message. This is how the prototype object system handles inheritance:
object copy 42 foo add-slot object copy swap. parent* add-slot foo lookup . => 42
Method slots take a list representing the program to run when the message is sent. When a message is sent that results in a method slot a frame object is created. This frame object has a parent slot called 'self' that points to the original object holding the method. Slot values of this original object can be retrieved by accessing this frame object.
add-method ( object args code name -- object ) frame ( -- object )
object copy [] [ frame message lookup . print ] display add-method object copy "hello" message add-slot swap. parent* add-slot display lookup . => "hello"
There is an 'object' primitive that returns the prototypical object. The 'copy' primitive returns a shallow copy of an object.
object ( -- object ) copy ( object -- object' )
The get and unquote primitives have been overloaded to work with the object system. If ';' is called on a symbol, and that symbol is not found in the environment, then it will be looked up in an object on the stack using 'lookup':
object copy 25 foo add-slot foo;. => 25
If '.' is called on a symbol, and that symbol is not found in the environment, then it will be looked up in the frame object using 'lookup'. This makes '.' similar to Self's implicit self sends:
object copy 25 foo add-slot [] [ foo. println ] print-foo add-method print-foo;. => 25
This works with arguments to methods too:
object copy [msg] [ msg. println ] print-msg add-method "hello" swap. print-msg;. => "hello"
Troubleshooting
Error handling started of as being C++ assertions so a lot of error handling still aborts with a fatal error.
Most errors that occur as a result of interpreting (programming errors, stack underflows, type errors, etc) cause the interpreter drop back to the repl with an error object on the stack. The error object has the following elements:
1: The 'error' symbol 2. The error message 3. The stack at the time of the error 4. The queue at the time of the error
From the repl you can manually examine the stacks, modify them and restart execution. The following can be used to restart given the stack and queue:
ok [ ab- ] $
Why?
Why write this when concatenative languages like Factor exist that are full featured with lots of libraries?
I wanted something small to play around with different ideas. Like a programming puzzle to find different implementations of prelude functions in different styles.
I wanted a C/C++ program that used small amounts of memory so I could run it on embedded devices like a phone.
Feedback
I host this on githib: http://github.com/doublec/cf/tree/master
Feel free to clone, hack away, suggest patches, etc. I can be reached via email: [email protected]