gocool icon indicating copy to clipboard operation
gocool copied to clipboard

Go implementation of the Cool programming language

  • gocool

#+ATTR_HTML: title="No Maintenance Intended" [[http://unmaintained.tech/][file:http://unmaintained.tech/badge.svg]]

Go implementation of the [[http://theory.stanford.edu/~aiken/software/cool/cool.html][Cool programming language]].

This repository contains a Golang implementation of the homeworks for the [[https://www.coursera.org/course/compilers][Coursera compilers class]].

  • Parts

The compiler is divided into four parts, corresponding to the four programming homework assignments: lexing, scanning, typechecking, and compilation. The four shell scripts testlexer.sh, testparser.sh, testtypes.sh, and testcgen.sh test each of these four parts. The test cases are obtained by running the "grading.pl" script for each homework, and copying the test input and output. Minor changes have been made, as advised in the homework instructions, typically with regard to exact line numbers, although an effort has been made to match the examples.

** Lexer The lexer is a simple hand-written lexer, defined in parser/lex.go. It is a modified version of the [[http://golang.org/src/pkg/text/template/parse/lex.go][go template lexer]], copied and modified. For more background, read or watch Rob Pike's excellent talk, [[https://google.com/search?q=lexical%20scanning%20in%20go][Lexical Scanning in Go]].

** Parser The parser is a [[http://dinosaur.compilertools.net/yacc/][yacc]]-generated LALR(1) parser. Similarities to [[https://code.google.com/p/rsc/source/browse/cc/][Russ Cox's C parser]] are due to the fact that it was frequently studied while wrestling with yacc. The AST classes in ast.go were defined by hand, and parser/dump.go contains functions that print them in the format used by the C++/Java Cool scaffolding code provided in the course materials.

** Typechecking There's nothing interesting about the typechecking code: it was simply implemented bit by bit until all the tests passed. It does a bit of work (eg. symbol table generation) that could be considered part of code generation.

** Compilation The compiler (main code in cgen/cgen.go and cgen/expr.go) is about as simple as possible: a stack machine with an accumulator ($a0), and "self" saved in $s0 for easy reference. There are just a couple of things not suggested in the homework assignment: the "isa" tag-checking routine, and the embedding of prototype object addresses and initialization functions into the dispatch tables.

No effort was made to do any optimizations or to use more registers.

testcgen.sh runs all the tests three times: once with no garbage collection, once with garbage collection turned on, and once with both garbage collection and collect-after-every-allocation debugging turned on. If you set the environment variable NOSLOW before running it, it will skip the third phase.

  • Supplementary notes ** MIPS registers |----------------+--------------+------------------------------------+-------------------| | Number | Name | Function | Callee preserves? | |----------------+--------------+------------------------------------+-------------------| | $0 | zero | always has value zero | | | $1 | at | assembler temporary | | | $2-$3 | v0,v1 | function results. Can be temporary | | | $4-$7 | a0,a1,a2,a3 | function arguments | | | $8-$15,$24,$25 | t0,...,t7,t9 | temporary registers | | | $16-$23 | s0,...,s7 | saved registers | Yes | | $26,$27 | k0,k1 | Reserved for OS kernel | | | $28 | gp | global pointer | Yes | | $29 | sp | stack pointer | Yes | | $30 | fp | frame pointer | Yes | | $31 | ra | return address | | |----------------+--------------+------------------------------------+-------------------|

** Code generation Stack machine with accumulator.

Accumulator: $a0: holds result of last expression evaluation. $s0 holds a stored copy of "self". $sp points to next location in stack (so $sp+4 is the top of the stack) $t0, $t1, $t2 are temporary registers.

** Activation Record

  • Return address
  • Old frame pointer
  • n arguments
  • NT(e) temporary locations

| arg_1 | | | ⋮ | | | arg_n | | |----------------+-----------------| | Old FP | <-- SP on entry | | Return address | | | Temp NT(e) | | | ⋮ | | | Temp 1 | <-- New FP | | | <-- New SP |

** Object layout

| -4 | FFFF - garbage collector tag | | 0 | class tag | | 4 | object size (in 32-bit words) | | 8 | dispatch pointer | | 12 | attributes | | ⋮ | ⋮ |

Strings:

| 12 | Length: pointer to an Int | | 16 | Characters, nul-terminated, zero-padded to next word | | ⋮ | ⋮ |