shecc
                                
A self-hosting and educational C optimizing compiler
shecc: self-hosting and educational C compiler

Introduction
shecc is built from scratch, targeting the 32-bit Arm and RISC-V architectures, as
a self-compiling compiler for a subset of the C language.
Features
- Generate executable Linux ELF binaries for ARMv7-A and RV32IM;
- Provide a minimal C standard library for basic I/O on GNU/Linux;
- The cross-compiler is written in ANSI C, arguably running on most platforms;
- Self-contained C language front-end and machine code generator;
- Two-pass compilation: on the first pass it checks the syntax of statements and constructs a table of symbols, while on the second pass it actually translates program statements into Arm/RISC-V machine code.
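The benefit of two passes is easiest to see with forward references. The sketch below is a toy illustration of the general two-pass scheme, not shecc's code. It assumes a made-up program made of label definitions and jumps that may name labels defined later. Pass 1 only fills a symbol table with label addresses, and pass 2 emits each jump with its target already known. The types `insn_t` and `symbol_t`, and the functions `first_pass` and `second_pass`, are hypothetical names.

```c
#include <string.h>

#define MAX_SYMS 32

/* Hypothetical toy program: each entry is a label definition or a jump
 * naming a label, which may be defined later in the program. */
typedef struct {
    int is_label;   /* 1 = label definition, 0 = jump instruction */
    char name[16];  /* label being defined, or label being jumped to */
} insn_t;

typedef struct {
    char name[16];
    int addr;       /* index of the instruction the label points at */
} symbol_t;

/* Pass 1: record every label's address. Labels occupy no space,
 * so only jump instructions advance the address counter. */
int first_pass(const insn_t *prog, int n, symbol_t *syms)
{
    int nsyms = 0, addr = 0;
    for (int i = 0; i < n; i++) {
        if (prog[i].is_label) {
            strcpy(syms[nsyms].name, prog[i].name);
            syms[nsyms].addr = addr;
            nsyms++;
        } else {
            addr++;
        }
    }
    return nsyms;
}

/* Pass 2: emit each jump with its target resolved from the symbol table.
 * out[k] receives the target of the k-th jump (-1 if the label is unknown).
 * Returns the number of jumps emitted. */
int second_pass(const insn_t *prog, int n, const symbol_t *syms, int nsyms,
                int *out)
{
    int emitted = 0;
    for (int i = 0; i < n; i++) {
        if (prog[i].is_label)
            continue;
        int target = -1;
        for (int s = 0; s < nsyms; s++)
            if (strcmp(syms[s].name, prog[i].name) == 0)
                target = syms[s].addr;
        out[emitted++] = target;
    }
    return emitted;
}
```

In a program such as `jump end; start: jump start; end:`, the first jump can only be resolved because pass 1 has already recorded `end`. A single-pass design would instead have to remember that reference and patch it once the label appears.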
Compatibility
shecc is capable of compiling C source files written in the following syntax:
- data types: char, int, struct, and pointer
- statements and expressions: if, while, for, switch, case, break, return, and general expressions
- compound assignments: +=, -=, *=
- global/local variable initializations for supported data types
  - e.g. int i = [expr]
  - e.g.
The Arm backend targets armv7hf with the Linux ABI and has been verified on Raspberry Pi 3.
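To give a feel for the accepted subset, the sketch below limits itself to the constructs listed above: char, int, struct, pointers, if/while/for/switch, compound assignments and initialized locals. Because of the Known Issues, it also writes through pointers with `ptr[0]` rather than unary `*`. The code is ordinary standard C as well. It has not been checked against every shecc revision, and `struct point`, `sum_scaled`, `classify`, `store` and `countdown` are made-up names used only for illustration.

```c
/* Hypothetical example restricted to the subset listed above. */
struct point {
    int x;
    int y;
};

/* Sum x + y over an array of points, then scale; uses for, += and *=. */
int sum_scaled(struct point *pts, int count, int factor)
{
    int total = 0;
    int i;
    for (i = 0; i < count; i++) {
        total += pts[i].x;
        total += pts[i].y;
    }
    total *= factor;
    return total;
}

/* Map a character to a small code using switch/case/default/break. */
int classify(char c)
{
    int code = 0;
    switch (c) {
    case 'a':
        code = 1;
        break;
    case 'b':
        code = 2;
        break;
    default:
        code = 3;
        break;
    }
    return code;
}

/* Write through a pointer with ptr[0] instead of *ptr. */
void store(int *ptr, int value)
{
    ptr[0] = value;
}

/* Count down with while and report how many iterations ran. */
int countdown(int n)
{
    int steps = 0;
    while (n > 0) {
        n -= 1;
        steps += 1;
    }
    return steps;
}
```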
Bootstrapping
The steps to validate shecc bootstrapping:
- stage0: shecc source code is initially compiled using an ordinary compiler, which generates a native executable. The generated compiler can be used as a cross-compiler.
- stage1: The built binary reads its own source code as input and generates an ARMv7-A/RV32IM binary.
- stage2: The generated ARMv7-A/RV32IM binary is invoked (via QEMU or running on Arm and RISC-V devices) with its own source code as input and generates another ARMv7-A/RV32IM binary.
- bootstrap: Build the stage1 and stage2 compilers, and verify that they are byte-wise identical. If so, shecc can compile its own source code and produce new versions of that same program.
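The final bootstrap check comes down to a byte-by-byte file comparison, the same comparison that tools such as cmp perform. The sketch below shows one way to write it in C. The function name `files_identical` is hypothetical. It would be applied to out/shecc-stage1.elf and out/shecc-stage2.elf, the file names that appear in the make output.

```c
#include <stdio.h>

/* Return 1 if both files have identical contents, 0 if they differ,
 * and -1 if either file cannot be opened. */
int files_identical(const char *path_a, const char *path_b)
{
    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    int result = -1;

    if (a && b) {
        int ca, cb;
        result = 1;
        do {
            ca = fgetc(a);
            cb = fgetc(b);
            if (ca != cb) {   /* differing byte, or one file ended early */
                result = 0;
                break;
            }
        } while (ca != EOF);  /* both reached EOF together: identical */
    }
    if (a)
        fclose(a);
    if (b)
        fclose(b);
    return result;
}
```

A call such as `files_identical("out/shecc-stage1.elf", "out/shecc-stage2.elf")` returning 1 corresponds to a successful bootstrap.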
Prerequisites
The code generator in shecc does not rely on external utilities. You only need
an ordinary C compiler such as gcc or clang. However, because shecc bootstraps
itself, Arm/RISC-V ISA emulation is required. Install QEMU for Arm/RISC-V user-mode
emulation on GNU/Linux:
$ sudo apt-get install qemu-user
It is still possible to build shecc on macOS or Microsoft Windows. However,
second-stage bootstrapping would fail because qemu-arm is not available.
Build and Verify
Configure which backend you want; shecc supports ARMv7-A and RV32IM backends:
$ make config ARCH=arm
# Switch the target machine code to Arm
$ make config ARCH=riscv
# Switch the target machine code to RISC-V
Run make and you should see this:
  CC+LD	out/inliner
  GEN	out/libc.inc
  CC	out/src/main.o
  LD	out/shecc
  SHECC	out/shecc-stage1.elf
  SHECC	out/shecc-stage2.elf
The file out/shecc is the first-stage compiler. Its usage:
shecc [-o output] [--no-libc] [--dump-ir] <infile.c>
Compiler options:
- -o: output file name (default: out.elf)
- --no-libc: Exclude embedded C library (default: embedded)
- --dump-ir: Dump intermediate representation (IR)
Example:
$ out/shecc -o fib tests/fib.c
$ chmod +x fib
$ qemu-arm fib
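The options listed above could be parsed along the lines of the sketch below. It is not shecc's actual main(). The `options_t` struct and the `parse_options` function are hypothetical. The sketch follows the documented defaults: output goes to out.elf unless -o is given, and the embedded libc stays enabled unless --no-libc is passed.

```c
#include <string.h>

/* Hypothetical container for the documented command-line options. */
typedef struct {
    const char *output; /* -o <file>, default "out.elf" */
    const char *input;  /* <infile.c> */
    int use_libc;       /* 1 unless --no-libc is given */
    int dump_ir;        /* 1 if --dump-ir is given */
} options_t;

/* Fill opts from argv. Returns 0 on success, or -1 on a usage error
 * (missing argument after -o, unknown option, or no input file). */
int parse_options(int argc, char *argv[], options_t *opts)
{
    int i;
    opts->output = "out.elf";
    opts->input = NULL;
    opts->use_libc = 1;
    opts->dump_ir = 0;

    for (i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-o") == 0) {
            if (i + 1 >= argc)
                return -1;          /* -o without a file name */
            opts->output = argv[++i];
        } else if (strcmp(argv[i], "--no-libc") == 0) {
            opts->use_libc = 0;
        } else if (strcmp(argv[i], "--dump-ir") == 0) {
            opts->dump_ir = 1;
        } else if (argv[i][0] == '-') {
            return -1;              /* unrecognized option */
        } else {
            opts->input = argv[i];  /* positional argument: source file */
        }
    }
    return opts->input ? 0 : -1;
}
```

For the invocation in the example above (`-o fib tests/fib.c`), this sketch would set output to "fib" and input to "tests/fib.c", and would leave use_libc at 1 and dump_ir at 0.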
shecc comes with unit tests. To run the tests, pass "check" as an argument to make:
$ make check
Reference output:
...
int main(int argc, int argv) { exit(sizeof(char)); } => 1
int main(int argc, int argv) { int a; a = 0; switch (3) { case 0: return 2; case 3: a = 10; break; case 1: return 0; } exit(a); } => 10
int main(int argc, int argv) { int a; a = 0; switch (3) { case 0: return 2; default: a = 10; break; } exit(a); } => 10
OK
Intermediate Representation
When the --dump-ir option is passed to shecc, the intermediate representation (IR)
is dumped. Take the file tests/fib.c as an example; it contains a recursive
Fibonacci sequence function.
int fib(int n)
{
    if (n == 0)
        return 0;
    else if (n == 1)
        return 1;
    return fib(n - 1) + fib(n - 2);
}
Execute the following to generate IR:
$ out/shecc --dump-ir -o fib tests/fib.c
Line-by-line correspondence between the C source and the IR:
 C Source            IR                         Explanation
-------------------+--------------------------+----------------------------------------------------
int fib(int n)      fib:                        Reserve stack frame for function fib
{                     {
    if (n == 0)         x0 = &n                 Get address of variable n
                        x0 = *x0 (4)            Read value from address into x0, length = 4 (int)
                        x1 := 0                 Set x1 to zero
                        x0 == x1 ?              Compare x0 with x1
                        if false then goto 1641 If x0 != x1, then jump to label 1641
        return 0;       x0 := 0                 Set x0 to zero. x0 is the return value.
                        return (from fib)       Jump to function exit
                    1641:
    else if (n == 1)    x0 = &n                 Get address of variable n
                        x0 = *x0 (4)            Read value from address into x0, length = 4 (int)
                        x1 := 1                 Set x1 to 1
                        x0 == x1 ?              Compare x0 with x1
                        if false then goto 1649 If x0 != x1, then jump to label 1649
        return 1;       x0 := 1                 Set x0 to 1. x0 is the return value.
                        return (from fib)       Jump to function exit
                    1649:
    return              x0 = &n                 Get address of variable n
       fib(n - 1)       x0 = *x0 (4)            Read value from address into x0, length = 4 (int)
                        x1 := 1                 Set x1 to 1
                        x0 -= x1                Subtract x1 from x0 i.e. (n - 1)
       +                x0 := fib() @ 1631      Call function fib() into x0
                        push x0                 Store the result on stack
       fib(n - 2);      x0 = &n                 Get address of variable n
                        x0 = *x0 (4)            Read value from address into x0, length = 4 (int)
                        x1 := 2                 Set x1 to 2
                        x0 -= x1                Subtract x1 from x0 i.e. (n - 2)
                        x1 := fib() @ 1631      Call function fib() into x1
                        pop x0                  Retrieve the result off stack into x0
                        x0 += x1                Add x1 to x0 i.e. the result of fib(n-1) + fib(n-2)
                        return (from fib)       Jump to function exit
                      }                         Restore the previous stack frame
                      exit fib
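The push/pop pair in the table is needed because both calls return their results in the same small set of registers. The result of fib(n - 1) has to survive the second call, so it is saved on the stack in between. The sketch below restates the dumped IR in C to make that visible. shecc does not produce C, and the names `fib_ir`, `ir_stack` and `ir_sp` are hypothetical. The locals x0 and x1 stand in for the IR's x0 and x1, and an explicit array stands in for the machine stack. As with the original fib, n is assumed to be non-negative.

```c
/* Explicit stand-in for the stack used by "push x0" / "pop x0". */
static int ir_stack[1024];
static int ir_sp = 0;

/* fib re-expressed step by step after the dumped IR. */
int fib_ir(int n)
{
    int x0, x1;

    x0 = n;                 /* x0 = &n; x0 = *x0 (4) */
    x1 = 0;                 /* x1 := 0 */
    if (x0 == x1)           /* n == 0 ? otherwise continue at label 1641 */
        return 0;           /* x0 := 0; return (from fib) */

    x0 = n;                 /* label 1641 */
    x1 = 1;                 /* x1 := 1 */
    if (x0 == x1)           /* n == 1 ? otherwise continue at label 1649 */
        return 1;           /* x0 := 1; return (from fib) */

    x0 = n;                 /* label 1649 */
    x1 = 1;
    x0 -= x1;               /* n - 1 */
    x0 = fib_ir(x0);        /* x0 := fib() */
    ir_stack[ir_sp++] = x0; /* push x0: keep fib(n - 1) across the next call */
    x0 = n;
    x1 = 2;
    x0 -= x1;               /* n - 2 */
    x1 = fib_ir(x0);        /* x1 := fib() */
    x0 = ir_stack[--ir_sp]; /* pop x0: restore fib(n - 1) */
    x0 += x1;               /* fib(n - 1) + fib(n - 2) */
    return x0;              /* return (from fib) */
}
```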
Known Issues
- The generated ELF lacks .bss and .rodata sections.
- The unary * operator is not supported, which makes it necessary to use the [0] syntax. Consider int x = 5; int *ptr = &x; here it is forbidden to use *ptr. However, it is valid to use ptr[0], which behaves the same as *ptr.
- Support for a variable number of function arguments is incomplete, and <stdarg.h> cannot be used. As an alternative, check the implementation of printf in the source lib/c.c for var_arg.
- The C front-end is a bit dirty because there is no effective AST.
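The [0] workaround for the missing unary * operator looks like the sketch below in practice. The helpers `swap_ints` and `bump` are hypothetical. The code is also valid standard C, where ptr[0] is defined as *(ptr + 0), so the two forms are interchangeable there.

```c
/* Swap two ints through pointers without unary '*':
 * a[0] and b[0] read and write the pointed-to values. */
void swap_ints(int *a, int *b)
{
    int tmp = a[0];
    a[0] = b[0];
    b[0] = tmp;
}

/* Increment the value behind ptr and return the new value;
 * ptr[0] += 1 is equivalent to (*ptr) += 1. */
int bump(int *ptr)
{
    ptr[0] += 1;
    return ptr[0];
}
```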
License
shecc is freely redistributable under the BSD 2-Clause License.
Use of this source code is governed by a BSD-style license that can be found in the LICENSE file.