dcc
Direct/Interactive C Compiler
dcc - Direct C Compiler
A C99-compliant C compiler that implements many extensions and additional features, as well as arbitrary-precision integer arithmetic.
The main feature that differentiates this compiler from others is its ability to directly read, preprocess, tokenize, parse, assemble and link C source code, all at the same time, in a way that allows you to execute C code in an environment similar to an interactive command line. If you are interested in how this is achieved, take a look at /include/drt/drt.h
Currently, DCC can only target I386 and above; support for x86-64 is planned and already partially implemented.
Supported output formats are ELF and Windows PE, as well as direct execution of generated code.
DCC supports AT&T inline assembly syntax, emulating gcc's __asm__ statement and the GNU assembler as well as direct parsing of assembly sources.
Using TPP to implement a fully featured preprocessor, DCC implements many GCC extensions such as __asm__, __builtin_constant_p, many __attribute__-s, __typeof__, __auto_type, and many more, including my own twist on awesome C extensions.
Development on DCC started on 17.04.2017, and it has been the usual one-person project ever since.
Current state:
Note that DCC is still fairly early in its development, meaning that anything can still change and that more features will be added eventually.
- Link against Windows PE binaries/libraries (*.dll).
- Statically link against PE binaries (as in: clone everything from a *.dll)
- Dynamically/Statically link against ELF binaries/libraries/object files (*, *.so, *.o)
- Output Windows PE binary/library (*.exe, *.dll).
- Output Linux ELF binary/library (*, *.so).
- Output ELF relocatable object files (*.o)
- Process and merge (link) multiple source-/object files/static libraries.
- Compiling DCC is mainly tested and working on Windows using Visual C or DCC itself. GCC and Linux support is present, but may occasionally be broken.
- Full STD-C compliance up to C99.
- Full AT&T assembly support with many GNU assembler extensions (see below).
- Full ELF binary target support.
- Fully working live execution of C source code.
- DCC can fully compile itself (And the result can compile itself again!)
Planned features:
- Support for X86-64/AMD64 CPU architectures.
- Compiling DCC on Linux (most of the work's already there, but nothing's tested yet).
- Compiling DCC with DCC (because every C compiler must be able to do that!).
- Generation of debug information (recognizable by gdb).
- Finish many partially implemented features (see below).
- Support for true thread-local storage (aka. segment-based)
Features (Compiler):
- DCC as host compiler can easily be detected with `defined(__DCC_VERSION__)`.
- Using TPP as preprocessor, every existing preprocessor extension is supported, as well as all that are exclusive to mine.
- Live-compilation-mode directly generates assembly.
- C-conforming symbol forward/backward declaration.
- K&R-C compatible
- Full STD-C89/90 compliance
- Full STD-C95 compliance
- Full STD-C99 compliance
- Supports all C standard types.
- Supports 64-bit `long long` integrals (using double-register storage).
- Supports all C control statements.
- Supports C11 `_Generic`.
- Supports C11 `_Atomic` (Not fully implemented).
- Supports C99 `_Bool`.
- Supports C99 `__func__` builtin identifier.
- Supports Variable declaration in if-expressions and for-initializers.
- Supports nested function declaration, as well as access to variables from surrounding scopes.
- Supports C++ lvalue types (`int y = 10; int &x = y;`).
- Supports C structure bitfields
- Support for GCC statement-expressions: `int x = ({ int z = 10; z+20; }); // x == 30`.
- Support for `__FUNCTION__` and `__PRETTY_FUNCTION__`, including use by concat with other strings: `char *s = "Function " __FUNCTION__ " was called"; printf("%s\n",s);`.
- Support for GCC `__sync_*` builtin functions (`__sync_val_compare_and_swap(&x,10,20)`).
- Supports all compiler-slangs for alignof: `_Alignof`, `__alignof`, `__alignof__` and `__builtin_alignof`.
- Support for compile-time type deduction from expressions: `typeof`, `__typeof`, `__typeof__`.
- Support for GCC scoped labels: `__label__`.
- Support for GCC-style inline assembly: `__asm__("ret")`.
- Support for MSVC fixed-length integer types: `__int(8|16|32|64)`.
- Support for GCC `__auto_type` (as well as special interpretation of `auto` when not used as storage class. `auto int x = 42`: auto is storage class; `auto y = 10;`: auto denotes automatic type deduction).
- Support for C99 variable-length arrays: `int x = 10; int y[x*2]; assert(sizeof(y) == 80);`.
- Support for old (pre-STDC: K&R-C) function declarations/implementations.
- Support for new (post-STDC: C90+) function declarations/implementations.
- Support for floating-point types (Assembly generator is not implemented yet).
- Support for GCC x86 segment address space (`__seg_fs`/`__seg_gs`)
- Debugging aids for pre-initializing local variables with `0xCC` bytes and memory allocated using `alloca` with `0xAC`.
- Inherited from assembly: Named register identifiers.
  - `int x = %eax;` (CPU-specific, on i386 compiles to `mov %eax, x`).
  - `int x = *(int *)%fs:0x18;` (Can also be used to access segment register, on i386 compiles to `movl %fs:(0x18), x`).
- Inherited from assembly: Get current text address.
  - `void *p = .;` (Evaluates to the current text address with `void *` typing).
- Use label names in expressions:
  - `void *p = &&my_label; my_label: printf("p = %p\n",p);`
- Support for new & old GCC structure/array initializer:
  - dot-field: `struct { int x,y; } p = { .x = 10, .y = 20 };`
  - field-colon: `struct point { int x,y; } p = { x: 10, y: 20 };`
  - array-subscript: `int alpha[256] = { ['a' ... 'z'] = 1, ['A' ... 'Z'] = 1, ['_'] = 1 };`
- Support for runtime brace-initializers: `struct point p = { .x = get_x(), .y = get_y() };`
- Split between struct/union/enum, declaration and label namespaces: `foo: struct foo foo; // Valid code and 3 different 'foo'`
- Support for unnamed struct/union inlining:
  - `union foo { __int32 x; struct { __int16 a,b; }; };`
  - `offsetof(union foo,x) == 0`, `offsetof(union foo,a) == 0`, `offsetof(union foo,b) == 2`
- Support for builtin functions offering special compile-time optimizations, or functionality (Every builtin can be queried with `__has_builtin(...)`):
  - `char const (&__builtin_typestr(type_or_expr t))[];`
    - Accepting arguments just like 'sizeof', return a human-readable representation of the [expression's] type as a compile-time array of characters allocated in the '.string' section.
  - `_Bool __builtin_constant_p(expr x);`
  - `expr __builtin_choose_expr(constexpr _Bool c, expr tt, expr ff);`
  - `_Bool __builtin_types_compatible_p(type t1, type t2);`
  - `void __builtin_unreachable(void) __attribute__((noreturn));`
  - `void __builtin_trap(void) __attribute__((noreturn));`
  - `void __builtin_breakpoint(void);`
    - Emit a CPU-specific instruction to break into a debugging environment, or do nothing if the target CPU doesn't allow for such an instruction
  - `void *__builtin_alloca(size_t s);`
  - `void *__builtin_alloca_with_align(size_t s, size_t a);`
  - `void __builtin_assume(expr x),__assume(expr x);`
  - `long __builtin_expect(long x, long e);`
  - `const char (&__builtin_FILE(void))[];`
  - `int __builtin_LINE(void);`
  - `const char (&__builtin_FUNCTION(void))[];`
  - `void *__builtin_assume_aligned(void *p, size_t align, ...);`
  - `size_t __builtin_offsetof(typename T, members...);`
  - `T (__builtin_bitfield(T expr, constexpr int const_index, constexpr int const_size)) : const_size;`
    - Access a given sub-range of bits of any integral expression, the same way access is performed for structure bit-fields.
  - `typedef ... __builtin_va_list;`
  - `void __builtin_va_start(__builtin_va_list &ap, T &start);`
  - `void __builtin_va_end(__builtin_va_list &ap);`
  - `void __builtin_va_copy(__builtin_va_list &dstap, __builtin_va_list &srcap);`
  - `T __builtin_va_arg(__builtin_va_list &ap, typename T);`
    - Compiler-provided var-args helpers for generating smallest-possible code
  - `int __builtin_setjmp(T &buf);`
  - `void __builtin_longjmp(T &buf, int sig) __attribute__((noreturn));`
    - Requires: `sizeof(T) == __SIZEOF_JMP_BUF__`
    - Compile-time best-result code generation for register save to 'buf'
    - Optimizations for 'sig' known to never be '0'
  - `void *__builtin_malloc(size_t s);`
  - `void *__builtin_calloc(size_t c, size_t s);`
  - `void *__builtin_realloc(void *p, size_t c, size_t s);`
  - `void __builtin_free(void *p);`
  - `void __builtin_cfree(void *p);`
  - `void *__builtin_return_address(unsigned int level);`
  - `void *__builtin_frame_address(unsigned int level);`
  - `void *__builtin_extract_return_addr(void *p);`
  - `void *__builtin_frob_return_address(void *p);`
  - `void *__builtin_isxxx(void *p);`
    - ctype-style builtin functions
  - `void *__builtin_memchr(void *p, int c, size_t s);`
  - `void *__builtin_memrchr(void *p, int c, size_t s);`
    - Additional functions are available for `mem(r)len`/`mem(r)end`/`rawmem(r)chr`/`rawmem(r)len`
  - `T __builtin_min(T args...);`
  - `T __builtin_max(T args...);`
  - `void __builtin_cpu_init(void);`
  - `int __builtin_cpu_is(char const *cpuname);`
  - `int __builtin_cpu_supports(char const *feature);`
  - `char (&__builtin_cpu_vendor(char *buf = __builtin_alloca(sizeof(__builtin_cpu_vendor()))))[?];`
  - `char (&__builtin_cpu_brand(char *buf = __builtin_alloca(sizeof(__builtin_cpu_brand()))))[?];`
    - Returns a target-specific `'\0'`-terminated string describing the brand/vendor name of the host CPU. The length of the returned string is always constant and known at compile-time. `__builtin_cpu_init` is required to be called first, and if the string cannot be determined at runtime, the returned string is filled with all `'\0'`-characters.
  - `uint16_t __builtin_bswap16(uint16_t x);`
  - `uint32_t __builtin_bswap32(uint32_t x);`
  - `uint64_t __builtin_bswap64(uint64_t x);`
  - `int __builtin_ffs(int x);`
  - `int __builtin_ffsl(long x);`
  - `int __builtin_ffsll(long long x);`
  - `int __builtin_clz(int x);`
  - `int __builtin_clzl(long x);`
  - `int __builtin_clzll(long long x);`
    - Generate inline code with per-case optimizations for best results
  - `T __builtin_bswapcc(T x, size_t s = sizeof(T));`
  - `int __builtin_ffscc(T x, size_t s = sizeof(T));`
  - `int __builtin_clzcc(T x, size_t s = sizeof(T));`
    - General purpose functions that work for any size
  - `void *__builtin_memcpy(void *dst, void const *src, size_t s);`
    - Replace with inlined code for sizes known at compile-time
    - Warn about dst/src known to overlap
  - `void *__builtin_memmove(void *dst, void const *src, size_t s);`
    - Optimize away dst == src cases
    - Hint about dst/src never overlapping
  - `void *__builtin_memset(void *dst, int byte, size_t s);`
    - Replace with inlined code for sizes known at compile-time
  - `int __builtin_memcmp(void const *a, void const *b, size_t s);`
    - Replace with compile-time constant for constant
    - Replace with inline code for sizes known at compile-time
  - `size_t __builtin_strlen(char const *s);`
    - Resolve length of static strings at compile-time
- Split between declaration and assembly name (aka. `__asm__("foo")` suffix in declarations)
- Arbitrary size arithmetic operations (The sky's the limit; as well as your binary size bloated with hundreds of add-instructions for one line of source code).
- Support for deemon's 'pack' keyword (now called `__pack`):
  - Can be used to emit parenthesis almost everywhere (except in the preprocessor, or when calling macros)
- Explicit alignment of code, data, or entire sections in-source
- Support for `#pragma comment(lib,"foo")` to link against a given library "foo"
- Support for `#pragma pack(...)`
- Supports GCC builtin macros for fixed-length integral constants (`__(U)INT(8|16|32|64|MAX)_C(...)`).
- GCC-compatible predefined CPU macros, such as `__i386__` or `__LP64__`.
- Support for GCC builtin macros, such as `__SIZEOF_POINTER__`, `__SIZE_TYPE__`, etc.
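Several of the extensions listed above are shared with GCC, so they can be tried together in one translation unit. The following is a minimal sketch, not taken from the DCC sources: `host_compiler`, `MAX_OF`, `is_ident_char` and `union pair` are names made up for illustration, and the code only relies on the GCC-compatible syntax described in the list.

```c
#include <stddef.h>

/* Hypothetical helper: reports whether DCC is the host compiler, using the
 * __DCC_VERSION__ macro mentioned in the feature list. */
static const char *host_compiler(void) {
#if defined(__DCC_VERSION__)
    return "dcc";
#else
    return "other";
#endif
}

/* GCC statement-expression combined with __typeof__: each argument is
 * evaluated exactly once. */
#define MAX_OF(a, b) \
    ({ __typeof__(a) a_ = (a); __typeof__(b) b_ = (b); a_ > b_ ? a_ : b_; })

/* Array-subscript range initializer, as in the 'alpha' example above. */
static int is_ident_char(unsigned char c) {
    static const unsigned char table[256] = {
        ['a' ... 'z'] = 1, ['A' ... 'Z'] = 1, ['_'] = 1
    };
    return table[c];
}

/* Unnamed struct inlining: 'lo' and 'hi' are reachable directly through
 * the union, without naming the inner struct. */
union pair {
    int whole;
    struct { short lo, hi; };
};
```

`MAX_OF` has to be used inside a function body, since statement-expressions are not valid at file scope.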
Features (Attributes):
- Every attribute can be written in one of three ways:
  - GCC attribute syntax (e.g.: `__attribute__((noreturn))`)
  - cxx-11 attributes syntax (e.g.: `[[noreturn]]`)
  - MSVC declspec syntax (e.g.: `__declspec(noreturn)`)
- The name of an attribute (in the above examples `noreturn`) can be written with any number of leading, or terminating underscores to prevent ambiguity with user-defined macros:
  - `__attribute__((____noreturn_))` is the same as `__attribute__((noreturn))`
- The following attributes (as supported by other compilers) are recognized:
  - `__attribute__((noreturn*))`
  - `__attribute__((warn_unused_result*))`
  - `__attribute__((weak*))`
  - `__attribute__((dllexport*))`
  - `__attribute__((dllimport*))`
  - `__attribute__((visibility("default")))`
  - `__attribute__((alias("my_alias")))`
  - `__attribute__((weakref("my_alias")))`
  - `__attribute__((used*))`
  - `__attribute__((unused*))`
  - `__attribute__((cdecl*))`
  - `__attribute__((stdcall*))`
  - `__attribute__((thiscall*))`
  - `__attribute__((fastcall*))`
  - `__attribute__((section(".text")))`
  - `__attribute__((regparm(x)))`
  - `__attribute__((naked*))`
  - `__attribute__((deprecated))`
  - `__attribute__((deprecated(msg)))`
  - `__attribute__((aligned(x)))`
  - `__attribute__((packed*))`
  - `__attribute__((transparent_union*))`
  - `__attribute__((mode(x)))` (Underscores surrounding `x` are ignored)
  - All attribute names marked with '*' accept an optional suffix that adds an enabled-dependency on a compile-time expression. (e.g.: `__attribute__((noreturn(sizeof(int) == 4)))` - Mark as noreturn, if `int` is `4` bytes wide)
- Attributes not currently implemented (But planned to be):
  - `__attribute__((constructor))`
  - `__attribute__((constructor(priority)))`
  - `__attribute__((destructor))`
  - `__attribute__((destructor(priority)))`
  - `__attribute__((ms_struct))`
  - `__attribute__((gcc_struct))`
- Attributes ignored without warning:
  - `__attribute__((noinline...))`
  - `__attribute__((returns_twice...))`
  - `__attribute__((force_align_arg_pointer...))`
  - `__attribute__((cold...))`
  - `__attribute__((hot...))`
  - `__attribute__((pure...))`
  - `__attribute__((nothrow...))`
  - `__attribute__((noclone...))`
  - `__attribute__((nonnull...))`
  - `__attribute__((malloc...))`
  - `__attribute__((leaf...))`
  - `__attribute__((format_arg...))`
  - `__attribute__((format...))`
  - `__attribute__((externally_visible...))`
  - `__attribute__((alloc_size...))`
  - `__attribute__((always_inline...))`
  - `__attribute__((gnu_inline...))`
  - `__attribute__((artificial...))`
- New attributes added by DCC:
  - `__attribute__((lib("foo")))`
    - Most effective for PE targets: 'foo' is the name of the DLL file that the associated declaration should be linked against.
    - Using this attribute, one can link against DLL files that don't exist at compile-time, or create artificial dependencies on ELF targets.
  - `__attribute__((arithmetic*))`
    - Used on struct types of arbitrary size to enable arithmetic operations with said structure. Using this attribute you could easily create e.g.: a 512-bit integer type.
    - Most operators are implemented through inline-code, but some (mul,div,mod,shl,shr,sar) generate calls to external symbols.
    - When this attribute is present, the associated structure type can be modified with 'signed'/'unsigned' to control the sign-behavior.
- In addition, the following keywords can be used anywhere attributes are allowed.
  - `{_}_cdecl`: Same as `__attribute__((cdecl))`
  - `{_}_stdcall`: Same as `__attribute__((stdcall))`
  - `{_}_fastcall`: Same as `__attribute__((fastcall))`
  - `__thiscall`: Same as `__attribute__((thiscall))`
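As a rough illustration of the attributes listed above, the sketch below uses only the GCC spelling, since that is the form other compilers also accept. According to the list, DCC would take the same attributes as `[[...]]` or `__declspec(...)` too. The type and function names (`wire_header`, `vec4`, `fail`, `checked_div`) are invented for this example.

```c
#include <stdlib.h>

/* packed: no padding is inserted between 'tag' and 'length'. */
struct __attribute__((packed)) wire_header {
    char tag;
    int  length;
};

/* aligned(x): raise the alignment of every object of this type. */
struct __attribute__((aligned(16))) vec4 {
    float v[4];
};

/* noreturn: control never comes back from fail(). */
__attribute__((noreturn)) static void fail(void) {
    abort();
}

/* Hypothetical caller relying on the noreturn guarantee: no return
 * statement is needed after the call to fail(). */
static int checked_div(int a, int b) {
    if (b != 0)
        return a / b;
    fail();
}
```

With `packed`, `sizeof(struct wire_header)` is the plain sum of its member sizes, and `__alignof__(struct vec4)` reports the requested 16-byte alignment.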
Features (Warnings):
- DCC features an enormous number of warnings covering everything from code quality, to value truncation, to syntax errors, to unresolved references during linkage, etc...
- Any warning can be configured as
  - Disabled: (Compilation is continued, but based on severity, generated assembly/binary may be wrong)
  - Enabled: Emit a warning, but continue compilation as if it was disabled
  - Error: Emit an error message and halt compilation at the next convenient location
  - Suppress: Works recursively: Handle the warning as Disabled once for every time it was suppressed, then revert it to its previous state.
- Warnings are sorted into named groups that can be disabled as a whole. The main group of a warning is always displayed when it is emitted. (e.g.: `W1401("-WSyntax"): Expected ']', but got ...`)
- The global warning state can be pushed/popped from usercode:
  - Push:
    - `#pragma warning(push)`
    - `#pragma GCC diagnostic push`
  - Pop:
    - `#pragma warning(pop)`
    - `#pragma GCC diagnostic pop`
- Individual warnings/warning group states can be explicitly defined from usercode:
  - Disabled:
    - `#pragma warning("[-][W]no-<name>")`
    - `#pragma warning(disable: <IDS>)`
    - `#pragma warning(disable: "[-][W]<name>")`
    - `#pragma GCC diagnostic ignored "[-][W]<name>"`
  - Enabled:
    - `#pragma warning(enable: <IDS>)`
    - `#pragma warning(enable: "[-][W]<name>")`
    - `#pragma GCC diagnostic warning "[-][W]<name>"`
  - Error:
    - `#pragma warning(error: <IDS>)`
    - `#pragma warning(error: "[-][W]<name>")`
    - `#pragma GCC diagnostic error "[-][W]<name>"`
  - Suppress (once for every time a warning/group is listed):
    - `#pragma warning(suppress: <IDS>)`
    - `#pragma warning(suppress: "[-][W]<name>")`
    - `#pragma warning("[-][W]sup-<name>")`
    - `#pragma warning("[-][W]suppress-<name>")`
  - Revert to default state:
    - `#pragma warning(default: <IDS>)`
    - `#pragma warning(default: "[-][W]<name>")`
    - `#pragma warning("[-][W]def-<name>")`
  - `IDS` is a space-separated list of individual warning IDS as integral constants
    - Besides belonging to any number of groups, each warning also has an ID
    - Use of these `IDS` should be refrained from, as they might change randomly
  - Similar to the `extension`-pragma, `#pragma warning(...)` accepts a comma-separated list of commands.
    - `#pragma warning(push,disable: "-Wsyntax")`
- All warnings can be enabled/disabled on-the-fly using pragmas:
  - `#pragma warning(push|pop)` Push/pop currently enabled warnings
  - `#pragma warning("-W<name>")` Enable warning 'name'
  - `#pragma warning("-Wno-<name>")` Disable warning 'name'
- `#pragma GCC system_header` treats the current input file as though all warnings were disabled
  - Mainly meant for headers in /fixinclude which may re-define type declarations, but are not meant to cause any problems
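The push/disable/pop pattern described above might look as follows. This is a sketch under assumptions: the DCC branch follows the `#pragma warning(push,disable: "...")` grammar shown in the list, and the group name `"-Wextensions"` is inferred from the `"-Wno-extensions"` spelling rather than confirmed. The other branch uses GCC's own `-Wpedantic` group, which belongs to GCC, not DCC. `twice_plus_one` is a made-up helper.

```c
/* Save the warning state, then silence extension warnings for one helper. */
#if defined(__DCC_VERSION__)
#pragma warning(push, disable: "-Wextensions") /* group name is an assumption */
#else
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wpedantic"
#endif

/* Uses a GCC statement-expression, which would otherwise be reported
 * as an extension. */
static int twice_plus_one(int x) {
    return ({ int t = x * 2; t + 1; });
}

/* Restore whatever warning state was active before the push. */
#if defined(__DCC_VERSION__)
#pragma warning(pop)
#else
#pragma GCC diagnostic pop
#endif
```

Scoping the change between a push and a pop keeps the relaxed setting from leaking into code that follows, which is the main reason to prefer it over a bare disable.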
Features (Extensions):
- Extensions are implemented in two different ways:
  - Extensions that are always enabled, but emit a warning when used.
    - The warning can either be disabled individually (e.g.: `#pragma warning("-Wno-declaration-in-if")`).
    - Or all extension warnings can be disabled using `#pragma warning("-Wno-extensions")`.
    - Don't let yourself be fooled. Writing `"-Wno-extensions"` disables warnings about extensions, not extensions themselves!
    - Some warnings are also emitted for deprecated or newer language features.
      - `"constant-case-expressions"`: Emit for old-style function declarations.
      - `"old-function-decl"`: Emit for old-style function declarations.
  - Extensions that may change semantics and can therefore be disabled.
    - All of these extensions can be enabled/disabled on-the-fly using pragmas:
      - As comma-separated list in `#pragma extension(...)`
        - `push`: Push currently enabled extensions (e.g.: `#pragma extension(push)`)
        - `pop`: Pop previously enabled extensions (e.g.: `#pragma extension(pop)`)
        - `"[-][f]<name>"`: Enable extension `name` (e.g.: `#pragma extension("-fmacro-recursion")`)
        - `"[-][f]no-<name>"`: Disable extension `name` (e.g.: `#pragma extension("-fno-macro-recursion")`)
    - `"expression-statements"`: Recognize GCC statement-expressions.
    - `"label-expressions"`: Allow use of labels in expression (prefixed by `&&`).
    - `"local-labels"`: Allow labels to be scoped (using GCC's `__label__` syntax).
    - `"gcc-attributes"`: Recognize GCC `__attribute__((...))` syntax.
    - `"msvc-attributes"`: Recognize MSVC `__declspec(...)` syntax.
    - `"cxx-11-attributes"`: Recognize c++11 `[[...]]` syntax.
    - `"attribute-conditions"`: Allow optional conditional expression to follow a switch-attribute.
    - `"calling-convention-attributes"`: Recognize MSVC stand-alone calling convention attributes (e.g.: `__cdecl`).
    - `"fixed-length-integer-types"`: Recognize fixed-length integer types (`__int(8|16|32|64)`).
    - `"asm-registers-in-expressions"`: Allow assembly registers to be used in expressions (e.g.: `int x = %eax;`).
    - `"asm-address-in-expressions"`: Allow the current text address to be used in expressions (e.g.: `void *p = .;`).
    - `"void-arithmetic"`: `sizeof(void) == __has_extension("void-arithmetic") ? 1 : 0`.
    - `"struct-compatible"`: When enabled, same-layout structures are compatible, when disabled, only same-declaration structs are.
    - `"auto-in-type-expressions"`: Allow `auto` to be used either as storage class, or as alias for `__auto_type`.
    - `"variable-length-arrays"`: Allow declaration of C99 VLA variables.
    - `"function-string-literals"`: Treat `__FUNCTION__` and `__PRETTY_FUNCTION__` as language-level string literals.
    - `"if-else-optional-true"`: Recognize GCC if-else syntax `int x = (p ?: other_p)->x; // Same as '(p ? p : other_p)->x'`.
    - `"fixed-length-integrals"`: Recognize MSVC fixed-length integer suffix: `__int32 x = 42i32;`.
    - `"macro-recursion"`: Enable/Disable TCC recursive macro declaration.
    - Many more extensions are provided by TPP to control preprocessor syntax, such as `#include_next` directives. Their list is too long to be documented here.
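To make the `#pragma extension(...)` mechanism more concrete, here is a small sketch that enables one switchable extension for a limited region. The pragma is guarded by `__DCC_VERSION__` because other compilers do not know it. The string `"-fif-else-optional-true"` is built from the `"[-][f]<name>"` pattern above and has not been checked against DCC. `name_or_default` and `vla_bytes` are illustrative names.

```c
#include <stddef.h>

#if defined(__DCC_VERSION__)
/* Enable the extension only for this region; the surrounding state is
 * restored by the matching pop below. */
#pragma extension(push, "-fif-else-optional-true")
#endif

/* GCC conditional with omitted middle operand: 'name ?: x' means
 * 'name ? name : x', with 'name' evaluated once. */
static const char *name_or_default(const char *name) {
    return name ?: "anonymous";
}

#if defined(__DCC_VERSION__)
#pragma extension(pop)
#endif

/* C99 variable-length array ("variable-length-arrays" in the list):
 * sizeof is evaluated at run time for such an array. */
static size_t vla_bytes(int n) {
    int buf[n * 2];
    return sizeof(buf);
}
```

For `n == 10` the array holds 20 `int` elements, which matches the `sizeof(y) == 80` example from the compiler feature list on targets with a 4-byte `int`.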
Features (Optimization):
- Dead code elimination
  - Correct deduction on merging branches, such as if-statement with two dead branches
  - Re-enable control flow when encountering a label
  - Correct interpretation of `__builtin_unreachable()`
  - Correct interpretation of `__{builtin_}assume(0)`
- Automatic constant propagation
  - Even capable of handling generic offsetof: `(size_t)&((struct foo *)0)->bar`
- Automatic removal of unused symbols/data
  - Recursively delete unused functions/data symbols from generated binary
  - Can be suppressed for any symbol using `__attribute__((used))`
- Automatic merging of data in sections marked with `M` (merge) (Not fully implemented, because of missing re-use counter; the rest already works)
  - Using the same string (or sub-string) more than once will only allocate a single data segment:
    - `printf("foobar\n"); printf("bar\n");` Re-use `"bar\n\0"` as a sub-string of `"foobar\n\0"`
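The constructs that these optimizations act on can be written down in a few lines. The sketch below only shows the source-level patterns named above. Whether and how a given compiler folds or removes them is not something this snippet can demonstrate. `struct foo`, `bar_offset`, `sign_of_nonzero` and `build_tag` are hypothetical names.

```c
#include <stddef.h>

struct foo {
    int  head;
    char bar;
};

/* The "generic offsetof" form from the list above. A constant-propagating
 * compiler can fold this to a plain integer. */
static size_t bar_offset(void) {
    return (size_t)&((struct foo *)0)->bar;
}

/* The caller promises x != 0, so the end of the function is never
 * reached; __builtin_unreachable() tells the compiler so. */
static int sign_of_nonzero(int x) {
    if (x > 0) return 1;
    if (x < 0) return -1;
    __builtin_unreachable();
}

/* used: keep this otherwise unreferenced symbol in the output. */
__attribute__((used)) static const char build_tag[] = "demo-build";
```

The hand-written offset expression is expected to agree with the standard `offsetof(struct foo, bar)` macro. The macro remains the portable choice in new code.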
Features (Assembler):
- Full AT&T Assembly support
- Extension for fixed-length
- Supported assembly directives are:
  - `.align <N> [, <FILL>]`
  - `.skip <N> [, <FILL>]`
  - `.space <N> [, <FILL>]`
  - `.quad <I>`
  - `.short <I>`
  - `.byte <I>`
  - `.word <I>`
  - `.hword <I>`
  - `.octa <I>`
  - `.long <I>`
  - `.int <I>`
  - `.fill <REPEAT> [, <SIZE> [, <FILL>]]`
  - `. = <ORG>`
  - `.org <ORG>`
  - `.extern <SYM>`
  - `.global <SYM>`
  - `.globl <SYM>`
  - `.protected <SYM>`
  - `.hidden <SYM>`
  - `.internal <SYM>`
  - `.weak <SYM>`
  - `.local <SYM>`
  - `.used <SYM>`
  - `.unused <SYM>`
  - `.size <SYM>, <SIZE>`
  - `.string <STR>`
  - `.ascii <STR>`
  - `.asciz <STR>`
  - `.text`
  - `.data`
  - `.bss`
  - `.section`
  - `.previous`
  - `.set <SYM>, <VAL>`
  - `.include <NAME>`
  - `.incbin <NAME> [, <SKIP> [, <MAX>]]`
- CPU-specific, recognized directives:
  - I386+
    - `.code16`
    - `.code32`
  - X86-64
    - `.code64`
- Directives ignored without warning:
  - `.file ...`
  - `.ident ...`
  - `.type ...`
  - `.lflags ...`
  - `.line ...`
  - `.ln ...`
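For readers unfamiliar with AT&T syntax, a tiny inline-assembly example may help. It uses GCC's extended `__asm__` form with operand constraints. That DCC accepts constraints in exactly this shape is an assumption based on the statement that it emulates GCC's `__asm__` statement. `add_asm` is an invented name, and the non-x86 branch exists only so the sketch compiles everywhere.

```c
/* Adds two ints with one AT&T-syntax instruction on x86; falls back to
 * plain C elsewhere so the sketch stays portable. */
static int add_asm(int a, int b) {
#if defined(__i386__) || defined(__x86_64__)
    __asm__("addl %1, %0"   /* AT&T order: source first, destination last */
            : "+r"(a)       /* %0: read-write register operand */
            : "r"(b)        /* %1: input register operand */
            : "cc");        /* the instruction modifies the flags */
    return a;
#else
    return a + b;
#endif
}
```

The `l` suffix on `addl` selects a 32-bit operation, and registers substituted for `%0` and `%1` carry the `%` prefix that AT&T syntax requires.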
Features (Linker):
- Integrated linker allows for direct (and very fast) creation of executables
- Merge multiple source files into a single compilation unit
- ELF-style visibility control/attributes (`__attribute__((visibility(...)))`)
- Directly link against already-generated PE binaries
- Add new library dependencies from source code (`#pragma comment(lib,...)`)
- Output to PE binary (*.exe/*.dll)
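The two source-level linker controls mentioned above, visibility attributes and `#pragma comment(lib,...)`, could be combined roughly like this. The library name `"foo"` is the placeholder used earlier in this document, not a real dependency, and the pragma is guarded so that only DCC sees it. `scale` and `scale_internal` are hypothetical functions.

```c
#if defined(__DCC_VERSION__)
/* Placeholder dependency: ask the integrated linker to link against
 * a library called "foo". */
#pragma comment(lib, "foo")
#endif

/* Hidden: usable inside the output binary, but not exported from it. */
__attribute__((visibility("hidden"))) int scale_internal(int x) {
    return x * 3;
}

/* Default: part of the binary's exported interface. */
__attribute__((visibility("default"))) int scale(int x) {
    return scale_internal(x) + 1;
}
```

Visibility only matters once the code ends up in a shared library; inside a single executable both functions behave like ordinary external functions.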