8cc

Static initializers with complex address constants.

Open andrewchambers opened this issue 11 years ago • 12 comments

This issue concerns constant initializers such as:

int x;
int y;
int * a = &y + 1;
int * b =  1 - &y + 1;
int * c = &x - &x + 1;

The C standard states that a static initializer can be an address constant (a label) plus or minus an integer constant expression. All these expressions can be resolved to such a constant.

andrewchambers avatar May 08 '14 01:05 andrewchambers

Relevant documentation: C11 6.6 Constant expressions:

"10 An implementation may accept other forms of constant expressions" - ugh, I guess we just need to accept whatever gcc accepts.

andrewchambers avatar Nov 30 '14 23:11 andrewchambers

This is the most significant missing feature as far as I know. We need to calculate an address in the compiler and propagate that information to the codegen. The current data structure was not designed with this feature in mind. I'm planning to do that after the codegen rewrite because I'll redesign the data structure for that.
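A minimal sketch of what "calculating an address in the compiler" could look like, assuming a tiny hypothetical AST. The names `Node`, `StaticValue`, and `fold` are made up for illustration and are not 8cc's actual data structures. The idea is to fold an initializer expression into either a plain integer or a `label + byte offset` pair that the codegen can emit directly. Forms that C's additive operators do not allow (integer minus pointer, pointer plus pointer) and subtraction of pointers to different objects are rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical AST for illustration only; not 8cc's node types. */
typedef enum { N_INT, N_ADDR, N_ADD, N_SUB } NodeKind;

typedef struct Node {
    NodeKind kind;
    long value;                   /* N_INT: the integer value */
    const char *label;            /* N_ADDR: assembly label of the object */
    long elem_size;               /* N_ADDR: size of the pointed-to type */
    const struct Node *lhs, *rhs; /* N_ADD / N_SUB operands */
} Node;

/* Folded result: label == NULL means a plain integer stored in offset,
   otherwise the value is the address label + offset (offset in bytes). */
typedef struct {
    const char *label;
    long offset;
    long elem_size; /* scale for pointer arithmetic; 0 for integers */
} StaticValue;

/* Returns true and fills *out if the expression folds to an integer or
   to label + offset; returns false for anything else. */
bool fold(const Node *n, StaticValue *out) {
    StaticValue l, r;
    switch (n->kind) {
    case N_INT:
        *out = (StaticValue){NULL, n->value, 0};
        return true;
    case N_ADDR:
        *out = (StaticValue){n->label, 0, n->elem_size};
        return true;
    case N_ADD:
    case N_SUB: {
        if (!fold(n->lhs, &l) || !fold(n->rhs, &r))
            return false;
        long sign = (n->kind == N_ADD) ? 1 : -1;
        if (!l.label && !r.label) {          /* int op int */
            *out = (StaticValue){NULL, l.offset + sign * r.offset, 0};
            return true;
        }
        if (l.label && !r.label) {           /* ptr op int, scaled */
            *out = (StaticValue){l.label,
                                 l.offset + sign * r.offset * l.elem_size,
                                 l.elem_size};
            return true;
        }
        if (!l.label && n->kind == N_ADD) {  /* int + ptr */
            *out = (StaticValue){r.label,
                                 r.offset + l.offset * r.elem_size,
                                 r.elem_size};
            return true;
        }
        if (l.label && r.label && n->kind == N_SUB &&
            strcmp(l.label, r.label) == 0) { /* ptr - ptr, same object */
            assert(l.elem_size > 0);
            *out = (StaticValue){NULL, (l.offset - r.offset) / l.elem_size, 0};
            return true;
        }
        return false; /* int - ptr, ptr + ptr, or unrelated pointers */
    }
    }
    return false;
}
```

With 4-byte `int`, folding the tree for `&y + 1` would give the label `y` with offset 4, and `&x - &x + 1` would give the plain integer 1. A real implementation would also need casts, array indexing, member access, and compound literals.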

rui314 avatar Nov 30 '14 23:11 rui314

It is a really tough problem. Here are some crazy examples:

// Works fine on gcc.
int *x = &(int[]){1, 2, 3}[1];
// Fails on gcc, not constant, even though it is entirely known at compile time.
int x = (int[]){1, 2, 3}[1];

// Works fine on gcc, error on clang, standard says it should not work.
static char x[] = *&"foo";

I'm currently revisiting this, because I am implementing it for my own compiler. Basically I am implementing an AST interpreter which operates on values with various types, with concrete or abstract values (which includes nesting in structs and arrays). It is still tricky to match other compilers exactly. Even gcc and clang support different things. This is probably one of the most frustrating and underspecified parts of the C standard for me so far :(

It gets more annoying when mixing constant and non-constant expressions in local variable inits.

I would love to see how you solve this. If I solve it to a level I am happy with in my own compiler, I may try to port the solution to 8cc.

andrewchambers avatar Mar 02 '15 01:03 andrewchambers

commit 3e0527427850651b1dc385883de519812946622d adds some progress to this.

andrewchambers avatar Apr 29 '15 04:04 andrewchambers

@rui314 - Thought about this any more?

I am thinking it needs two passes: one to check types, array dimensions, and selector fields while parsing, and another to split inits into two parts, static and dynamic. Globals would only be allowed to have a static component.
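To make the static/dynamic split concrete, here is a hand-lowered sketch in plain C of a local initializer that mixes constant and non-constant elements. It is only an illustration of the idea, not what 8cc or any other compiler actually emits. The names `init_template`, `init_lowered`, and `init_direct` are hypothetical.

```c
#include <string.h>

/* Static component: every constant element of the initializer, with a
   zero placeholder where the non-constant element will be stored. */
static const int init_template[4] = {1, 0, 3, 0};

/* Hand-lowered equivalent of:  int a[4] = {1, n, 3};  */
void init_lowered(int a[4], int n) {
    memcpy(a, init_template, sizeof init_template); /* static component */
    a[1] = n;                                       /* dynamic component */
}

/* The direct form, for comparison with the lowered one. */
void init_direct(int out[4], int n) {
    int a[4] = {1, n, 3}; /* a[3] is implicitly zero-initialized */
    memcpy(out, a, sizeof a);
}
```

Both functions should leave the same four values in the destination array. A global initializer would consist of the template alone, since it has no place to run the dynamic stores.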

The static data would be represented via data structures something like:

StaticPtrDerived(Sz, PtrLabel, Offset)
StaticConstant(Sz, Value)
StaticString(Value)
StaticArray(CType, Dim, map[Offset] SubValues) // Uninitialized Holes are inferred
StaticStruct(CType, map[Offset] SubValues)

Each would also have an optional assembly Label. There would also need to be a way for anonymous fields to register themselves with a label and with the function.
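A rough C sketch of how such static data might be represented and turned into assembly, assuming GNU assembler directives (`.byte`, `.short`, `.long`, `.quad`, `.zero`). The names `StaticField` and `emit_aggregate` are hypothetical, the flat field list stands in for the offset map, and strings and nested aggregates are left out. Holes between fields are inferred from the offsets and emitted as zero padding.

```c
#include <assert.h>
#include <stdio.h>

typedef enum { S_CONST, S_PTR } StaticKind;

/* One initialized scalar inside an aggregate (hypothetical layout). */
typedef struct {
    StaticKind kind;
    long offset;       /* byte offset inside the enclosing aggregate */
    int size;          /* 1, 2, 4 or 8 bytes */
    long value;        /* S_CONST: the value; S_PTR: addend in bytes */
    const char *label; /* S_PTR: symbol the pointer is derived from */
} StaticField;

/* Picks the GNU as data directive for a scalar of the given size. */
static const char *directive(int size) {
    switch (size) {
    case 1:  return ".byte";
    case 2:  return ".short";
    case 4:  return ".long";
    default: return ".quad";
    }
}

/* Emits an aggregate of total_size bytes. Fields must be sorted by
   ascending offset; any gap between them becomes .zero padding. */
void emit_aggregate(FILE *out, const char *label, long total_size,
                    const StaticField *fields, int nfields) {
    long pos = 0;
    fprintf(out, "%s:\n", label);
    for (int i = 0; i < nfields; i++) {
        const StaticField *f = &fields[i];
        assert(f->offset >= pos); /* sorted and non-overlapping */
        if (f->offset > pos)
            fprintf(out, "\t.zero %ld\n", f->offset - pos);
        if (f->kind == S_PTR)     /* label plus signed byte addend */
            fprintf(out, "\t%s %s%+ld\n", directive(f->size), f->label, f->value);
        else
            fprintf(out, "\t%s %ld\n", directive(f->size), f->value);
        pos = f->offset + f->size;
    }
    if (pos < total_size)         /* trailing uninitialized hole */
        fprintf(out, "\t.zero %ld\n", total_size - pos);
}
```

For example, a 24-byte struct with a 4-byte constant at offset 0 and an 8-byte pointer to `y + 4` at offset 8 would be emitted as a `.long`, four bytes of padding, a `.quad y+4`, and eight trailing zero bytes.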

andrewchambers avatar May 28 '15 22:05 andrewchambers

Not sure if flattening the Array and Struct static data to be a list of primitive types is worth it as another part.

andrewchambers avatar May 28 '15 22:05 andrewchambers

I'm currently pretty busy creating a new linker from scratch (https://github.com/llvm-mirror/lld/tree/master/COFF), and I also have other things to do. I'll revisit this later.

rui314 avatar Jun 02 '15 21:06 rui314

No problem, I'm working on my own solution too. I learned a lot from studying your stuff, which is why I still follow this closely.

About the linker: is linker speed a serious issue for builds? I personally think the C/C++ world has become ridiculous due to bloat and headers. Building Linux, LLVM, and GCC from scratch takes at least 10-20 minutes each on my quad-core machine with an SSD and 16 GB of RAM. I read that the Plan 9 kernel can build itself from scratch on a Raspberry Pi 2 in 1 minute.

andrewchambers avatar Jun 02 '15 22:06 andrewchambers

It really is when you create executables that are tens or hundreds of megabytes in size. Linking is just too slow. Another problem is that no existing linker uses multiple cores well. Parallelizing compilation is easy because it can be done just by running many compiler processes simultaneously. But you cannot parallelize the final link that way, and if the final link uses only one CPU, you are going to waste your time.

rui314 avatar Jun 02 '15 23:06 rui314

Hmm, yeah, I imagine a 20 or 30 second link after a one-line code change is annoying, especially if you have more cores. Sounds pretty cool. With your new knowledge of linkers, linking directly in memory, like you attempted before, is probably easier, haha.

andrewchambers avatar Jun 03 '15 06:06 andrewchambers

It is also nice to know that I can write good code even in C++ :) I wrote a fair amount of C++11 code for that, and I kind of like the language.

rui314 avatar Jun 03 '15 21:06 rui314

Haha, the only time I've enjoyed writing C++ was before I knew Python and when I was using Visual Studio. Do you use an IDE for C++?

andrewchambers avatar Jun 03 '15 23:06 andrewchambers