Function and type namespaces for outside usage

Open ForNeVeR opened this issue 1 year ago • 2 comments

Most of the current major C implementations only deal with external function visibility and not namespaces. In this issue, by external visibility I mean visibility to the outside world, i.e. the possibility to call a function from an external program (by linking a C object or a DLL).

There are two approaches:

  1. Make everything "public" (i.e. callable from the outside) by default. This strategy is used by GCC and other Linux-originating compilers.
  2. Make everything "private" by default, and introduce a special way (e.g. an __attribute__ or __declspec(dllexport)) to designate the public symbols. This is how the Microsoft toolchain treats DLLs.

In both cases, it's possible to change the default behavior via a toolset setting, and I think it's also possible to explicitly set the list of public symbols via additional files that get processed at the linking stage (e.g. a module-definition file for the Microsoft linker or a version script for GNU ld).

For ease of external use, a C program may provide a header file containing a set of declarations in which the functions are marked by the counterpart attribute (e.g. __declspec(dllimport)).
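To illustrate both marker styles in one place, here is a minimal sketch of the usual export/import macro pattern. The names MYLIB_API, MYLIB_BUILD and mylib_add are made up for this example and don't come from Cesium or any real library.

```c
/* mylib.h + mylib.c folded into one file for brevity (hypothetical library).
 * MYLIB_BUILD would normally be defined only while compiling the library. */
#define MYLIB_BUILD 1

#if defined(_WIN32)
#  ifdef MYLIB_BUILD
#    define MYLIB_API __declspec(dllexport)  /* building the DLL */
#  else
#    define MYLIB_API __declspec(dllimport)  /* consuming the DLL */
#  endif
#elif defined(__GNUC__)
   /* Matters when the library is built with -fvisibility=hidden. */
#  define MYLIB_API __attribute__((visibility("default")))
#else
#  define MYLIB_API
#endif

/* Public: visible to external programs. */
MYLIB_API int mylib_add(int a, int b);

/* Private: internal linkage, never visible outside this translation unit. */
static int clamp_non_negative(int v) { return v < 0 ? 0 : v; }

int mylib_add(int a, int b)
{
    return clamp_non_negative(a) + clamp_non_negative(b);
}
```

A consumer would include the same header without MYLIB_BUILD defined, which on Windows turns the marker into its dllimport counterpart.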

Cesium is different from classic C implementations in several aspects.

  1. We have no separate linking stage, as there is no analog of object files in .NET.
  2. Besides the functions, we also have to export types from the resulting assembly for it to be useful to other .NET languages.
  3. We provide no header files for the external programs to use Cesium programs, but rely on public .NET metadata instead.

I consider usability of Cesium-compiled programs by external .NET code as an issue of critical importance, and I want as much as possible to work flawlessly by default for major use-cases of Cesium.

But that means we'll have to deal with many more problems than C implementations normally do.

Problems

Different type definitions across translation units

The main problem is that a C program may be composed of several translation units, each of which may provide different definitions for symbols with the same name. And while this shouldn't be an issue for public functions (since such programs would be considered broken at the linking stage), this may be an issue for type definitions. Consider this:

// a.c
struct foo { int x; };
void perform1(struct foo *p) { }
// b.c
struct foo { double x; };
void perform2(struct foo *p) { }

It is possible to compile a valid C program from these definitions, but it is pretty hard to express this program in .NET metadata.

C namespaces

C17 has no namespaces in the C++ or C# sense, but it does define several name spaces of identifiers (C17 6.2.3). Two of them matter for types: tags (the names after struct, union and enum) and ordinary identifiers, which include typedef'd names. In the scope of a translation unit, you can only have one entity per name per name space.

E.g. this is valid:

struct s { int x; };
typedef struct { int y; } s;

struct s a; // has member x
s b; // has member y

While this is not:

union s { int x; };
struct s { int x; }; // error: name conflict, even though there are `union s` and `struct s`

Currently, after the merge of #360, typedef'd names are compiled into types named like <typedef>s, which breaks the interop with C# (it's impossible to refer to such a type in C# syntax). That is a problem in itself.

Solutions

What can we do on each of these problems?

Internal name conflicts

What to do with struct s vs typedef struct { … } s?

I propose to put them into different .NET namespaces.

  • by default, compile it into the following (C#):
    struct s {}
    namespace TypeDef {
      struct s {}
    }
    
  • provide an option to specify separate namespaces for "default" types and "typedef'd" ones.

This way, both names are accessible from the C# side, and separated from each other. And if a certain program prefers typedef'd names all the way, then it's still easy to access them in C# via using TypeDef.
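As a rough sketch of the proposed default mapping, the function below computes the metadata name for a C type name. It is not Cesium code; metadata_name, the enum and the configurable typedef_namespace parameter are all hypothetical and only model the rule described above.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Which C name space the type name was declared in. */
enum c_name_space { C_TAG, C_TYPEDEF };

/* Writes the proposed .NET type name into buf.
 * Tag names go to the root namespace, typedef'd names go under
 * typedef_namespace ("TypeDef" by default in the proposal).
 * Returns 0 on success, -1 if buf is too small. */
int metadata_name(enum c_name_space ns, const char *c_name,
                  const char *typedef_namespace, char *buf, size_t size)
{
    int n = (ns == C_TYPEDEF)
        ? snprintf(buf, size, "%s.%s", typedef_namespace, c_name)
        : snprintf(buf, size, "%s", c_name);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

With this rule, struct s maps to s and the typedef'd s maps to TypeDef.s, so the two never collide in the assembly.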

I have considered switching the default to the other way around:

struct s {} // compiled from typedef struct { … } s;
namespace Struct {
  struct s {} // compiled from struct s { … };
}

This has a nice logical advantage: the concise way of using just s is just s in both languages, while the "long way" of using struct s in C becomes a similar-sounding Struct.s in C#.

But then, we'd need a separate namespace for Union, or a better term instead of Struct for the default.

Maybe we should go this way, adding both Struct and Union? Thoughts?

Conflicts across translation units

What can we do with this?

// a.c
struct foo { int x; };
void perform1(struct foo *p) { }
// b.c
struct foo { double x; };
void perform2(struct foo *p) { }

I consider this an uncommon case, so I suggest we only enable a special countermeasure via a manually specified compiler flag, say, --separate-translation-unit-metadata (ideas for a better name are welcome).

That flag would then switch to per-translation unit type generation (i.e. a.foo vs b.foo would become different types in the metadata).

This is pretty problematic though in cases when a and b translation units use each other via a shared header file. They could really wreak havoc if they do that.

For those cases, we can issue a compiler warning when incompatible struct types are detected, and use a pointer cast (for pointers) or a value cast (take a pointer, cast the pointer, dereference it) when structs are passed by value. I think it's UB to use incompatible types in this manner anyway, so in a standard-compliant library this should be rare.
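Here is a small sketch of what those two conversions amount to, modeled in plain C. The types a_foo and b_foo are hypothetical stand-ins for the per-translation-unit a.foo and b.foo, given identical layouts so that the conversion is meaningful. The by-value case uses memcpy instead of a literal cast-and-dereference, because the latter would violate the aliasing rules in C source; a compiler emitting IL wouldn't have that restriction.

```c
#include <string.h>

/* Stand-ins for the per-translation-unit types (hypothetical names). */
struct a_foo { int x; int y; };
struct b_foo { int x; int y; };

_Static_assert(sizeof(struct a_foo) == sizeof(struct b_foo),
               "the two definitions must have the same size");

/* Pointer case: only the static type of the pointer changes,
 * the pointed-to object is left untouched. */
struct b_foo *as_b_foo_ptr(struct a_foo *p)
{
    return (struct b_foo *)p;
}

/* By-value case: reinterpret the bytes of one struct as the other.
 * Equivalent in effect to "take pointer, cast pointer, dereference". */
struct b_foo to_b_foo(struct a_foo v)
{
    struct b_foo out;
    memcpy(&out, &v, sizeof out);
    return out;
}
```

If the two definitions really differ (int x vs double x), the result of such a conversion is garbage, which is why a warning seems like the least we should do.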

I have considered using some kind of auto-detection of this case (i.e. automatically hide types that are incompatible into separate per-translation-unit-namespaces, and emit it into a global namespace otherwise), but I don't think we want that. It would hurt ABI stability: if I only change code in a.c, then the public metadata of my assembly regarding b.c would change.

So, by default, we could just check that all the definitions of the same types across a.c and b.c are compatible. And only start doing mumbo-jumbo iff an explicit flag is passed.
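The default check could look roughly like the sketch below. It is a simplification of the struct compatibility rules from C17 6.2.7 (same tag, same member names and types, same order). The descriptor types and struct_defs_compatible are hypothetical, and comparing member types as strings is an assumption made to keep the example short.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified model of a struct definition seen in one translation unit. */
struct field_desc { const char *name; const char *type; };
struct struct_desc {
    const char *tag;
    size_t field_count;
    const struct field_desc *fields;
};

/* Returns true if both definitions have the same tag and the same
 * members (name and type) in the same order. */
bool struct_defs_compatible(const struct struct_desc *a,
                            const struct struct_desc *b)
{
    if (strcmp(a->tag, b->tag) != 0 || a->field_count != b->field_count)
        return false;
    for (size_t i = 0; i < a->field_count; i++) {
        if (strcmp(a->fields[i].name, b->fields[i].name) != 0 ||
            strcmp(a->fields[i].type, b->fields[i].type) != 0)
            return false;
    }
    return true;
}
```

For the a.c and b.c example, the two definitions of struct foo differ in the type of x, so this check would report them as incompatible.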

Public vs private definitions

In this case, I suggest we follow the Microsoft behavior by default (make everything private), and only declare functions as public if specifically requested. We should support the other way via a compiler flag, though. As a visibility marker, we can support both the GCC and the Microsoft way of doing that (i.e. both __attribute__() and __declspec()).

Regarding the types, as no other precedent exists, I suggest we provide the following strategies:

  • (default) only mark type as public if it is used in a public function
  • (optional) mark all the types as public (for type-only libraries, i.e. consider someone wants to publish a bunch of type definitions in a header file)
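The two strategies can be sketched as a single pass over the functions. Everything here (function_info, mark_public_types, the export_all_types switch) is a hypothetical model, not Cesium's actual data structures.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_TYPES_PER_FUNCTION 4

/* A function and the types appearing in its signature,
 * stored as indices into a type table. */
struct function_info {
    bool is_public;
    size_t type_count;
    size_t type_ids[MAX_TYPES_PER_FUNCTION];
};

/* Fills type_is_public[0..type_count).
 * Default strategy: a type is public only if a public function uses it.
 * Optional strategy (export_all_types): every type is public. */
void mark_public_types(const struct function_info *functions,
                       size_t function_count,
                       bool *type_is_public, size_t type_count,
                       bool export_all_types)
{
    for (size_t t = 0; t < type_count; t++)
        type_is_public[t] = export_all_types;

    for (size_t f = 0; f < function_count; f++) {
        if (!functions[f].is_public)
            continue;
        for (size_t i = 0; i < functions[f].type_count; i++) {
            size_t id = functions[f].type_ids[i];
            if (id < type_count)
                type_is_public[id] = true;
        }
    }
}
```

This sketch only looks at function signatures. A real implementation would probably also need to follow types nested inside the fields of already-public types.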

I have also considered allowing types to be marked via __declspec(dllexport), but AFAIK there's currently no precedent of other C compilers doing this, and thus there are no C programs using this technique. So, let's not do that for now, but wait for the users' feedback on that particular point.

ForNeVeR avatar Apr 30 '23 15:04 ForNeVeR

@kant2002, please take a look at this… essay. It is related to how we currently handle #360.

Do you see any other cases requiring special handling? What do you think in general of all this?

After a discussion, I want to split this into several issues about each particular feature or change.

None of this is critical or requires immediate attention, except maybe the public type naming issue.

ForNeVeR avatar Apr 30 '23 15:04 ForNeVeR

@Fantoom, this also somewhat touches the topic of #345, and I remember you were interested in that one some time ago.

ForNeVeR avatar Apr 30 '23 15:04 ForNeVeR