Translation unit name collision

Open ForNeVeR opened this issue 2 years ago • 4 comments

After #341, we now have a potential for name collision between different translation units. Consider two files called a/x.c and b/x.c: if both use statics, then two container types named x<Statics> will be generated.

We should think about resolving this somehow.

ForNeVeR, Oct 23 '22 18:10

I propose the following naming scheme for internal types (i.e. the <Statics> containers).

  1. Introduce the concept of compilation context directory. All the paths will be resolved relative to said directory.

    By default, the compilation context directory is the current directory, but it should be possible to override it via compiler arguments.

  2. When compiling a translation unit, emit its <Statics> type under a name derived from the file path relative to the compilation context directory. When processing the file path, perform the following substitutions:

    • a dot becomes _: for example, a.c gets compiled as a_c<Statics>,
    • a directory separator gets mapped to a namespace separator: for example, foo/a.c becomes foo.a_c<Statics>,
    • the literal characters \, <, >, and _ have to be escaped with \: for example, a file name f_o_o/a<duh>.c (which is very odd even for a Linux system, I should say) becomes f\_o\_o.a\<duh\>_c<Statics>,
    • any other characters get translated as-is, even if they are odd. For example, a file name a\b\c| (which is possible to achieve on a Unix system) would generate a type a\\b\\c|<Statics>.
  3. Odd file system configuration cases:

    • if we have to traverse the parent directory to get to the file from a compilation context, replace that with a special mark <_>. For example, a path ../../foo.c would become <_>.<_>.foo_c<Statics>,
    • if the relative path to a file cannot be produced by the file system (i.e. a multi-root file system, such as the Windows one with files being placed on another drive), then just process the full file path according to the rules from step 2. For example, a path C:\Windows\System32\main.c would produce the type C:.Windows.System32.main_c<Statics>.
  4. Some normal-ish examples:

    • foo.c → foo_c<Statics>
    • 3rdparty/lib/foobar.c → 3rdparty.lib.foobar_c<Statics>
    • ../3rdparty/foo.c → <_>.3rdparty.foo_c<Statics>
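To make the rules above easier to check against the examples, here is a minimal sketch in Python. It is not Cesium code: the function name statics_type_name and its separator parameter are hypothetical, and it assumes the path has already been made relative to the compilation context directory (or is the full path in the multi-root case). It does not normalize "." or empty path components, since the proposal does not say how to treat them.

```python
def statics_type_name(path: str, separator: str = "/") -> str:
    """Map a translation unit path to a <Statics> container type name.

    `path` is assumed to be already relative to the compilation context
    directory; `separator` is the directory separator of the host system.
    Both names are illustrative assumptions, not part of Cesium.
    """
    parts = []
    for component in path.split(separator):
        if component == "..":
            # Traversing to the parent directory is replaced by a special mark.
            parts.append("<_>")
            continue
        chars = []
        for ch in component:
            if ch in "\\<>_":
                # Literal \, <, > and _ are escaped with a backslash.
                chars.append("\\" + ch)
            elif ch == ".":
                # A dot becomes an underscore.
                chars.append("_")
            else:
                # Everything else is kept as-is, even if it looks odd.
                chars.append(ch)
        parts.append("".join(chars))
    # Directory separators map to namespace separators.
    return ".".join(parts) + "<Statics>"


if __name__ == "__main__":
    for example in ["foo.c", "3rdparty/lib/foobar.c", "../3rdparty/foo.c"]:
        print(example, "->", statics_type_name(example))
    # The multi-root case: the full Windows path with "\" as the separator.
    print(statics_type_name("C:\\Windows\\System32\\main.c", separator="\\"))
```

Passing the separator explicitly is what lets the same backslash be an escaped literal on a Unix system and a namespace separator on Windows.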

This may or may not serve as the basis for #372, we'll see.

@Fantoom, @kant2002, what do you think?

ForNeVeR, May 10 '23 21:05

I have nothing against it, except that I dislike polluting the class names with paths. I was thinking about adding a prefix to generate unique names for paths outside the current directory, but that has its own issues, such as having to guess the path to the translation unit. Maybe we can use attributes in that case.

While I was writing this, I thought: what if we keep a plain list of translation unit names that use just the file name plus a prefix, and that's it? We don't really need anything except unique class names. Paths may be needed right now because we don't emit PDBs. And even if names clash, the number of clashes should be relatively low. If we add PDBs, the issue with complex naming would be solved, since we would see them in the debugger.

As I understand it, right now we give Cesium the list of all C files and it produces a single executable. So Cesium is definitely able to make sure that all base names for translation units are unique.

kant2002, May 11 '23 04:05

@kant2002, are you suggesting that we add an option for the user to provide the class name for every translation unit? We could add that as an option, but we'll still have to generate the names somehow by default.

Look, one of my concerns is naming stability. Let's imagine that we started autogenerating the names and only "pollute" them if we detect a naming conflict. For example, we'll not pollute these two paths:

  • src/foo.c → foo<Statics>
  • src/bar.c → bar<Statics>

But will automatically figure out that we need to resolve a conflict in case of

  • foo/a.c → foo.a<Statics>
  • bar/a.c → bar.a<Statics>

So far so good, right?

But then, here's my concern: the name stability.

If we implement this kind of smart automation, adding a new translation unit will affect the names of already existing ones, and that's a huge problem for any kind of linkage.
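The instability can be shown with a small hypothetical sketch in Python. The function assign_names is made up for illustration and is not how Cesium works: it uses the bare file name unless two translation units share it, and only then prefixes the parent directory, as in the foo/a.c and bar/a.c example.

```python
from collections import Counter
from pathlib import PurePosixPath


def assign_names(paths):
    """Name each unit by its file stem, adding the directory only on conflict.

    Hypothetical "smart" scheme used only to illustrate the stability problem.
    """
    stems = Counter(PurePosixPath(p).stem for p in paths)
    names = {}
    for p in paths:
        path = PurePosixPath(p)
        if stems[path.stem] > 1:
            # Conflict detected: "pollute" the name with the parent directory.
            names[p] = f"{path.parent.name}.{path.stem}<Statics>"
        else:
            names[p] = f"{path.stem}<Statics>"
    return names


before = assign_names(["foo/a.c"])
after = assign_names(["foo/a.c", "bar/a.c"])

# The name generated for foo/a.c depends on which other files are compiled.
print(before["foo/a.c"])
print(after["foo/a.c"])
```

Nothing about foo/a.c itself changed between the two calls, yet its generated name did, which is exactly what would break linkage if the same strategy were used for public names.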

For <Statics>, it won't be a problem, since the names aren't public anyway, so we can do anything: renumber them internally, use a GUID or whatever.

But if we use the same strategy for any kind of public names, then we'll be in trouble: a version of the library compiled with different sources (or on a different platform where the library source files are placed differently) will become binary incompatible with the previous version. That's the thing I'm trying to avoid.

ForNeVeR, May 12 '23 21:05

Name stability is a good point; I did not consider it. But let's look at it the following way. The public symbols you would see in .NET correspond to the regular linker entries in C. It is not possible to have collisions in the linker, so if we take an existing app and make public just the same things that are exposed to the linker in C, then we have a stable public interface.

I have to think about stable internal naming, which is also valuable. IMO it is nice to have and must have, but you may tell me what I'm missing.

kant2002, May 14 '23 14:05