ldc icon indicating copy to clipboard operation
ldc copied to clipboard

Non matching mangled name for C functions using structs

Open jacob-carlborg opened this issue 7 years ago • 32 comments

Example:

module main;

struct Foo
{
    int a;
}

extern (C) void bar(Foo);
module foo;

struct Foo
{
    int a;
}

extern (C) void bar(Foo);
$ ldc2 main.d foo.d
main.d(8): Error: Function type does not match previously declared function with the same mangled name: bar

C functions doesn't have a mangling. I suspect the mangled name that would be used for a D function for the Foo struct is used.

jacob-carlborg avatar Jul 18 '18 18:07 jacob-carlborg

This is a duplicate of long-standing https://github.com/ldc-developers/ldc/issues/1020.

A quick look at the IR shows the problem - 2 equivalent Foo structs in different D modules:

; ldc2 -output-ll main.d
%main.Foo = type { i32 }
declare void @bar(%main.Foo) #0
; ldc2 -output-ll foo.d
%foo.Foo = type { i32 }
declare void @bar(%foo.Foo) #0

This is ~~only~~ mostly a problem with (mostly default/enforced) -singleobj though (i.e., -c should work), where code from all D modules is emitted into a single IR module (=> object file), and we can only have one IR function for a particular (mangled) name. extern(D) functions always feature the module prefix in their mangle, so this only applies to other linkages.

We make sure the IR types match if there are multiple declarations (and only emit the first definition in that case). If we didn't, we'd have to adapt the call sites, e.g., repainting IR foo.Foo call args as main.Foo ones.

kinke avatar Jul 18 '18 19:07 kinke

Wouldn't the proper way to do this be to not look at the actual name of the struct but instead its members? Basically building up a mangled/IR name combining the mangled names of the members, recursively.

jacob-carlborg avatar Jul 18 '18 20:07 jacob-carlborg

That's definitely something I haven't thought of before, but only imagining the length of some of these names (need to take alignments into account as well, unions...) makes my head ache. I look at IR a lot, and having type names >1k isn't bearable IMO.

kinke avatar Jul 18 '18 20:07 kinke

Yeah, the might be a problem. But as I mentioned in the other issue https://github.com/ldc-developers/ldc/issues/1020 it's easy to detect if a class/struct is extern(D) or some other linkage.

jacob-carlborg avatar Jul 18 '18 20:07 jacob-carlborg

it's easy to detect if a class/struct is extern(D) or some other linkage.

That has just been added with 2.081 AFAIK. But as your example shows, people don't expect having to declare a struct extern(C[++]) just because it's used in some C[++] function.

We could keep a map of struct identifier (excl. module prefix) to IR types, per IR module. When trying to emit a new IR type, we could check if its IR layout is equivalent to an existing type with the same identifier, and use that existing one instead.

kinke avatar Jul 18 '18 20:07 kinke

That has just been added with 2.081 AFAIK.

It's been present in ClassDeclaration for a long time, it was recently moved up to AggregateDeclaration to support structs as well. It has also been changed from several bools to one enum. It has existed as long as D has supported C++ classes. But there is no way to identify extern(C) struct because that doesn't have any other semantic differences compared to extern(D) struct.

But as your example shows, people don't expect having to declare a struct extern(C[++]) just because it's used in some C[++] function.

True.

We could keep a map of struct identifier (excl. module prefix) to IR types, per IR module. When trying to emit a new IR type, we could check if its IR layout is equivalent to an existing type with the same identifier, and use that existing one instead.

As long it works I'm happy.

jacob-carlborg avatar Jul 18 '18 20:07 jacob-carlborg

But as your example shows, people don't expect having to declare a struct extern(C[++]) just because it's used in some C[++] function.

This only applies to C functions, right? For C++ functions, the type is part of the mangling: in the OP example when extern(C) is replaced with extern(C++) the two modules would declare two different extern(C++) bar functions, because the two Foos are different types. I am wary of collapsing types that are different into the same thing; I don't know whether that may interfere badly with LTO+some_future_metadata_for_types+optimization. Could we emit a struct literal type (instead of named type) for extern(C) functions? I.e. something like this:

declare void @bar({i32}) #0

JohanEngelen avatar Jul 18 '18 22:07 JohanEngelen

C++ functions are affected too, the struct doesn't even need to be extern(C++) then, as the C++ mangle (_Z3bar3Foo in both cases) doesn't contain the D module (just extern(C++) namespaces, which can be defined across multiple D modules too).

kinke avatar Jul 18 '18 23:07 kinke

Could we emit a struct literal type (instead of named type) for extern(C) functions?

Doesn't have the same problem as I suggested https://github.com/ldc-developers/ldc/issues/2782#issuecomment-406056677?

jacob-carlborg avatar Jul 19 '18 06:07 jacob-carlborg

C++ functions are affected too, the struct doesn't even need to be extern(C++) then, as the C++ mangle (_Z3bar3Foo in both cases) doesn't contain the D module (just extern(C++) namespaces, which can be defined across multiple D modules too).

Wow indeed. That's a bad problem that will lead to miscompiles when templates are involved: consider two template instantiations in different compilation units but with different types, they will mangled the same although the types are different and just happen to have the same name, and linking will merge the symbols because that's needed for templates. This sounds like a bug to me: the mangling of C++ functions with D types should use the D type mangling.

Simple example that shows the bug:

static import core.thread;

struct Thread {
    int a;
}

extern (C++) void bar(Thread*) {}

extern (C++) void bar(core.thread.Thread) {}

JohanEngelen avatar Jul 19 '18 15:07 JohanEngelen

I don't necessarily see that as bug, just one more thing to consider for C++ interop. 'Fixing' it would break a ton of code I guess. Our error for mismatching IR signatures may help in preventing such bugs (mainly with -singleobj).

Anyway, wrt. to the original issue, I'm not sure a special handling (collapsing types etc.) pays off; after all, this is almost always triggered by duplicate equivalent struct definitions and function declarations, and code duplication is smelly. Extracting the duplicate C[++] types/functions into a dedicated D module solves the problem.

kinke avatar Jul 19 '18 15:07 kinke

If we want to allow this (signatures incompatible on the D level), I'd probably go for implementing it using repainting (replaceAllUsesWith and casting the function pointer), as trying to regularize the type on the LLVM level and getting that right in all cases seems like a lot of complexity.

We could then have a D-type-system implementation of structural type comparison/whatever heuristics we want to come up with to decide whether to present the user with an error message or not.

dnadlinger avatar Jul 19 '18 18:07 dnadlinger

[...] and code duplication is smelly.

Agreed, but what if two separate D libraries used in a large project wrap the same C/C++ code?

dnadlinger avatar Jul 19 '18 18:07 dnadlinger

If compiled in separate invokations (and no LTO), there's no problem. - But yeah, there'll always be some legitimate corner cases (template instantiations...).

kinke avatar Jul 19 '18 18:07 kinke

https://issues.dlang.org/show_bug.cgi?id=19101

JohanEngelen avatar Jul 19 '18 19:07 JohanEngelen

replaceAllUsesWith and casting the function pointer

Yeah, repainting the function pointer instead of the args should be a lot simpler indeed.

kinke avatar Jul 21 '18 09:07 kinke

I ran in this today as well:

/usr/include/dlang/ldc/core/stdc/stdio.d(1278,8): Error: Function type does not match previously declared function with the same mangled name: fwrite
/usr/include/dlang/ldc/core/stdc/stdio.d(1278,8):        Previous IR type: i64 (i8*, i64, i64, %libxlsxd.xlsxwrap._IO_FILE*)
/usr/include/dlang/ldc/core/stdc/stdio.d(1278,8):        New IR type:      i64 (i8*, i64, i64, %core.stdc.stdio._IO_FILE*)

this breaks xlsxd builds. this is with ldc 1.15.0

the structs are different though.

burner avatar May 23 '19 10:05 burner

the structs are different though

So xlsxd is duplicating the _IO_FILE struct definition and fwrite() prototype from core.stdc.stdio? Why? ;) - If it was using the druntime struct, the problem would go away.

kinke avatar May 23 '19 11:05 kinke

because I use dpp to include the headers of a c library called libxlsxwriter which in turn pulls in stdio.h which pulls in _IO_FILE.

another problem I see is that the _IO_FILE in druntine is out-of-date with the one in glibc. It was last updated in 2008. I have a way to hack around that, but its another manual step I would like to avoid.

burner avatar May 23 '19 12:05 burner

Ah, yeah dpp doesn't work (at all?) with LDC due to its architecture and this issue - it causes lots of duplicate aggregate definitions (and so init symbol + TypeInfos bloat etc.), Atila knows about it.

kinke avatar May 23 '19 12:05 kinke

luckily for me, for xlsxd its only _IO_FILE. Commenting that out makes it compile and pass the tests.

Has then been any idea/progress been made on this issue in general?

burner avatar May 23 '19 12:05 burner

No progress, but there are some ideas, mainly in this issue here (see https://github.com/ldc-developers/ldc/issues/3021#issuecomment-471551875 for links to the other duplicate issues). It's not trivial, and the most promising idea (casting the function pointer after verifying that the signatures are basically equivalent, i.e., that the used aggregates are 'equivalent', although distinct separate types) wouldn't work in your case, as your _IO_FILE diverges from druntime's.

kinke avatar May 23 '19 13:05 kinke

Has then been any idea/progress been made on this issue in general?

Yes, use DStep 😉

jacob-carlborg avatar May 23 '19 13:05 jacob-carlborg

Why not have a sort of @TolerantMangling flag in the meantime so we can get around library conflicts

etcimon avatar Feb 10 '20 21:02 etcimon

Sure, we can implement that as an attribute. I wonder whether that's worth it, though, or whether we should just allow it outright for simplicity (possibly with a warning hidden behind a command-line flag).

dnadlinger avatar Feb 10 '20 22:02 dnadlinger

As I've mentioned in https://github.com/ldc-developers/ldc/issues/3318#issuecomment-584382009 it would be important to be able to override wrong declarations as well for legacy purposes.

etcimon avatar Feb 10 '20 22:02 etcimon

+1 for the optional warning without extra UDA. Casting the function pointer should implicitly allow slightly wrong declarations, such as int/long for x64 targets if passed in a register.

kinke avatar Feb 10 '20 22:02 kinke

I would rather not have warnings that can't be removed if my code is superior to the one it's overriding

etcimon avatar Feb 10 '20 22:02 etcimon

Those warnings would be almost exclusively caused by duplicate struct declarations in multiple analyzed modules, so any 'superiority' would come with TypeInfos bloat (at best, just more stripping work for the linker). Settling for exactly one central D binding, either in druntime or some dub package, for each C(++) library would prevent this issue (and enforce DRY), but that's probably utopic.

kinke avatar Feb 10 '20 22:02 kinke

Exactly, but the issue with type bloat is doomed to stay because of workarounds on top of workarounds due to compatibility reasons. If we had the perfect solution from the start it would've helped

etcimon avatar Feb 10 '20 23:02 etcimon