zig icon indicating copy to clipboard operation
zig copied to clipboard

cImport / translate-c picks the wrong C type when parsing integer literals

Open cryptocode opened this issue 2 months ago • 2 comments

Zig Version

0.12.0-dev.3686+3adfaf91f

Steps to Reproduce and Observed Behavior

When importing this C header:

const loader = @cImport(@cInclude("mach-o/loader.h"));

using loader.LC_LOAD_WEAK_DYLIB leads to this compile error:

cimport.zig:5499:51: error: type 'c_int' cannot represent integer value '2147483648'
pub const LC_LOAD_WEAK_DYLIB = @as(c_int, 0x18) | LC_REQ_DYLD;

where LC_REQ_DYLD is originally #define LC_REQ_DYLD 0x80000000 which doesn't fit in a 32-bit signed integer.

The manual translation of this constant in macho.zig in the Zig std library is just a comptime int, and that + the other constants work fine. However, the macho.zig file is incomplete so I need to use the header directly, which exposes this issue.

translate-c issue

translate_c#parseCNumLit just picks c_int for unsuffixed integer literals, which leads to an error if the literal doesn't fit. It seems like the handling of suffixed literals also don't pick a larger type if necessary.

I believe this code should follow C std 6.4.4.1p5, which specifies the following:

The type of an integer constant is the first of the corresponding list in which its value can be represented.

Suffix Decimal Constant Octal, Hexadecimal or Binary (since C23) Constant
none int
long int
long long int
int
unsigned int
long int
unsigned long int
long long int
unsigned long long int
u or U unsigned int
unsigned long int
unsigned long long int
unsigned int
unsigned long int
unsigned long long int
l or L long int
long long int
long int
unsigned long int
long long int
unsigned long long int
Both u or U
and l or L
unsigned long int
unsigned long long int
unsigned long int
unsigned long long int
ll or LL long long int long long int
unsigned long long int
Both u or U
and ll or LL
unsigned long long int unsigned long long int

Note: If an integer constant cannot be represented by any type in its list, it may have an extended integer type, if the extended integer type can represent its value. If all of the types in the list for the constant are signed, the extended integer type shall be signed. If all of the types in the list for the constant are unsigned, the extended integer type shall be unsigned. If the list contains both signed and unsigned types, the extended integer type may be signed or unsigned.

I implemented this in cParseCNumLit, and LC_REQ_DYLD now becomes a c_uint instead of c_int. However, this isn't enough since the constants that derive their value from LC_REQ_DYLD are still c_int - thus you get the same compiler error. The type conversion must somehow propagate.

Related c_translation.zig issue and problematic fix?

See #11409 for a similar problem. This refers to a comment where yet another similar problem received a fix in #12695

This fix adds c_ulonglong to the list of signed candidate types. This seems to be at odds with C std 6.4.4.1p5 which, for instance, requires the extension type to be signed for unsuffixed signed decimal constants, per the table note above. Could i128 be the implementation-defined extension type in this case? This is also the type used when parsing literals.

Expected Behavior

Correct type usage per the C standard and no Zig compiler error on importing valid C headers

cryptocode avatar Apr 19 '24 17:04 cryptocode

This fix adds c_ulonglong to the list of signed candidate types. This seems to be at odds with C std 6.4.4.1p5 which, for instance, requires the extension type to be signed for unsuffixed signed decimal constants, per the table note above.

clang and gcc differ here - GCC follows the standard and promotes to __int128; clang converts it to unsigned long long. Since translate-c is based on clang we follow the clang behavior: https://godbolt.org/z/W75c6WGjY

Currently aro follows the clang behavior but we should update it to be able to do either depending on what compiler is being emulated.

ehaas avatar Apr 19 '24 19:04 ehaas

clang and gcc differ here - GCC follows the standard and promotes to __int128; clang converts it to unsigned long long. Since translate-c is based on clang we follow the clang behavior: https://godbolt.org/z/W75c6WGjY

Currently aro follows the clang behavior but we should update it to be able to do either depending on what compiler is being emulated.

Thanks for the insights. IMO, Zig should definitely not perpetuate behavior like converting unsuffixed decimal constants to ULL. Moreover, it seems that translate-c doesn't promote to a larger type at all in parseCNumLit (though I'm not that familiar with the code base so could be missing something), while c_translate goes the non-compliant clang route in promoteIntLiteral.

cryptocode avatar Apr 19 '24 20:04 cryptocode