WIP: Bit fields
I hoped this day would never come, but we need crappy bit fields for ImportC - and D 2.101 (FWIW, I was very much against that decision). They mess up a bunch of assumptions (every member aligned at a byte boundary and of an integral size in bytes, so directly addressable) and come with no LLVM support AFAICT, so... much fun.
WTF:
struct S {
unsigned char a:7;
_Bool b:1;
_Bool c:1;
unsigned d:23;
unsigned e:7;
unsigned f:2;
};
AST info (as of v2.100.1, FWIW on Linux):
a: byte offset 0, bit offset 0, type size 1
b: byte offset 0, bit offset 7, type size 1
c: byte offset 1, bit offset 0, type size 1
d: byte offset 1, bit offset 1, type size 4 (I expected: byte offset 0, bit offset 9)
e: byte offset 1, bit offset 24, type size 4 (I expected: byte offset 4, bit offset 0)
f: byte offset 5, bit offset 0, type size 4 (I expected: byte offset 4, bit offset 7)
This came up while thinking about how to map this to IR fields - I was thinking about a {i32, i32} in this case, and then merging the initializers of the involved fields.
Holy moly, clang uses a { i48 } for the testcase above.
For
struct S {
unsigned char a:7;
_Bool b:1;
_Bool c:1;
unsigned ui:23;
};
it's { i32 } (no surprise there). With ui:24, it's a { i16, i24 }. With ui:25, { i16, i32 }. Geez.
WTF:
struct S { unsigned char a:7; _Bool b:1; _Bool c:1; unsigned d:23; unsigned e:7; unsigned f:2; };AST info (as of v2.100.1, FWIW on Linux):
a: byte offset 0, bit offset 0, type size 1 b: byte offset 0, bit offset 7, type size 1 c: byte offset 1, bit offset 0, type size 1 d: byte offset 1, bit offset 1, type size 4 (I expected: byte offset 0, bit offset 9) e: byte offset 1, bit offset 24, type size 4 (I expected: byte offset 4, bit offset 0) f: byte offset 5, bit offset 0, type size 4 (I expected: byte offset 4, bit offset 7)
This feels almost random... :S
Holy moly, clang uses a
{ i48 }for the testcase above.
The frontend has to already have the C algorithm for calculating the sizeof, right? (for when a user queries and for putting it in arrays)
The frontend has to already have the C algorithm for calculating the sizeof, right? (for when a user queries and for putting it in arrays)
Yeah well I hope it gets the field (byte) offsets and overall aggregate sizes right. ;) - But the bit field info is quite limited AFAICT; there's just an extra bit offset and width in bits, that's it. The size in bytes of a bit field is computed in an extra manner in the DMD glue layer (and apparently depends on the target's C compiler): https://github.com/dlang/dmd/blob/114deec135f0bb62d5fc8c4788727c58b2e0ca42/compiler/src/dmd/todt.d#L983-L997
I've quickly looked at https://en.cppreference.com/w/cpp/language/bit_field - plain horrible, and all this implementation-defined crap would now land in D too. [There's even int a:0 and int a:1000 apparently, surely not correctly implemented…]
Maybe we can get away with not supporting initializers for bit fields initially, neither for init symbols nor for literals, and just filling them with [N x i8] zero-initialized 'padding' bytes. And requiring explicit assignments to these fields later, where we don't require a suited IR type for the initializer constant.
Okay, the bulk of the work is done, upstream tests are green.
Oh, scratch that; my assumption that consecutive bit fields can be grouped by their byte offset doesn't work - these groups can overlap (even without unions):
struct S {
unsigned char a:7; // byte offset 0, bit offset 0, bit width 7
_Bool b:1; // byte offset 0, bit offset 7, bit width 1
_Bool c:1; // byte offset 1, bit offset 0, bit width 1
unsigned d:22; // byte offset 1, bit offset 1, bit width 22
_Bool e:1; // byte offset 3, bit offset 7, bit width 1
};
This apparently isn't tested upstream (yields an ICE with this PR) and complicates matters significantly, again mostly for the IR-type mapping. Some options:
- Look into this in the frontend and try to come up with saner offsets for non-overlapping groups.
- Allow for 'arbitrary' IR fields (e.g.,
<{ i8, i16, i8 }>for the above), and break up constant initializers into potentially multiple IR fields (d). - Create larger groups, ending them as soon as the next bit field lies in a different byte (
<{ i8, i24 }>). But this means that there are potentially multiple bit fields with different byte offsets in a group, and thatbitOffsetalone isn't enough to determine the overall start bit offset from the group's IR field. Edit: I went with this.
I wrote a testprogram that generates random test cases, and then tests whether C and D behave the same: https://gist.github.com/JohanEngelen/137a2a8a86a2e1bcdadcc8c7ac81f373 I thought it might be useful to discover edge cases. Ran it for 2.5hrs with DMD 2.100.0, no bugs sofar.
Oh cool, thx for that, significantly raising my confidence in the frontend implementation. :+1:
Oh cool, thx for that, significantly raising my confidence in the frontend implementation. 👍
I'm not sure if writing all zeros and all ones is the difficult operation...? But yeah, mine too :)
Damn, I should have checked better. What's actually happening in the frontend is:
struct S {
unsigned char a:7; // byte offset 0, bit offset 0, bit width 7
_Bool b:1; // byte offset 0, bit offset 7, bit width 1
_Bool c:1; // byte offset 1, bit offset 0, bit width 1
unsigned d:22; // byte offset 1, bit offset 1, bit width 22
_Bool e:1; // byte offset 4, bit offset 0, bit width 1
};
So the struct size with the v2.100.1 frontend for Linux x64 is 8. For clang and gcc on godbolt, it's 4 as expected...
Edit: https://issues.dlang.org/show_bug.cgi?id=23293