ldc WIP: Bit fields

I hoped this day would never come, but we need crappy bit fields for ImportC - and D 2.101 (FWIW, I was very much against that decision). They mess up a bunch of assumptions (every member aligned at a byte boundary and of an integral size in bytes, so directly addressable) and come with no LLVM support AFAICT, so... much fun.

Aug 01 '22 18:08 kinke

WTF:

struct S {
    unsigned char a:7;
    _Bool b:1;
    _Bool c:1;
    unsigned d:23;
    unsigned e:7;
    unsigned f:2;
};

AST info (as of v2.100.1, FWIW on Linux):

a: byte offset 0, bit offset 0, type size 1
b: byte offset 0, bit offset 7, type size 1
c: byte offset 1, bit offset 0, type size 1
d: byte offset 1, bit offset 1, type size 4  (I expected: byte offset 0, bit offset 9)
e: byte offset 1, bit offset 24, type size 4 (I expected: byte offset 4, bit offset 0)
f: byte offset 5, bit offset 0, type size 4  (I expected: byte offset 4, bit offset 7)

This came up while thinking about how to map this to IR fields - I was thinking about a {i32, i32} in this case, and then merging the initializers of the involved fields.

Aug 02 '22 13:08 kinke

Holy moly, clang uses a { i48 } for the testcase above.

For

struct S {
    unsigned char a:7;
    _Bool b:1;
    _Bool c:1;
    unsigned ui:23;
};

it's { i32 } (no surprise there). With ui:24, it's a { i16, i24 }. With ui:25, { i16, i32 }. Geez.

Aug 02 '22 14:08 kinke

WTF:

struct S {
    unsigned char a:7;
    _Bool b:1;
    _Bool c:1;
    unsigned d:23;
    unsigned e:7;
    unsigned f:2;
};

AST info (as of v2.100.1, FWIW on Linux):

a: byte offset 0, bit offset 0, type size 1
b: byte offset 0, bit offset 7, type size 1
c: byte offset 1, bit offset 0, type size 1
d: byte offset 1, bit offset 1, type size 4  (I expected: byte offset 0, bit offset 9)
e: byte offset 1, bit offset 24, type size 4 (I expected: byte offset 4, bit offset 0)
f: byte offset 5, bit offset 0, type size 4  (I expected: byte offset 4, bit offset 7)

This feels almost random... :S

Aug 02 '22 20:08 JohanEngelen

Holy moly, clang uses a { i48 } for the testcase above.

The frontend has to already have the C algorithm for calculating the sizeof, right? (for when a user queries and for putting it in arrays)

Aug 02 '22 20:08 JohanEngelen

The frontend has to already have the C algorithm for calculating the sizeof, right? (for when a user queries and for putting it in arrays)

Yeah well I hope it gets the field (byte) offsets and overall aggregate sizes right. ;) - But the bit field info is quite limited AFAICT; there's just an extra bit offset and width in bits, that's it. The size in bytes of a bit field is computed in an extra manner in the DMD glue layer (and apparently depends on the target's C compiler): https://github.com/dlang/dmd/blob/114deec135f0bb62d5fc8c4788727c58b2e0ca42/compiler/src/dmd/todt.d#L983-L997

I've quickly looked at https://en.cppreference.com/w/cpp/language/bit_field - plain horrible, and all this implementation-defined crap would now land in D too. [There's even int a:0 and int a:1000 apparently, surely not correctly implemented…]

Aug 03 '22 09:08 kinke

Maybe we can get away with not supporting initializers for bit fields initially, neither for init symbols nor for literals, and just filling them with [N x i8] zero-initialized 'padding' bytes. And requiring explicit assignments to these fields later, where we don't require a suited IR type for the initializer constant.

Aug 03 '22 10:08 kinke

Okay, the bulk of the work is done, upstream tests are green.

Aug 05 '22 18:08 kinke

Oh, scratch that; my assumption that consecutive bit fields can be grouped by their byte offset doesn't work - these groups can overlap (even without unions):

struct S {
  unsigned char a:7; // byte offset 0, bit offset 0, bit width 7
  _Bool b:1;         // byte offset 0, bit offset 7, bit width 1
  _Bool c:1;         // byte offset 1, bit offset 0, bit width 1
  unsigned d:22;     // byte offset 1, bit offset 1, bit width 22
  _Bool e:1;         // byte offset 3, bit offset 7, bit width 1
};

This apparently isn't tested upstream (yields an ICE with this PR) and complicates matters significantly, again mostly for the IR-type mapping. Some options:

Look into this in the frontend and try to come up with saner offsets for non-overlapping groups.
Allow for 'arbitrary' IR fields (e.g., <{ i8, i16, i8 }> for the above), and break up constant initializers into potentially multiple IR fields (d).
Create larger groups, ending them as soon as the next bit field lies in a different byte (<{ i8, i24 }>). But this means that there are potentially multiple bit fields with different byte offsets in a group, and that bitOffset alone isn't enough to determine the overall start bit offset from the group's IR field. Edit: I went with this.

Aug 07 '22 12:08 kinke

I wrote a testprogram that generates random test cases, and then tests whether C and D behave the same: https://gist.github.com/JohanEngelen/137a2a8a86a2e1bcdadcc8c7ac81f373 I thought it might be useful to discover edge cases. Ran it for 2.5hrs with DMD 2.100.0, no bugs sofar.

Aug 07 '22 21:08 JohanEngelen

Oh cool, thx for that, significantly raising my confidence in the frontend implementation. :+1:

Aug 08 '22 15:08 kinke

Oh cool, thx for that, significantly raising my confidence in the frontend implementation. 👍

I'm not sure if writing all zeros and all ones is the difficult operation...? But yeah, mine too :)

Aug 08 '22 16:08 JohanEngelen

Damn, I should have checked better. What's actually happening in the frontend is:

struct S {
  unsigned char a:7; // byte offset 0, bit offset 0, bit width 7
  _Bool b:1;         // byte offset 0, bit offset 7, bit width 1
  _Bool c:1;         // byte offset 1, bit offset 0, bit width 1
  unsigned d:22;     // byte offset 1, bit offset 1, bit width 22
  _Bool e:1;         // byte offset 4, bit offset 0, bit width 1
};

So the struct size with the v2.100.1 frontend for Linux x64 is 8. For clang and gcc on godbolt, it's 4 as expected...

Edit: https://issues.dlang.org/show_bug.cgi?id=23293

Aug 14 '22 18:08 kinke