Implement the `c_uchar` type
Hi! I'm really interested in Zig myself, and I'm also interested in tinkering with C23-related things in it.
This is my attempt to start with something: adding `c_uchar` to Zig. I have seen the position that `unsigned char`/`char8_t` should be treated as `u8` and `signed char` as `i8`, but I felt this type needed to be implemented because what I ultimately want to try is implementing `stdbit.h` in libzigc, and while you could technically use `u8` instead of `unsigned char` there, I feel like that would have undesirable consequences, which is something I wanted to avoid.
So, this is `c_uchar`; it's guaranteed to be `unsigned char`. That should be it. I ran the `test-libc` suite as well as `test -Dskip-non-native`, and no issues were spotted other than `standalone_test_cases.coff_dwarf` failing, although that failure can also be reproduced on the branch's base commit. Let me know whether this PR is sound or not.
(Note: `c_schar` was intentionally left unimplemented, since it isn't required by any C standard library functions.)
Relevant: https://github.com/ziglang/zig/issues/875
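To make the intended use concrete, here's a minimal sketch of what a libzigc implementation of one `stdbit.h` function could look like with the proposed type. `stdc_leading_zeros_uc` is a real C23 name; everything else (that libzigc would export it this way, and the use of `c_uchar`) is my assumption:

```zig
// Sketch only: assumes this PR's c_uchar lands and that libzigc exports
// C23 <stdbit.h> functions under their standard names.
export fn stdc_leading_zeros_uc(value: c_uchar) c_uint {
    // @clz counts leading zero bits; the small result type widens
    // implicitly to c_uint, matching the C prototype's unsigned int return.
    return @clz(value);
}
```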
> I feel like that would have undesirable consequences, which is something I wanted to avoid.

Could you elaborate on this? As the rationale for this language change (compared to the status quo intent of using `u8`/`i8`), it's very vague.
Of course, platforms with `CHAR_BIT != 8` don't work with status quo; but that's a known limitation, and it's AFAIK undecided thus far whether Zig will support such targets.
The only contemporary(-ish) architectures where you might encounter `CHAR_BIT > 8` are some TI DSPs; at least C28x, C54x, and C55x to my knowledge. These all have 16-bit chars. C54x and C55x are from the 2010s and seem kinda EOL, but C28x is still alive and well. I do think we should avoid designing ourselves into a corner where we can't support these. However, that's effectively an argument for also adding `c_schar`, which this PR excludes.
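For concreteness, a minimal sketch of how code could at least fail loudly on such targets, assuming the proposed `c_uchar` is sized to match the target's `unsigned char`:

```zig
// Sketch: reject CHAR_BIT != 8 targets at compile time. Assumes the
// proposed c_uchar tracks the width of the target's unsigned char.
comptime {
    if (@bitSizeOf(c_uchar) != 8) {
        @compileError("this code assumes CHAR_BIT == 8");
    }
}
```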
> > I feel like that would have undesirable consequences, which is something I wanted to avoid.
>
> Could you elaborate on this? As the rationale for this language change (compared to the status quo intent of using `u8`/`i8`), it's very vague.
Considering that the ultimate goal is to have libzigc reimplement all C standard library functions and have them be used instead of the static libcs', replicating the ABI of `unsigned char`/`signed char` by using `u8`/`i8` appears to be something that would be brittle.
I have to admit, though, that this is just gut instinct; ultimately, I want other people's opinions on this to determine whether this is a path that should be taken or if we should go with `u8`/`i8` instead.
> I do think we should avoid designing ourselves into a corner where we can't support these. However, that's effectively an argument for also adding `c_schar`, which this PR excludes.
If this PR is sound, I can also implement `c_schar` here if needed.
> appears to be something that would be brittle
This concern is still vague enough as to be essentially meaningless IMO. Currently, Zig heavily relies on `u8` and `i8` matching C `unsigned char` and `signed char` in ABI, and (again, ignoring for now targets with `CHAR_BIT != 8`), that's generally perfectly fine. This is more of a definition than an assumption; we can decide how `u8`/`i8` operate in `extern` contexts, and we have decided that the behavior should mirror that of `[un]signed char`.
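As a concrete example of that reliance, declarations like this one are already all over `std.c` and are defined to be correct (sketch; the exact form in `std.c` may differ slightly between versions):

```zig
// Status quo: u8 in an extern context is defined to mirror C's unsigned
// char, so a libc string function is declared with u8 directly.
pub extern "c" fn strlen(s: [*:0]const u8) usize;
```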
There is one genuine potential problem here, which is that some C ABIs might have multiple unsigned 8-bit integral types with different ABIs. For instance, they might define `unsigned _BitInt(8)` to be passed to functions differently than `unsigned char`. However, this is a more general problem (in fact, it kind of applies to more-or-less every C type), and a satisfactory solution to it will require research into various ABIs and some tricky language design work (@alexrp and I have spoken a bit about this before). Luckily, this issue doesn't really manifest itself today for `u8`/`i8`, and there's a good chance it never does. It's also not something which is going to cause a surprise bug, because we would notice the problem while implementing support for such an ABI!
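To sketch that hypothetical: both C prototypes in the comment below would lower to the same Zig `extern` signature today, so the distinction would be inexpressible (`f`/`g` are made-up names for illustration):

```zig
// Hypothetical ABI: suppose these two C parameters were passed differently:
//
//     void f(unsigned char x);        // passed one way
//     void g(unsigned _BitInt(8) x);  // could, in principle, be passed another
//
// Today both C prototypes map to the same Zig declaration:
extern fn f(x: u8) void;
extern fn g(x: u8) void;
```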
Regardless, this PR would not be a good solution to the problem stated above. If we were to accept this PR to solve that theoretical problem, it would mean that whenever you export a function consuming a C string, you would need to take in `[*:0]const c_uchar` instead of `[*:0]const u8`, lest you potentially get bitten by a random ABI mismatch some day. That would be a clear footgun and a downgrade to Zig's guarantees.
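Sketched out, the footgun would look like this (names made up; this is the proposal's logical conclusion, not anything the PR currently implements):

```zig
// Under the proposal, only the first form would be ABI-safe for exported
// string consumers; the natural second form would become a latent hazard
// on some hypothetical future target.
export fn handle_string_safe(s: [*:0]const c_uchar) void { _ = s; }
export fn handle_string_natural(s: [*:0]const u8) void { _ = s; }
```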
I don't think there's any good reason to make this language change.
> I don't think there's any good reason to make this language change.
The reasoning makes sense. I'll move forward with using `u8`/`i8`.