Implement the `c_uchar` type

Open EnnuiL opened this issue 2 months ago • 6 comments

Hi! I'm really interested in Zig, and I'm also interested in tinkering with C23-related things in it.

This is my attempt to start with something: adding c_uchar to Zig. While I have seen the position that unsigned char/char8_t should be treated as u8 and signed char as i8, I felt this type needed to be implemented because what I ultimately want is to implement stdbit.h in libzigc, and while you could technically use a u8 instead of unsigned char, I feel like that would have undesirable consequences, which is something I wanted to avoid.

So, this is c_uchar; it's guaranteed to be unsigned char. That should be it. I ran the test-libc suite as well as test -Dskip-non-native, and no issues were spotted other than standalone_test_cases.coff_dwarf failing, although that can be reproduced on the branch's base commit. Let me know whether this PR is sound or not.

(Note: c_schar was intentionally left unimplemented since it isn't required by any C standard library functions)

EnnuiL, Nov 08 '25 23:11

Relevant: https://github.com/ziglang/zig/issues/875

squeek502, Nov 09 '25 00:11

I feel like that would have undesirable consequences, which is something I wanted to avoid.

Could you elaborate on this? As the rationale for this language change (compared to the status quo intent of using u8/i8), it's very vague.

Of course, platforms with CHAR_BIT != 8 don't work with status quo; but that's a known limitation, and it's AFAIK undecided thus far whether Zig will support such targets.

mlugg, Nov 09 '25 10:11

Of course, platforms with CHAR_BIT != 8 don't work with status quo; but that's a known limitation, and it's AFAIK undecided thus far whether Zig will support such targets.

The only contemporary(-ish) architectures where you might encounter CHAR_BIT > 8 are some TI DSPs; at least C28x, C54x, and C55x to my knowledge. These all have 16-bit chars. C54x and C55x are from the 2000s and seem kinda EOL, but C28x is still alive and well. I do think we should avoid designing ourselves into a corner where we can't support these. However, that's effectively an argument for also adding c_schar, which this PR excludes.
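
As a rough illustration of the assumption being discussed, C code that depends on 8-bit bytes can state that requirement at compile time. This is only a sketch; `sketch_uchar_bits` is a hypothetical helper name and is not part of Zig or libzigc.

```c
#include <limits.h>

/* Fail the build on targets where char is not 8 bits wide,
   such as the 16-bit-char DSPs mentioned above. */
_Static_assert(CHAR_BIT == 8, "this code assumes 8-bit char");
_Static_assert(UCHAR_MAX == 255, "unsigned char must cover exactly 0..255");

/* Hypothetical helper: storage width of unsigned char in bits. */
unsigned int sketch_uchar_bits(void) {
    return (unsigned int)(sizeof(unsigned char) * CHAR_BIT);
}
```

On a target with 16-bit chars, the first assertion would stop the build rather than let the code silently misbehave.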

alexrp, Nov 09 '25 11:11

I feel like that would have undesirable consequences, which is something I wanted to avoid.

Could you elaborate on this? As the rationale for this language change (compared to the status quo intent of using u8/i8), it's very vague.

Considering that the ultimate goal is to have libzigc reimplement all C standard library functions and have those used instead of the static libcs' versions, replicating the ABI of unsigned char/signed char by using u8/i8 appears to be something that would be brittle.

I have to admit, though, this is just gut instinct; ultimately I want other people's opinions to determine whether this is a path that should be taken or if we should go for u8/i8 instead.

I do think we should avoid designing ourselves into a corner where we can't support these. However, that's effectively an argument for also adding c_schar, which this PR excludes.

If this PR is sound, I can also implement c_schar here if needed.

EnnuiL, Nov 09 '25 12:11

appears to be something that would be brittle

This concern is still vague enough as to be essentially meaningless IMO. Currently, Zig heavily relies on u8 and i8 matching C unsigned char and signed char in ABI, and (again, ignoring for now targets with CHAR_BIT != 8) that's generally perfectly fine. This is more of a definition than an assumption; we can decide how u8/i8 operate in extern contexts, and we have decided that the behavior should mirror that of [un]signed char.
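
To make the ABI point concrete, here is a sketch of the C side of a stdbit.h-style function that takes an unsigned char. Under the status quo described above, a Zig implementation of the same function would declare the parameter as u8. The name `sketch_leading_zeros_uc` is hypothetical (chosen to avoid clashing with C23's `stdc_leading_zeros_uc`), and the loop is a simple illustration, not how libzigc or any libc implements it.

```c
#include <limits.h>

/* Hypothetical stand-in for a stdbit.h-style function:
   counts consecutive zero bits starting from the most significant bit. */
unsigned int sketch_leading_zeros_uc(unsigned char value) {
    unsigned int count = 0;
    /* Start at the top bit of unsigned char and walk down to bit 0. */
    for (unsigned char mask = (unsigned char)(1u << (CHAR_BIT - 1));
         mask != 0; mask >>= 1) {
        if (value & mask) {
            break; /* found the highest set bit */
        }
        count++;
    }
    return count; /* equals CHAR_BIT when value is 0 */
}
```

A caller passes and receives plain integer values, so the only thing that has to agree between the C declaration and the implementing language is how an unsigned 8-bit argument is passed.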

There is one genuine potential problem here, which is that some C ABIs might have multiple unsigned 8-bit integral types with different ABIs. For instance, they might define unsigned _BitInt(8) to be passed to functions differently than unsigned char. However, this is a more general problem (in fact it kind of applies to more-or-less every C type), and a satisfactory solution to it will require research into various ABIs and some tricky language design work (@alexrp and I have spoken a bit about this before). Luckily, this issue doesn't really manifest itself today for u8/i8, and there's a good chance it never does. It's also not something which is going to cause a surprise bug, because we would notice the problem while implementing support for such an ABI!

Regardless, this PR would not be a good solution to the problem stated above. If we were to accept this PR to solve that theoretical problem, it would mean that whenever you export a function consuming a C string, you would need to take in [*:0]const c_uchar instead of [*:0]const u8, lest you potentially get bitten by a random ABI mismatch some day. That would be a clear footgun and downgrade to Zig's guarantees.
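
For reference, this is roughly what a string-consuming function looks like from the C side. It is a minimal sketch with a hypothetical name (`sketch_count_bytes`); the point made above is that the Zig export matching such a declaration takes [*:0]const u8 today, and would not gain anything from a separate c_uchar spelling.

```c
#include <stddef.h>

/* Hypothetical function: counts the bytes before the terminating NUL. */
size_t sketch_count_bytes(const char *s) {
    /* Read the string as unsigned bytes, which is how an
       8-bit unsigned view of the same memory would see it. */
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;
    while (p[n] != 0) {
        n++;
    }
    return n;
}
```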

I don't think there's any good reason to make this language change.

mlugg, Nov 09 '25 13:11

I don't think there's any good reason to make this language change.

The reasoning makes sense. I'll move forward with using u8/i8.

EnnuiL, Nov 09 '25 13:11