seL4
`SEL4_FORCE_LONG_ENUM` not C standard compatible
So I might be a bit late to the game, but this trick we're using to force an enum to be `long` does not seem to be legal C according to the standard. C99 and C11 say:

> 6.7.2.2 Enumeration specifiers 1) The expression that defines the value of an enumeration constant shall be an integer constant expression that has a value representable as an `int`.
Clause 3 later says:

> Each enumerated type shall be compatible with char, a signed integer type, or an unsigned integer type. The choice of type is implementation-defined but shall be capable of representing the values of all the members of the enumeration.

"an unsigned integer type" could be `long`, but the `int` in clause 1 is pretty clear. GCC and clang both seem to be doing this as we want, i.e. they satisfy the compile-time asserts for the size of the enum type to be the same as the size of `long`.
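That compile-time check can be sketched as follows (a hypothetical reconstruction, not the actual seL4 header; GCC and clang accept the oversized enumerator as an extension, while a strictly conforming compiler may reject it):

```c
#include <limits.h>

/* Reconstruction of the padding trick: the oversized enumerator is
 * meant to force the enum's representation up to the width of long.
 * The enumerator value is out of int range, which C99/C11 forbid,
 * but GCC and clang accept it as an extension. */
typedef enum {
    EXAMPLE_A = 0,
    EXAMPLE_B = 1,
    _enum_pad_example = ((1ULL << ((sizeof(long) * 8) - 1)) - 1)
} example_enum;

/* The compile-time assert the thread refers to: if a future compiler
 * picks a narrower type, the build fails here rather than silently
 * miscompiling pointer arithmetic later. */
_Static_assert(sizeof(example_enum) == sizeof(long),
               "enum was not widened to long");
```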
This came up because the verification C parser does not handle the `SEL4_FORCE_LONG_ENUM` in #724 as expected (it forces `int`, as the standard requires) and therefore later fails the proof in a pointer arithmetic expression down the road.
I can look into making the C parser more relaxed, but I wanted to put this out here for discussion first, because it doesn't seem to be officially supported by the standard, so could break later.
As long as binary verification runs, we will catch these problems, and even the compile-time assert should already alert us to problems if they occur in future compiler versions, so I'm leaning towards making the C parser more relaxed. I'm not clear yet on how much work that is, because it will involve a new value analysis.
For context, this is what such an enum declaration looks like after preprocessing:

```c
typedef enum {
    SEL4_BOOTINFO_HEADER_PADDING = 0,
    SEL4_BOOTINFO_HEADER_X86_VBE = 1,
    SEL4_BOOTINFO_HEADER_X86_MBMMAP = 2,
    SEL4_BOOTINFO_HEADER_X86_ACPI_RSDP = 3,
    SEL4_BOOTINFO_HEADER_X86_FRAMEBUFFER = 4,
    SEL4_BOOTINFO_HEADER_X86_TSC_FREQ = 5,
    SEL4_BOOTINFO_HEADER_FDT = 6,
    SEL4_BOOTINFO_HEADER_NUM,
    _enum_pad_seL4_BootInfoID = ((1ULL << ((sizeof(long)*8) - 1)) - 1)
} seL4_BootInfoID;
```
Ah yes, I love this one. This C restriction hampers our ability to give names to #define constants for verification, because they always end up as signed `int`s and so have trouble carrying address values, masks, etc.
Did you read the comment above `SEL4_FORCE_LONG_ENUM`? Its goal feels sound, i.e. to stop clever compilers from picking something smaller than int. However yes, the name is misleading, as it can't give you a `long`, only an `int`.
What happens if you try to print `_enum_pad_seL4_BootInfoID`? Do you get a truncation to `INT_MAX` or a large negative number? I forget what the C standard says on that.
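A quick experiment answers this on the compilers we use (a sketch; the enum here is a trimmed-down stand-in for the seL4 one, and the behaviour shown is the GCC/clang extension, not anything the standard guarantees):

```c
#include <stdio.h>
#include <limits.h>

typedef enum {
    PAD_EXAMPLE_FIRST = 0,
    _enum_pad_example = ((1ULL << ((sizeof(long) * 8) - 1)) - 1)
} pad_example;

/* On GCC/clang the constant keeps its full value (2^63 - 1 on LP64
 * targets), so it prints as a large positive number rather than
 * truncating to INT_MAX or turning negative. A strictly conforming
 * compiler could reject the declaration outright, so none of this
 * is guaranteed by the standard. */
static void print_pad(void) {
    printf("%lld\n", (long long)_enum_pad_example);
}
```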
> However yes, the name is misleading, as it can't give you a `long`, only an `int`.
On 64-bit architectures, gcc still gives a 64-bit enum when this macro is used, which can be checked with a `sizeof` in a static assert.
Ahh, aren't `int`s 64-bit in GCC on 64-bit arches? Microsoft's compiler is one of the "int is 32-bit on 64-bit arch" ones, hence my being overly pedantic, which is along the lines of what I'm guessing the C parser is doing as well.
Ints are 32-bit in the kernel on gcc for all our supported architectures.
An `int` on 64-bit is still 32 bits, a `long` is 64.
The standard says "implementation defined" for what the `enum` type actually ends up with; it just says that the constant expressions have to fit into an `int`. The C parser fixes the `enum` type to be signed `int`, because that is what GCC did when we wrote it.
OK, then one of my assumptions was clearly wrong, apologies. I misremembered this statement: "An int and a long are 32-bit values on 64-bit Windows operating systems", and then misapplied it to int vs long. Now I'm seeing what's going on.
In that case, how does `((1ULL << ((sizeof(long)*8) - 1)) - 1)` make an expression that fits into an int? Furthermore, even if you permit signed longs, that doesn't look like it'll fit in a signed long without overflow.
> Did you read the comment above `SEL4_FORCE_LONG_ENUM`? Its goal feels sound, i.e. to stop clever compilers from picking something smaller than int. However yes, the name is misleading, as it can't give you a `long`, only an `int`.
Yes, the problem is that the largest type it can force the enum to be is `int`.
> In that case, how does `((1ULL << ((sizeof(long)*8) - 1)) - 1)` make an expression that fits into an int?
It does not, that's my point. The C standard does not permit this expression in an enum.
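To make the arithmetic concrete (a sketch assuming an LP64 target, where `sizeof(long)` is 8): the expression evaluates to 2^63 - 1 = 0x7FFFFFFFFFFFFFFF, far above `INT_MAX`, so no conforming choice of `int` can represent it:

```c
#include <limits.h>

/* On an LP64 target the padding expression is (1ULL << 63) - 1, i.e.
 * 2^63 - 1. That value happens to equal LONG_MAX, but it is far
 * outside the int range the standard demands for enumerators. */
#define PAD_VALUE ((1ULL << ((sizeof(long) * 8) - 1)) - 1)

_Static_assert(PAD_VALUE > INT_MAX,
               "the enumerator value cannot fit in an int");
_Static_assert(PAD_VALUE == (unsigned long long)LONG_MAX,
               "equals LONG_MAX on LP64");
```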
> Furthermore, even if you permit signed longs, that doesn't look like it'll fit in a signed long without overflow.
Yes, it's trying to force `unsigned long`, which it shouldn't. It so happens that GCC and clang are both fine with it and execute it "correctly" (as in "as we want it"). You can check that with an assert on `sizeof(seL4_BootInfoID)`, for instance, and by looking at the binary.
The question is how much do we want to rely on the compiler doing the right thing even though it doesn't really have to. We do rely on such things in other places, so it's not something really new, but we should do it consciously.
The C parser tends to stick to the C standard, historically. The C standard has this enum issue until C23 comes into effect, and the parser is still C99 I think. C23 comes with a lot of other magic that would be hard to support, so at some point we should discuss the future maintenance of C parser magic in any case.
For today, what are you going to do with this? We could potentially add an annotation to inform the C parser to override the size of a specific enum to be bigger than int, if that is permitted by the standard as you said?
> For today, what are you going to do with this? We could potentially add an annotation to inform the C parser to override the size of a specific enum to be bigger than int, if that is permitted by the standard as you said?
I'm looking at making the C parser do the same as the compiler, i.e. checking what the largest value is that needs to be represented and choosing `unsigned long` when that does not fit in `int`.
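A minimal sketch of that selection rule (hypothetical: the real parser is written in ML, `enum_repr_type` is a made-up name, and negative enumerators are ignored for brevity):

```c
#include <limits.h>
#include <string.h>

/* Sketch of the proposed rule: look at the largest value the enum
 * must represent and widen the representation type only when int
 * cannot hold it, mirroring what GCC and clang do in practice. */
static const char *enum_repr_type(unsigned long long max_value) {
    if (max_value <= INT_MAX)
        return "signed int";   /* the historical C parser default   */
    return "unsigned long";    /* the GCC/clang extension behaviour */
}
```

For the example enum above, `SEL4_BOOTINFO_HEADER_NUM` alone would give `signed int`, while the padding constant pushes the choice to `unsigned long`.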
This is not permitted by the standard, it just happens to work for GCC and clang. But our memory model is also not permitted by the standard, so we don't have to care too much about that bit, we just need awareness and consensus that we want to do this.
I don't know yet how much work it is to actually analyse all enum values for size. It might need too much code refactoring in the parser, because so far it didn't need to know the values to determine the type. Adding the type size as a tag internally to the enum type is relatively easy, but setting it in the first place I'm not sure about yet.
Can you make that calculation a flag once you figure out whether it's feasible? We are violating the standard after all.
I don't think a flag would be good, that would pollute the code a lot (no globals in ML). I guess we could have one at the very outermost level after everything is done.
But since when do we care about the C standard?
In terms of C standard we mostly care about compiler and verification tool chain being in sync (also validated via binary verification, but of course better if that never fails), so if both extend the standard in the same way this is fine.
But: thinking more about it, a flag is not actually that hard, so yes, adding the flag for those who want to be more strictly standard conforming is fine.
I wonder what the other people who use AutoCorres think about this, they're not all doing kernel design. Perhaps these flags might be of relevance to other users of the C toolchain, who may want to not taunt the C beast more than necessary? Anyway, I appreciate your open mind.