seL4 icon indicating copy to clipboard operation
seL4 copied to clipboard

`SEL4_FORCE_LONG_ENUM` not C standard compatible

Open lsf37 opened this issue 3 years ago • 15 comments

So I might be a bit late to the game, but this trick we're using to force an enum to be long does not seem to be legal C according to the standard. C99 and C11 say:

6.7.2.2 Enumeration specifiers 1) The expression that defines the value of an enumeration constant shall be an integer constant expression that has a value representable as an int.

Clause 3 later says:

Each enumerated type shall be compatible with char, a signed integer type, or an unsigned integer type. The choice of type is implementation-defined but shall be capable of representing the values of all the members of the enumeration.

"an unsigned integer type" could be long, but the int in clause 1 is pretty clear. GCC and clang both seem to be doing this as we want, i.e. they satisfy the compile time asserts for the size of the enum type to be same as the size of long.

This came up because the verification C parser does not handle the SEL4_FORCE_LONG_ENUM in #724 as expected (it forces int like the standard) and therefore later fails the proof in a pointer arithmetic expression down the road.

I can look into making the C parser more relaxed, but I wanted to put this out here for discussion first, because it doesn't seem to be officially supported by the standard, so could break later.

As long as binary verification runs, we will catch these problems, and even the compile time assert should already alert us to problems if they occur in future compiler versions, so I'm tending to try and make the C parser more relaxed. I'm not clear yet on how much work that is, because it will involve a new value analysis.

lsf37 avatar Jan 06 '22 23:01 lsf37

For context, this is what such an enum declaration looks like after preprocessing:

typedef enum {
    SEL4_BOOTINFO_HEADER_PADDING = 0,
    SEL4_BOOTINFO_HEADER_X86_VBE = 1,
    SEL4_BOOTINFO_HEADER_X86_MBMMAP = 2,
    SEL4_BOOTINFO_HEADER_X86_ACPI_RSDP = 3,
    SEL4_BOOTINFO_HEADER_X86_FRAMEBUFFER = 4,
    SEL4_BOOTINFO_HEADER_X86_TSC_FREQ = 5,
    SEL4_BOOTINFO_HEADER_FDT = 6,
    SEL4_BOOTINFO_HEADER_NUM,
    _enum_pad_seL4_BootInfoID = ((1ULL << ((sizeof(long)*8) - 1)) - 1)
} seL4_BootInfoID;

lsf37 avatar Jan 06 '22 23:01 lsf37

Ah yes, I love this one, this C restriction screws our capability to give names to #define constants for verification, because they always end up as signed ints and so have issues being able to carry address values / masks etc.

Did you read the comment above SEL4_FORCE_LONG_ENUM? Its goal feels sound, i.e. to stop clever compilers from picking something smaller than int. However yes, the name is misleading, as it can't give you a long, only and int.

What happens if you try to print _enum_pad_seL4_BootInfoID? Do you get a truncation to MAX_INT or a large negative number? I forget what the C standard says on that.

Xaphiosis avatar Jan 06 '22 23:01 Xaphiosis

However yes, the name is misleading, as it can't give you a long, only and int.

On 64bit architectures, gcc still gives a 64bit enum when this macro is used which can be checked with a sizeof in a static assert.

kent-mcleod avatar Jan 06 '22 23:01 kent-mcleod

On 64bit architectures, gcc still gives a 64bit enum when this macro is used which can be checked with a sizeof in a static assert.

Ahh, aren't ints 64-bit in GCC on 64-bit arches? Microsoft's compiler is one of the "int is 32-bit on 64-bit arch", hence my being overly pedantic, which is along the lines of what I'm guessing the C parser is doing as well.

Xaphiosis avatar Jan 06 '22 23:01 Xaphiosis

Ints are 32bit in the kernel on gcc for all our supported architectures.

kent-mcleod avatar Jan 06 '22 23:01 kent-mcleod

An int on 64 bit is still 32 bits, a long is 64.

lsf37 avatar Jan 06 '22 23:01 lsf37

The standard says "implementation defined" for what the enum type actually ends up with, it just says that the constant expressions have to fit into an int. The C parser fixes the enum type to be signed int, because that is what GCC did when we wrote it.

lsf37 avatar Jan 07 '22 00:01 lsf37

OK, then one of my assumptions was clearly wrong, apologies. I misremembered this statement: "An int and a long are 32-bit values on 64-bit Windows operating systems", and then misapplied it to int vs long. Now I'm seeing what's going on.

In that case, how does ((1ULL << ((sizeof(long)*8) - 1)) - 1) make an expression that fits into an int? Furthermore even if you permit signed longs, that doesn't look like it'll fit in a signed long without overflow.

Xaphiosis avatar Jan 07 '22 00:01 Xaphiosis

Did you read the comment above SEL4_FORCE_LONG_ENUM? Its goal feels sound, i.e. to stop clever compilers from picking something smaller than int. However yes, the name is misleading, as it can't give you a long, only and int.

Yes, the problem is that the largest type it can force the enum to be is int

In that case, how does ((1ULL << ((sizeof(long)*8) - 1)) - 1) make an expression that fits into an int?

It does not, that's my point. The C standard does not permit this expression in an enum.

Furthermore even if you permit signed longs, that doesn't look like it'll fit in a signed long without overflow.

Yes, it's trying to force unsigned long, which it shouldn't. It so happens that GCC and clang are both fine with it and execute it "correctly" (as in "as we want it"). You can check that with an assert on sizeof(seL4_BootInfoID) for instance, and by looking at the binary.

The question is how much do we want to rely on the compiler doing the right thing even though it doesn't really have to. We do rely on such things in other places, so it's not something really new, but we should do it consciously.

lsf37 avatar Jan 07 '22 00:01 lsf37

The C parser tends to stick to the C standard, historically. The C standard has this enum issue until C23 comes into effect, and the parser is still C99 I think. C23 comes with a lot of other magic that would be hard to support, so at some point we should discuss the future maintenance of C parser magic in any case.

For today, what are you going to do with this? We could potentially add an annotation to inform the C parser to override the size of a specific enum to be bigger than int, if that is permitted by the standard as you said?

Xaphiosis avatar Jan 07 '22 00:01 Xaphiosis

For today, what are you going to do with this? We could potentially add an annotation to inform the C parser to override the size of a specific enum to be bigger than int, if that is permitted by the standard as you said?

I'm looking at making the C parser do the same as the compiler, i.e. checking what the largest value is that needs to be represented and choosing unsigned long when that does not fit in int.

This is not permitted by the standard, it just happens to work for GCC and clang. But our memory model is also not permitted by the standard, so we don't have to care too much about that bit, we just need awareness and consensus that we want to do this.

I don't know yet how much work it is to actually analyse all enum values for size. It might need too much code refactoring in the parser, because it so far didn't need to know the values to determine the type. Adding the type size as a tag internally to the enum type is relatively easy, but setting it in the first place I'm not sure yet.

lsf37 avatar Jan 07 '22 00:01 lsf37

Can you make that calculation a flag once you figure out whether it's feasible? We are violating the standard after all.

Xaphiosis avatar Jan 07 '22 00:01 Xaphiosis

I don't think a flag would be good, that would pollute the code a lot (no globals in ML). I guess we could have one at the very outermost level after everything is done.

But since when do we care about the C standard?

lsf37 avatar Jan 07 '22 00:01 lsf37

In terms of C standard we mostly care about compiler and verification tool chain being in sync (also validated via binary verification, but of course better if that never fails), so if both extend the standard in the same way this is fine.

But: thinking more about it, a flag is not actually that hard, so yes, adding the flag for those who want to be more strictly standard conforming is fine.

lsf37 avatar Jan 07 '22 02:01 lsf37

I wonder what the other people who use AutoCorres think about this, they're not all doing kernel design. Perhaps these flags might be of relevance to other users of the C toolchain, who may want to not taunt the C beast more than necessary? Anyway, I appreciate your open mind.

Xaphiosis avatar Jan 07 '22 02:01 Xaphiosis