CppCoreGuidelines icon indicating copy to clipboard operation
CppCoreGuidelines copied to clipboard

Restrict construction of `enum class` values manifestly outside intended usage

Open GabrielDosReis opened this issue 4 years ago • 8 comments

Consider

enum class TestEnum : uint32_t { One, Two, Three };

int main() {
   TestEnum t { 17 };       // L1
}

Prior to C++17, line L1 was invalid and diagnosed at compile-time. The proposal P0138R0, P0138R1 for C++17 also made that ill-formed. However, the adopted revision P0138R2 -- at the insistence of CWG -- allowed it. That allowance has proven to easy source of mistakes in practice, in more realistic situations such as

enum class AccessType : uint32_t { Read, Write, Execute, };

where no other values of AccessType is expected or tolerated.

The suggestion here is to have a rule that bans the construction of enum class values that are manifestly outside the set expressly indicated in the definition of the enum class. For example,

enum class Foo : int { 
    Apple = 0,
    Banana = 1,
    Cookie = 2,
};

// Allowed
Foo foo { 0 };
Foo foo1 { 1 };
Foo foo2 { 2 };

// Not allowed
Foo foo3 { 3 };

// Allowed
Foo foo { static_cast<Foo>(3) };

GabrielDosReis avatar Nov 11 '21 19:11 GabrielDosReis

Editors meeting: seems reasonable, pull request welcome

cubbimew avatar Nov 11 '21 21:11 cubbimew

// Allowed Foo foo { 0 }; Foo foo1 { 1 }; Foo foo2 { 2 };

Why would you do that in the first place? The only reason I'd construct a Foo from an int instead of using the predefined constants, is because

a) I deliberately want to create a value for which no predefined enumerate value exists (e.g. std::byte), or b) because the value is a dynamic one (e.g. returned from a c-api) in which case the analyzer usually anyway can't prove statically that the value is "in range".

So imho you should either forbid the use of that construction syntax in general and always require a static_cast or continue to allow my_byte_t{0xff} in general.

MikeGitb avatar Nov 12 '21 06:11 MikeGitb

// Allowed Foo foo { 0 }; Foo foo1 { 1 }; Foo foo2 { 2 };

Why would you do that in the first place? The only reason I'd construct a Foo from an int instead of using the predefined constants, is because

I can understand such needs - namely when serializing to/from binary data sets (e.g network, disk, etc); however, in such cases it should be required that the enum be anchored so that the values can be validated by tooling instead of relying on compiler consistency.

Analyzers can always do a base+count to validate the enums; though that fails when enums are specified with values anchored at different points:

class enum x: int {
   a = 0,
   b = 5,
   c = 20, 
   d,
   e = 50,
};

which may be useful for API definitions. And yes, I've seen plenty of code that does similar kinds of things.

BenjamenMeyer avatar Nov 12 '21 14:11 BenjamenMeyer

I can understand such needs - namely when serializing to/from binary data sets (e.g network, disk, etc); however, in such cases it should be required that the enum be anchored so that the values can be validated by tooling instead of relying on compiler consistency.

I'm sorry, but I don't understand how that relates to my question.

If I already have a name for Foo{1} ( namely Foo::Banana in GDR's example), why would I initialize a variable with a numerical value instead of using that name. I.e. Why

Foo foo{1};

instead of

Foo foo = Foo::Banana;

MikeGitb avatar Nov 12 '21 15:11 MikeGitb

I can understand such needs - namely when serializing to/from binary data sets (e.g network, disk, etc); however, in such cases it should be required that the enum be anchored so that the values can be validated by tooling instead of relying on compiler consistency.

I'm sorry, but I don't understand how that relates to my question.

If I already have a name for Foo{1} ( namely Foo::Banana in GDR's example), why would I initialize a variable with a numerical value instead of using that name. I.e. Why

Foo foo{1};

instead of

Foo foo = Foo::Banana;

Because you're not using a pure numeric but a read in value:

int serializedValue = data[0];
...
Foo foo { serializedValue };

thus mapping the serialized value to the enum.

I do agree that going straight to the numeric:

Foo foo {17};

is lazy, sloppy coding b/c you're not translating - you're using magic numbers to do things that shouldn't be done. You're being lazy because instead of using the real name you're using it's value which might change and be a subtle, hard to find bug. However, I can still understand the usage of a numeric value like that for the serialization purposes - still being lazy b/c one didn't setup a conversion function with a proper switch or map; it's still better to do a proper conversion function so one can properly detect errors of bad values being received.

BenjamenMeyer avatar Nov 12 '21 16:11 BenjamenMeyer

Because you're not using a pure numeric but a read in value:

int serializedValue = data[0]; ... Foo foo { serializedValue };

thus mapping the serialized value to the enum.

Yes, that's what I meant with

b) because the value is a dynamic one (e.g. returned from a c-api) in which case the analyzer usually anyway can't prove statically that the value is "in range".

I didn't say anything against the Foo{<EXPR>} syntax in general. Only against the specific Foo{ 1 } from the proposed text, when Foo::Banana is already defined as such.

MikeGitb avatar Nov 12 '21 23:11 MikeGitb

@MikeGitb

Why would you do that in the first place? The only reason I'd construct a Foo from an int instead of using the predefined constants, is because ...

Those may be the only reasons you have, but others may actually have to use C_STYLE_SYSTEM_MACRO to construct a fully typed constant, and that macro would expand to that numerical constant. I don't think we should focus too much on the 0 or 1 constant used to illustrate the construction of values in the issue.

@BenjamenMeyer

You're being lazy because instead of using the real name you're using it's value which might change and be a subtle, hard to find bug. However, I can still understand the usage of a numeric value like that for the serialization purposes - still being lazy b/c one didn't setup a conversion function with a proper switch or map; it's still better to do a proper conversion function so one can properly detect errors of bad values being received.

That is the case of C_STYLE_SYSTEM_MACRO (possibly involving conditional operator) that expands to such a numerical constant, and the enumeration is introduced as a typeful projection over an existing system header. Of course, if you know the constant and the corresponding enumerator name, you should use that directly.

GabrielDosReis avatar Nov 14 '21 13:11 GabrielDosReis

+1 to what @MikeGitb has said

the pre-C++17 syntax for enum init from braces {} is ideal for pure-"enumeration" users (i.e. using an enum as a list of predefined enumerators) because it disallows init from non-enumerators (which is almost always a mistake and a bug factory for enumeration usage) and yet it still allows for flexible alternative usage (e.g. parsing a value into an enum, or using it as an integral type) via static_cast which makes it explicit the enum is being used for a purpose other than a mathematical enumeration.

@GabrielDosReis and I have discussed offline. It's an interesting dilemma because 'enum' as defined by C/C++ (in my view, unfortunately) takes on meanings other than just a mathematical enumeration (i.e. it is also used to introduce a new integral type). It's an overloaded syntax that tries to achieve much and in doing so ends up short in a few places.

For example, syntax like MyEnum x(1234) is allowed, and so it's a bit of an odd situation where MyEnum x{1234} does something different.

C++17 (in my view, unfortunately) better aligned the 2 syntaxes so that both are now valid, but that achieves less safety for users of 'enum' that mean it as a pure enumeration (init from enumerator only). This proposal tries to address it but I think it leaves us in an odder situation where we try to retain some safety of init but having the init value allowed only if it matches an enumerator. As @MikeGitb says, why then not just use the enumerator?

i.e. If I am reading code that uses 'enum' as a mathematical enumeration of predefined labels, I would consider it a mistake if the enum is directly init from underlying type. I am not trying to see if it is really a mistake by trying to map the value of the initializer to an enumerator because the author should have just used that enumerator.

I'd rather not make these rules even more complex. Ideally we walk back the C++17 standard change to offer us back the safety it previously had that limited init from enumerator only when using {}. That could be a long-shot, so if we can't do that, than as C++ developers we have to accept the fact that 'enum' is a 4-letter keyword that doesn't really mean an "enumeration" in the common mathematical sense of a set of predefined labels. It means something else that also allows it to take on arbitrary integral values. In my view I see that as a type safety loss for C++ but I'd rather that than complicating the rules in a way that still don't enforce init from enumerator.

apenn-msft avatar Dec 02 '21 06:12 apenn-msft