c-cpp-trivia
This repository is a collection of neat C & C++ trivia and oddities.
Table of contents:
- Both languages
- "Special operators"
- Bugs and Implementation Quirks
- C++
- Bugs and Implementation Quirks
- C
- Bugs and Implementation Quirks
- Talks
Both languages
- `0` is technically tokenized as an octal literal.
- Array access is commutative: `arr[i]` and `i[arr]` are equivalent. This is because array access is defined as a direct translation to `*(arr + i)`.
- `sizeof(0)["abcd"]` is `1`.
- C and C++ grammar allows prototypes in declaration lists: `int a, foo(), * bar(), main();`.
- `https://www.google.com` is a valid line of C/C++ code, but you're limited to one occurrence of each protocol per function.
- Operator precedence and associativity are not the same as order of evaluation. The following are all undefined or unspecified behavior:
void foo(int i, int* arr) {
i = i++; // UB in C, and in C++ before C++17
i = i++ + ++i; // UB
arr[i] = i++; // UB in C, and in C++ before C++17
bar(puts("a"), puts("b")); // clang spits out a b, gcc spits out b a
}
https://en.cppreference.com/w/cpp/language/eval_order
- Unknown attributes are ignored without causing an error (since C++17 and C23). This allows all sorts of attribute nonsense (and all of these can, of course, be applied to variables too):
[[std::vector]] void foo() {} // Yes, even in C
[[code::blocks]] void foo() {}
[[]] void foo() {}
[[,]] void foo() {}
[[]][[]][[]][[]][[]] void foo() {}
[[typedef ::long]] void foo() {}
[[
#include "/proc/cpuinfo"
]] void foo() {}
// C++ only:
[[foo...]] void foo() {}
[[using std:]] void foo() {}
- Attributes may appear almost anywhere in a declaration:
[[foo]] int [[bar]] baz [[biz]] () [[buz]];
[[foo]] constexpr [[bar]] int [[baz]] biz [[buz]] () [[boz]];
// ^ second one is gcc and msvc only, decl-specifier-seq technically prevents an attribute here
- The operand of the `sizeof` operator cannot be a C-style cast. `sizeof (int)*p` is parsed as `(sizeof(int)) * p` rather than `sizeof((int)*p)`.
- Precedence is ignored in the conditional operator between `?` and `:`: `c ? a = 1, y = 2 : foo();` is parsed as `c ? (a = 1, y = 2) : foo();`.
- `llU` is a valid (non-user-defined) integer suffix
- `(void)` cast
void foo(int x) {
(void)x; // useful for suppressing unused parameter warnings
// C++ only: (will be a warning with -Wpedantic)
return (void)"You can also return anything from a void function";
}
- You cannot augment a typedef (or `using` alias) with `unsigned`:
typedef long long ll;
void foo(unsigned ll) {} // unsigned implies unsigned int, ll here is a parameter name
- `typedef` is a storage class specifier and can appear before, after, or in the middle of a type in a declaration
unsigned typedef int u32;
- Preprocessor directives can be empty:
#include <stdio.h>
#
#
int main() {
#
// ...
}
- Switch statement bodies are allowed to be single statements as opposed to statement sequences (or compound statements), like other control flow structures:
switch(x) case 1: case 2: puts("foo");
- Case labels do not need to be in the top-level statement sequence
int x = 2;
int i = 0;
switch(x) {
default:
if(foo()) {
while(i++ < 5) {
case 2:
puts("lol");
}
}
}
- `"a" + 1 == ""` can technically evaluate to `true`. As can `"a" == "a\0\0"`.
- C and C++ support a set of digraph and trigraph tokens to accommodate certain archaic keyboards. Trigraphs were removed from C++ in C++17.
- ISO C forbids conversion between function pointers and object pointers:
void (*func_ptr)() = dlsym(mylib, "func"); // gcc yields a warning with standard C17 in pedantic mode
However, taking the address of the function pointer first, casting it to `void**`, and then dereferencing that pointer again makes it work without warnings:
void (*func_ptr)();
*(void**)&func_ptr = dlsym(mylib, "func");
- It's possible to declare multiple functions at once and use typedefs / using declarations for signatures:
// declares void foo(int); void* bar(float);
void foo(int), * bar(float);
// declares void foo(); void bar();
typedef void fn(); // or using fn = void();
fn foo, bar;
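A few of the parsing claims in this section (commutative indexing, `sizeof(0)["abcd"]`, the URL line, and the `sizeof (int)*p` parse) can be checked with a small sketch. The function names below are hypothetical and exist only for illustration; this is a minimal demonstration, not a canonical test suite.

```cpp
#include <cassert>
#include <cstddef>

// arr[i] is defined as *(arr + i); addition commutes, so i[arr] names the same element.
int commutative_index(const int* arr, int i) {
    assert(&arr[i] == &i[arr]);
    return i[arr];
}

// sizeof applies to the whole postfix expression (0)["abcd"], i.e. "abcd"[0], a char.
std::size_t sizeof_trick() {
    return sizeof(0)["abcd"];
}

// "https:" is a label and "//www.google.com" is a line comment.
int url_is_code() {
    https://www.google.com
    return 1;
}

// Parsed as (sizeof(int)) * p: a multiplication, not the size of a cast expression.
std::size_t sizeof_cast_parse(int p) {
    return sizeof (int)*p;
}
```

Compilers may warn about the unused label in `url_is_code`, which is a good hint about how the line is actually parsed.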
"Special operators"
- "`-->` operator", really just a combination of two operators
int x = 10;
while (x --> 0) { // x goes to 0
printf("%d ", x);
}
| Syntax | Meaning | Mnemonic |
|---|---|---|
| -~y | y + 1 | Tadpole swimming toward a value makes it bigger |
| ~-y | y - 1 | Tadpole swimming away from a value makes it smaller |
- "Unset operator": `x &~ mask` unsets `mask` bits in `x`
- Boolean identity: `!-!b`
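The "operators" in this section are ordinary operator combinations, which a short sketch can make concrete. The function names are hypothetical, and the tadpole identities assume two's complement integers (guaranteed since C++20, near-universal before that). Like `y + 1` and `y - 1`, they overflow at the extremes of `int`.

```cpp
#include <cassert>

// x --> 0 is (x--) > 0: compare first, then decrement.
int countdown_sum(int x) {
    int sum = 0;
    while (x --> 0) {
        sum += x; // x has already been decremented here
    }
    return sum;
}

// ~y equals -y - 1 in two's complement, so -~y is y + 1 and ~-y is y - 1.
int tadpole_inc(int y) { return -~y; }
int tadpole_dec(int y) { return ~-y; }

// x & ~mask clears every bit of x that is set in mask.
unsigned unset_bits(unsigned x, unsigned mask) { return x &~ mask; }

// !b is 0 or 1, negation gives 0 or -1, and the outer ! maps that back to b's truth value.
bool boolean_identity(int b) { return !-!b; }
```

For example, `countdown_sum(10)` adds 9, 8, ..., 0, because the body only ever sees the decremented value.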
Bugs and Implementation Quirks
- `0XE+2` should evaluate to `16`, however, both gcc and clang give an error: `invalid suffix "+2" on integer constant`. Both bugs are known: gcc, clang. MSVC handles it correctly. This may be due to the definition of `pp-number`s and is mentioned in the standard https://eel.is/c++draft/lex.pptoken#example-2.
- Clang / LLVM internally can start doing non-multiple of 8 arithmetic in its internal representation (even without the use of `_ExtInt` or `_BitInt`). For example, this code results in 33-bit arithmetic as a result of the optimizer identifying the loop induction.
C++
- The size of an empty struct is `1`. This is because the C++ memory model guarantees disjoint storage (and thus disjoint addresses) for all distinct objects. https://eel.is/c++draft/basic.memobj#intro.object-9.sentence-2
- All types must be deduced the same in an `auto` declarator list. I.e. `auto x = 1, y = 1.5;` is not allowed.
- What would be idiomatic uses of `malloc` in C are UB in C++ prior to C++20 (which introduced implicit object creation), more details here
struct S { int x; };
S* s = (S*)malloc(sizeof(S)); // unlike C, C++ needs the cast from void*
s->x = 1; // an object S hasn't been created and its lifetime hasn't started, placement new is required to make this well-defined
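The placement-new fix mentioned in the comment can be sketched as follows. This is a minimal illustration; `make_s`, `destroy_s`, and `demo` are hypothetical helper names, not part of any library.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

struct S { int x; };

// Allocate raw storage with malloc, then start the lifetime of an S in it.
S* make_s(int value) {
    void* storage = std::malloc(sizeof(S));
    if (storage == nullptr) return nullptr;
    return new (storage) S{value}; // placement new creates the object
}

// End the object's lifetime explicitly, then release the storage.
void destroy_s(S* s) {
    if (s == nullptr) return;
    s->~S();
    std::free(s);
}

// Usage: the member access is well-defined because an S object exists.
int demo() {
    S* s = make_s(1);
    assert(s != nullptr);
    int x = s->x;
    destroy_s(s);
    return x;
}
```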
- C++ supports a set of alternative tokens such as `and`, `or`, `bitand`, `compl`, etc. which are equivalent to their primary counterparts. Truly, equivalent:
struct S {
S() = default;
S(const S bitand) = delete;
S(S and) = delete;
compl S() = default;
};
void foo() {
char b[sizeof(S)];
new (&b) S();
((S*)b)->compl S();
}
- Vexing parse:
// Vexing parse: This isn't a variable, it's a function declaration
T foo();
// Most vexing parse: This is still a function declaration (taking a T(*)())
T foo(T());
// "More vexing parse":
T foo(T((()))); // This is also a function declaration taking a T(*)()
T foo(T (((a)))); // this is a function declaration taking a T
// This is a variable definition
T foo((T()));
- C++ structs can have stray semicolons:
struct S { ;;;;; };
- The following code is probably, technically, well-formed in the current working draft of the standard (and may have been before too):
template<typename T> void main(T) {}
int main() {}
This is related to changes in P1787. Sadly, no compiler supports this.
- Function try-blocks are a convenient way to wrap an entire function body with exception handlers and the only way to catch exceptions in member initializer lists:
template<typename T> struct S {
T t;
S(const T& t) try : t(t) {
...
} catch(...) {
...
}
};
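A runnable sketch of the same idea, assuming a hypothetical member type `Thrower` whose constructor can fail. One detail worth seeing in action: a handler on a constructor's function try-block cannot swallow the exception, it is rethrown automatically when the handler finishes.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical member whose construction may throw.
struct Thrower {
    explicit Thrower(bool fail) {
        if (fail) throw std::runtime_error("member failed");
    }
};

struct Holder {
    Thrower t;
    // The try covers the member initializer list as well as the constructor body.
    explicit Holder(bool fail) try : t(fail) {
    } catch (const std::runtime_error&) {
        // Cleanup or logging could go here; the exception is rethrown
        // implicitly when control reaches the end of this handler.
    }
};

// Reports whether construction succeeded or the exception escaped.
std::string try_construct(bool fail) {
    try {
        Holder h(fail);
        (void)h;
        return "constructed";
    } catch (const std::runtime_error& e) {
        return std::string("rethrown: ") + e.what();
    }
}
```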
- `noexcept` is both a specifier and operator
void foo() noexcept(noexcept(noexcept(true))) {}
- `throw()` is the same as `noexcept` since C++17.
- You can write `extern "C++"` as well as `extern "C"`, these are the only two standard linkage languages, but others can be defined by the implementation. Give us `extern "Python"` and `extern "Java"`!
- A declaration can have arbitrarily many linkage language specifiers:
extern "C" extern "C++" extern "C" extern "C++" void foo(int) {}
The innermost specification is used. https://eel.is/c++draft/dcl.link#5.sentence-2
- The language grammar allows `for`-style `init-statement`s in `switch` and `if` statements since C++17:
switch(int x = foo(); t[x]) { ... }
if(auto [a, b, c] = foo(); c) { ... }
// ranged for allows an init-statement too (just no iteration-expression)
for(auto [vec, map] = foo.bar(); const auto& item : vec) { ... }
- `while` loops do not support `init-statement`s because that would make them just another for loop.
- A `condition` may be a declaration. This allows up to two declarations per `switch`, `if`, or `for` statement:
if(int x = foo()) { ... } // intended use
if(auto [a, b, c] = foo(); auto x = bar(a, b, c)) { ... }
for(auto [a, b, c] = foo(); int x = baz(); c++) { ... }
- While an `init-statement` may make array or structured binding declarations, `condition` declarations may not. I.e. these are not valid:
if(auto [a, b, c] = foo()) { }
if(int arr[] = {1, 2, 3, 4}) { }
- The following are valid C++ statements:
if(; true) { ... } // empty init-statement
if(false; true) { ... }
if(auto main() -> int; true) { ... }
if(class foobar; true) { ... }
if(typedef int i32; true) { ... }
if(using A = B; true) { ... } // Since C++23
for(struct { int a = 0, b = 100; } s; s.a < s.b; s.a++, s.b--) { ... }
- We cannot, however, do any of the following:
if(static_assert(true); true) { ... }
if(using namespace std; true) { ... }
if(extern "C" int puts(const char*); true) { puts("hello world"); }
if(friend void operator<<(); true) { ... } // syntactically valid, not semantically valid
- `goto` is disallowed in `constexpr` functions until C++23
- `static` storage local variables are not permitted in constexpr functions until C++23
- Structured bindings can't be used in constexpr declarations
- The following is a valid "hello world" implementation
auto& hello_world = std::cout<<"Hello World"<<std::endl;
int main() {}
- A lambda's parameter list can be omitted: `[]{ return 42; }`.
- `(*****+***+**+*+[]{})();` is valid C++. Global operators `T& operator*(T*)` and `T* operator+(T*)` can be used on lambdas with no captures (which decay to function pointers).
- The following is valid (since C++23)
[] [[deprecated]] [[deprecated]] {}; // self-deprecating lambda
- `[[likely]]` can be applied outside of control flow structures:
[[likely]] ;
[[likely]] {};
[[likely]] 1 + 2;
- `final`, `override`, `import`, and `module` aren't keywords but have special meaning in certain contexts. Thus, this is valid C++:
struct final final {
virtual final override() final { return {}; }
};
void final() {
struct final typedef override;
struct final final = override().override();
}
https://eel.is/c++draft/lex.name#2
- There are special rules for lexing `<:` digraphs so that `std::vector<::std::string>` is lexed correctly and not as `std::vector[:std::string>`:
Otherwise, if the next three characters are <:: and the subsequent character is neither : nor >, the < is treated as a preprocessing token by itself and not as the first character of the alternative token <:
https://eel.is/c++draft/lex.pptoken#3.2
- `std::numeric_limits<T>::max` and related functions are functions because there was originally concern that some values may not be available at compile time, e.g. `std::numeric_limits<float>::min`, which was dependent on rounding mode. These functions are `constexpr` since C++11, but at that point it was too late to change them from functions.
- The original proposed syntax for lambdas looked like `<>(int x) : [y] (x + y)` (what's now `[y](int x) { return x + y; }`). `<&>(x) ( x * y )` or `<&>(x) -> int { return x * y; }` would have been the syntax for `[&](auto x) { return x * y; }`. Also, in the original proposal there was no mutable keyword for lambdas. Instead the call operator was always const and captures were always marked mutable. Initial proposal papers: N1958, N1968, N2329 (N1968 rev 1).
- `std::string('0', '0')` is a string of 48 `'0'`s, `std::string{'0', '0'}` is the string `"00"`
- James Bond was added to the C++ standard in C++17
- The C++ standard contains a small poem:
When writing a specialization, be careful about its location; or to make it compile will be such a trial as to kindle its self-immolation.
https://eel.is/c++draft/temp.spec#temp.expl.spec-8
- CV qualifiers don't apply to objects until their construction is complete, and relatedly there are no cv-qualified constructors
- Array elements, and objects in general, are always destroyed in reverse order of construction. Standard quote for arrays
- A lambda's `operator()` is automatically `constexpr` if it meets the requirements for a constexpr function https://eel.is/c++draft/expr.prim.lambda.closure#5.sentence-6
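Two of the items above, the `std::string` constructor difference and reverse-order destruction of array elements, are easy to observe directly. The sketch below uses hypothetical names (`Tracked`, `paren_length`, and so on) and assumes an ASCII-compatible character set, where `'0'` has the value 48.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Parentheses select the (count, char) constructor: '0' converts to the count 48.
std::size_t paren_length() { return std::string('0', '0').size(); }

// Braces select the initializer_list<char> constructor: two characters.
std::string brace_string() { return std::string{'0', '0'}; }

// Records its id on destruction so the order can be inspected afterwards.
struct Tracked {
    int id;
    Tracked() : id(next_id()++) {}
    ~Tracked() { destroyed().push_back(id); }
    static int& next_id() { static int n = 0; return n; }
    static std::vector<int>& destroyed() { static std::vector<int> v; return v; }
};

// Builds a three-element array and returns the ids in destruction order.
std::vector<int> array_destruction_order() {
    Tracked::next_id() = 0;
    Tracked::destroyed().clear();
    {
        Tracked elements[3]; // constructed with ids 0, 1, 2
        (void)elements;
    }                        // destroyed here, last element first
    return Tracked::destroyed();
}
```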
Bugs and Implementation Quirks
- `decltype(std)` is an `int` in gcc. Bug reports: #1, #2.
- Prior to gcc 10, `decltype(decltype(decltype))` could be used to generate exponential error messages.
- `typedef int i = 0;` segfaults msvc
- This compiles and links in gcc
namespace foobar {
extern "C" int main() {
puts("Hello world!");
}
}
- Compilers can't agree on which of these is correct; both are rejected by gcc:
extern extern "C" extern "C++" int x; // accepted by clang (with warning)
extern "C" extern "C++" extern int x; // accepted by clang (with warning) and msvc (no warning)
GCC is correct. The second is more correct due to linkage-specifications, but specifying a storage class in a linkage-specification is disallowed:
https://eel.is/c++draft/dcl.link#8.sentence-2.
- Duplicate `[[gnu::constructor]]` attributes are ignored, but the attribute is still allowed on `main`, so hello world prints twice here.
[[gnu::constructor]] [[gnu::constructor]] int main() {
puts("Hello, World!");
}
C
- Source code of the very first C compiler.
- An empty struct is UB in C. Standard quote: 6.7.2.1.8 (C11-C23).
- A significant subset of possible identifiers are reserved in C. These include identifiers which begin with `is`, `to`, `str`, or `mem` followed by a lowercase letter in the global scope. It's undefined to declare/define one of these reserved identifiers in the global scope. So, the following program may 1) print 1, 2) wipe your hard drive, 3) summon cthulhu, 4) other. All behaviors are equally correct.
#include <stdio.h>
int iseven(int n) {
return n % 2 == 0;
}
int main() {
printf("%d", iseven(2));
}
- Expressions in parameter declarations are evaluated by gcc/clang. Due to sequencing, this prints the numbers 1-10 (in descending order):
#include <stdio.h>
int first = 0;
int main();
int main(int a, char *b[(first++ > 8) ? 1 : main()]) {
printf("%d\n", first--);
}
- Similarly this is a valid "hello world" program in C
int main(int, char*[puts("Hello World")]) {}
- `auto` is a keyword in C. Not to be confused with C++ `auto`, C `auto` does absolutely nothing.
- `extern const void x;` is a valid declaration in C for the same reason `extern struct S s;` is valid: `void` is an incomplete type
- `extern const void x;` is not valid in C++ because incomplete types in general are not allowed in extern declarations; incomplete class types are explicitly permitted in [dcl.stc]/7
- The following is valid C:
signed _Noreturn const long volatile long static _Atomic inline f(void);
Bugs and Implementation Quirks
- gcc allows completely empty case labels (C only):
switch(x) { case 1: }
- gcc allows labels to be applied to declarations
switch(x) { default: int y; }
switch(x) { default:; int y; } // must be this in clang
- This compiles without error in TCC
static inline int foo(void) {
[[[[[[[[{{(}));
}
int main(void) {
return _Generic(1, int:0, float:((}}]]]);
}
Talks
Some talks about C++ oddities: