seastar
seastar copied to clipboard
Possible bug(s) in the implementation of `sstring`
The following code is UB according to cppreference (https://en.cppreference.com/w/cpp/language/union):
bool is_internal() const noexcept {
return u.internal.size >= 0;
}
there is no guarantee that u.internal
is the most recently written member; cppreference says:
It's undefined behavior to read from the member of the union that wasn't most recently written. Many compilers implement, as a non-standard language extension, the ability to read inactive members of a union.
The question is, does clang or gcc (or both) guarantee that? And if we depend on compiler extension, we should explicitly say it in the code.
Another place with possibly incorrect code is when we create an sstring
with SSO:
basic_sstring(const char_type* x, size_t size) {
if (size_type(size) != size) {
internal::throw_sstring_overflow();
}
if (size + padding() <= sizeof(u.internal.str)) {
std::copy(x, x + size, u.internal.str);
if (NulTerminate) {
u.internal.str[size] = '\0';
}
u.internal.size = size;
note that u.internal
is never explicitly initialized; it's simply assumed that u.internal
is ready to use, and its fields are written to.
cppreference says:
If members of a union are classes with user-defined constructors and destructors, to switch the active member, explicit destructor and placement new are generally needed
there is no placement new done here.
I'm also not sure if the non-SSO case is correct. We're relying on the default construction of the union here, but does it even guarantee that either case is accessible by default?
I think the rules for plain-old-data unions are more relaxed.
"If two union members are standard-layout types, it's well-defined to examine their common subsequence on any compiler." I don't know if it counts as a subsequence since size is at the end. We might move size to the start.
We might move size to the start.
Unfortunately the size types might be different; basic_sstring
is templated by the size type used for non-ssod strings.
If the size types were the same, we could just move size out of the union and use it to disambiguate.
Do we actually save anything by using int8_t
for the size type for the SSO case?
Maybe we could just fix the size type to be size_t
and use it for both cases.
If size is moved out of the union, then the small-string case fits smaller strings (by 7 chars).
Yeah, assuming we don't adjust max_size
. Right now the default max_size
(in definition of sstring
) is 15
. How harmful would increasing that (to e.g. 22
) be?
Very harmful. Right now sstring takes 16 bytes. We don't want to bloat it.
The question is, does clang or gcc (or both) guarantee that?
GCC supports it explicitly, see: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Type-punning Clang supports it de facto. It is perfectly legal in C, so every sane compiler also makes it legal in C++, no matter what the C++ standard says.
note that u.internal is never explicitly initialized; it's simply assumed that u.internal is ready to use, and its fields are written to.
By "ready to use" you mean that its lifetime has started.
Lifetimes of union members are not very well defined in C++17, but C++20 amends that (edit: retroactively, I think) with the notion of "implicit lifetime-types" and "implicit object creation". There is a special rule saying that assignment involving a member a union member on the left side ends the lifetime of the previous active member (if there is one), implicitly creates the designated member and begins its lifetime.
Other major sources of implicit objects are char
arrays, malloc
and memcpy
.
I don't know what happens if you memcpy something to an inactive union member. You could argue that it ends the lifetime of the previous active member (via storage reuse) and implicitly creates the inactive member and makes it active. You could also argue that [basic.life/1] explicitly prevents activation of union members through ways other than initialization, assignment and copy constructors. It's too hard to interpret.
I'm also not sure if the non-SSO case is correct. We're relying on the default construction of the union here, but does it even guarantee that either case is accessible by default?
"Accessibility" isn't a thing. There are only lifetimes, which can be manipulated by assignments through union members (and possibly memcpy
).
cppreference says:
If members of a union are classes with user-defined constructors and destructors, to switch the active member, explicit destructor and placement new are generally needed
there is no placement new done here.
There are also no user-defined constructors and destructors here, so this note isn't helpful.
"If two union members are standard-layout types, it's well-defined to examine their common subsequence on any compiler." I don't know if it counts as a subsequence since size is at the end.
cppreference uses common subsequence
as a shortcut for the standard's common initial sequence
here. So it doesn't apply in this case.
The standard is not sane here, but if you truly want to be compliant, get rid of the union and manipulate the representation through bit_cast
, memcpy
and char*
. Instead of:
union contents {
struct external_type {
char_type* str;
Size size;
int8_t pad;
} external;
struct internal_type {
char_type str[max_size];
int8_t size;
} internal;
} u;
bool a() {
return u.internal.size >= 0;
}
void b() {
u.internal.size = -1;
}
you should do e.g.:
struct internal_type {
char_type str[max_size];
int8_t size;
} internal;
struct external_type {
char_type* str;
Size size;
int8_t pad[sizeof(internal_type) - sizeof(Size) - sizeof(char_type*)];
};
struct contents {
char v[sizeof(internal_type)];
} u;
bool a() {
return std::bit_cast<internal_type>(u).size >= 0;
}
void b() {
auto tmp = std::bit_cast<internal_type>(u);
tmp.size = -1;
u = std::bit_cast<contents>(tmp);
}