ES.107 (and others) never mentions `std::ssize()`

Open jarzec opened this issue 2 months ago • 3 comments

At the time of writing, the Guidelines do not contain a single mention of the std::ssize() function added in C++20. I might be mistaken, but my understanding is that this addition to the standard library solves many of the issues caused by the standard containers using unsigned subscripts, or specifically by the unsigned return type of the .size() method.

I mention rule ES.107 in the title because its example code seems a perfect place to use std::ssize(). The Enforcement section states that enforcing ES.107 is tricky and gives the example of comparing gsl::index with the results of .size() and sizeof. While not much can be done about sizeof, the comparison issues between gsl::index and .size() are entirely (?) removed by using std::ssize(). Maybe sizeof is waiting for a mechanism analogous to std::ssize() 😉?

The reason I am raising this question is that I have been suggesting the use of std::ssize() in my code reviews for a while but I could not support this with anything in the Guidelines. It was relatively easy to justify, though, as it was mainly protecting against the type of issues that can arise with my_container.size() - 1 as opposed to std::ssize(my_container) - 1.

I believe that multiple places within the Guidelines could employ std::ssize() - many (but not all) of those that currently use .size(). Possibly even a dedicated rule for accessing sizes of standard containers with std::ssize(), if C++20 is available, could be added?

jarzec avatar Oct 17 '25 21:10 jarzec

It's also worth pointing out that the "good" example:

for (gsl::index i = 0; i < vec.size(); i += 2)             // ok

is in fact not ok. Since .size() is unsigned, this creates a signed-vs-unsigned comparison in which the signed value is implicitly converted to unsigned, making it no better than

for (vector<int>::size_type i = 0; i < vec.size(); i += 2)

or

for (std::size_t i = 0; i < vec.size(); i += 2)

Personally, though, I have issues with signed indices/subscripts and sizes. For one, it puts an extra burden on range checks, so rather than simply checking if the value is too large with `some_assert(i < vec.size());`, you need to also check if it's too small: `some_assert(i >= 0); some_assert(i < std::ssize(vec));`. At best, you can hope the compiler recognizes that it can turn that into a single unsigned less-than check (if the two checks are in proximity to each other), but then why not just write what you mean instead of hoping the compiler recognizes the simpler equivalent?

It's also my experience that signed mismatches like this lead code to do a blind static_cast or equivalent to get the expected signedness, which creates its own problems if the value doesn't fit. Even implementations of std::ssize that I have seen simply static_cast the result of .size() to the signed type, and implementations of std::views::counted(...) for contiguous iterators cast the count parameter to an unsigned type, ignoring any potential overflow. The GSL has gsl::narrow, which checks that the input maps properly to the output and throws a narrowing exception if it doesn't, but this is much harder to optimize away as redundant: it takes a different well-defined code path prior to the range check itself, so the generated code has to account for both possibilities.

Further, having signed indices and sizes introduces additional questions the developer needs to keep in mind, both on the caller and callee side of things. What does a negative size mean? Rather than being able to work with any unsigned size value as a valid max limit, is that an extra error condition you need to watch for and report, or is it silently treated as a 0 length, or can it be safely assumed not to be negative? And what does a negative index mean? Is it always an error/oob, is it meant to be the bit-wise unsigned index (as if by static_cast/bit_cast), is it an offset from the end (Python strings are fun like this; str[-1] is the last character in the string, str[-2] is the second-to-last character, etc), should the absolute value be used (as if the array mirrors around 0)?

Signed indices have the additional problem that if an overflow condition occurs, you get UB, whereas unsigned indices guarantee wrapping. So if you run into an overflow condition, unsigned guarantees consistent behavior making it easier to repeat and track down, where signed could do a number of different things, including being backed by a larger integer which hides the problem, or wrapping from max to min, which can change with different optimization flags or when changing/upgrading the compiler.
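The guaranteed wrapping of unsigned arithmetic can be checked at compile time, as in the sketch below. The helper names are hypothetical. The signed equivalent is deliberately not shown, since overflowing a signed integer is undefined behavior and cannot be demonstrated reliably.

```cpp
#include <cassert>  // assert, for the checks in the usage note
#include <cstddef>  // std::size_t
#include <limits>   // std::numeric_limits

// Hypothetical helpers: step an unsigned index forward or backward.
constexpr std::size_t next_index(std::size_t i) { return i + 1; }
constexpr std::size_t prev_index(std::size_t i) { return i - 1; }

constexpr std::size_t max_index = std::numeric_limits<std::size_t>::max();

// Unsigned arithmetic is modular, so both results are well defined.
static_assert(next_index(max_index) == 0);
static_assert(prev_index(0) == max_index);
```

Because the results are fixed by the language, a bug caused by such a wrap reproduces the same way across compilers and optimization levels.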

I much prefer keeping sizes and indices as unsigned as much as possible, as it reduces the mental burden of extra states if they end up negative, under/overflow has more consistent behavior (making bugs easier to track down), and it's easier for the compiler to work with and optimize. Given that the majority of the STL (at least what I use most often of it) expects unsigned sizes and indices (be it .size(), .resize(...), .reserve(...), operator[], and built-in language stuff like sizeof, offsetof, etc), the only time I really use signed sizes and indices is basically when I'm forced to by a predefined API, such as with istream::read and ostream::write or when offsetting the begin() iterator, and it's always something that raises caution flags.

kcat avatar Oct 18 '25 18:10 kcat

> Personally, though, I have issues with signed indices/subscripts and sizes. For one, it puts an extra burden on range checks, so rather than simply checking if the value is too large with `some_assert(i < vec.size());`, you need to also check if it's too small: `some_assert(i >= 0); some_assert(i < std::ssize(vec));`. At best, you can hope the compiler recognizes that it can turn that into a single unsigned less-than check (if the two checks are in proximity to each other), but then why not just write what you mean instead of hoping the compiler recognizes the simpler equivalent?

This goes back in history to when functions would report an error using a negative value; e.g., you do a search and it reports -1 as the result for not found. Signed indexes were thus required, since unsigned int(-1) can be a different value depending on the CPU architecture and its byte/word sizes.

BenjamenMeyer avatar Oct 21 '25 14:10 BenjamenMeyer

My understanding of the general take on the signed/unsigned question within the Guidelines is that arithmetic should be done in signed types unless modular arithmetic is explicitly required (ES.102). On one of the points raised above: ES.106 asks not to try to avoid negative values by using unsigned, which (I believe) protects against underflows in -- iterations and in fact effectively encourages the extra burden in range checks. It is in the light of these (and other similar) guidelines that I understand ES.107. It stipulates the use of the (signed) gsl::index for indexing, which almost always involves arithmetic. In its note the guideline even states that "no perfect and fully compatible solution is possible (unless and until the standard-library containers change to use signed subscripts someday in the future)". I see this issue merely as a suggestion to improve consistency throughout the document.

jarzec avatar Oct 21 '25 22:10 jarzec