Unreachable (useless) code in bytesio.c:resize_buffer()
Bug report
Bug description:
https://github.com/NGRsoftlab/cpython/blob/main/Modules/_io/bytesio.c#L158
size_t alloc = PyBytes_GET_SIZE(self->buf);
/* skipped for short */
if (alloc > ((size_t)-1) / sizeof(char))
goto overflow;
This code is useless and goto is unreachable, because of false condition:
- 'alloc' has a type 'size_t' with minimum value '0' and a maximum value of size_t ('18446744073709551615' on x86_64)
- ((size_t)-1) is a maximum value of size_t ('18446744073709551615' on x86_64)
- size_t is built-in type for C
- sizeof(char) is always 1 in C
Found by Linux Verification Center (portal.linuxtesting.ru) with SVACE.
CPython versions tested on:
CPython main branch
Operating systems tested on:
Linux
Linked PRs
- gh-117069
How about on other platforms and processors?
They should have correct C implementation with sizeof(char) always 1. Then you will compare one size_t with another size_t, which is always max size_t. So what's the point to do so?
Or i'm getting something wrong?
My point is that any analysis of this can't just be concerned with Linux x86_64. It has to be concerned with all supported platforms. Is this code a noop on all supported platforms? I haven't had time to think about it or analyze it. The original comment just focuses on Linux x86_64, so that caught my eye.
The report should be correct for the C language in general. The x86-64 references are just illustrations. Removing those references, I would explain it like this.
- sizeof(char) is always 1, regardless of platform
- Therefore (size_t)-1 / sizeof(char) is always equivalent to (size_t)-1
- Since size_t is an unsigned type, (size_t)-1 is always the maximum value of size_t, i.e. SIZE_MAX
- Since alloc is also size_t, the condition alloc > SIZE_MAX is never true
I confirm that this code is indeed useless. It was here from the beginning, so it is not the result of some incremental changes (unless it is a modified copy of the old StringIO implementation). In any case, it can be safely removed.
We are wary of purely cosmetic changes. As the example of #117071 showed, such changes are not always correct. But in this case, it's really dead code, and removing it makes the code cleaner.
Thank you for your contribution @NGRsoftlab.