CppCoreGuidelines icon indicating copy to clipboard operation
CppCoreGuidelines copied to clipboard

Favor bit_cast over reinterpret_cast

Open ghost opened this issue 5 years ago • 22 comments

Now that std::bit_cast is coming in C++20, the valid use cases for reinterpret_cast are slim to none in the majority of applications. I believe there should be a rule heavily discouraging reinterpret_cast in favor of std::bit_cast or another named cast.

Often times switching from reinterpret_cast to std::bit_cast would be the difference between invoking undefined behavior or not, due to type aliasing rules.

ghost avatar Sep 18 '19 13:09 ghost

There are at least rules discouraging casts: https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#es48-avoid-casts, https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#es49-if-you-must-use-a-cast-use-a-named-cast. I believe your suggestion would fit well into the type-safety profile.

beinhaerter avatar Sep 18 '19 14:09 beinhaerter

Editors call: Even though it's a C++20 features and the Guidelines don't yet support/require C++20, enough people are interested in this that we should say something. Assigning to @GabrielDosReis. Thanks!

hsutter avatar Oct 01 '20 18:10 hsutter

Just for my understanding, do I understand correctly that this pull request favors bit_cast over reinterpret_cast in any possible case? Including the cases depicted by three code examples below here?

void f1(std::uintptr_t address) {
  auto ptr1 = reinterpret_cast<void*>(address);
  auto ptr2 = std::bit_cast<void*>(address); // Favorable?
}

void f2(std::istream& input, std::vector<std::byte>& bytes) {
  input.read(reinterpret_cast<char*>(bytes.data()), bytes.size());
  input.read(std::bit_cast<char*>(bytes.data()), bytes.size()); // Favorable?
}

void f3(int i) {
  auto byte_ptr1 = reinterpret_cast<std::byte*>(&i);
  auto byte_ptr2 = std::bit_cast<std::byte*>(&i); // Favorable?
}

I'm very interested to hear if that is indeed going to be the official guideline, with respect to reinterpret_cast versus bit_cast

N-Dekker avatar Apr 20 '22 16:04 N-Dekker

auto ptr2 = std::bit_cast<void*>(address); // Favorable?

This might be il-formed because sizeof(void*) == sizeof(uintptr_t) is not necessarily true, and even if it compiles, it might give a different answer. The reinterpret_cast does an implementation-defined conversion and does not necessarily produce a pointer value that is bitwise identical to the integer input.

input.read(std::bit_cast<char*>(bytes.data()), bytes.size()); // Favorable?

It's not guaranteed that byte* and char* have the same representation, so this might be ill-formed too.

auto byte_ptr2 = std::bit_cast<std::byte*>(&i); // Favorable?

Ditto for int* and byte*.

jwakely avatar Apr 20 '22 16:04 jwakely

auto ptr2 = std::bit_cast<void*>(address); // Favorable?

This might be il-formed because sizeof(void*) == sizeof(uintptr_t) is not necessarily true, and even if it compiles, it might give a different answer. The reinterpret_cast does an implementation-defined conversion and does not necessarily produce a pointer value that is bitwise identical to the integer input.

Thanks @jwakely This particular void* example is actually based on an attempt of mine to replace some C-style casts like the one from https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/beginthread-beginthreadex doing:

hThread = (HANDLE)_beginthreadex( NULL, 0, &SecondThreadFunc, NULL, 0, &threadID );

_beginthreadex returns a uintptr_t, and HANDLE is just a type alias of void*. Would you use reinterpret_cast<HANDLE>, rather than std::bit_cast<HANDLE>, in such a case?

input.read(std::bit_cast<char*>(bytes.data()), bytes.size()); // Favorable?

It's not guaranteed that byte* and char* have the same representation, so this might be ill-formed too.

Would it be any better when doing reinterpret_cast<char*>(bytes.data()) instead?

auto byte_ptr2 = std::bit_cast<std::byte*>(&i); // Favorable?

Ditto for int* and byte*.

So again, would it be better to use reinterpret_cast here?

N-Dekker avatar Apr 20 '22 16:04 N-Dekker

Your reinterpret cast examples are ok, I would not replace any of them with std::bit_cast.

jwakely avatar Apr 20 '22 17:04 jwakely

@jwakely Thanks again, Jonathan. So do you have a suggestion how the core guideline on reinterpret_cast versus bit_cast should be like? Apparently it isn't simply "Favor bit_cast over reinterpret_cast"!

N-Dekker avatar Apr 20 '22 17:04 N-Dekker

Correct me if I'm wrong, but I thought using bit_cast as a 1:1 replacement for reinterpret cast ist almost always wrong. Afaik the idea is not use bit_cast to cast the pointer as you would with `reinterpret_cast (which is often UB, but usually works as long as alignment is correct), but to cast the value the pointer points to. E.g.

#include <iostream>
#include <bit>

struct Foo {
    std::uint32_t low;
    std::uint32_t high;       
};

int main()
{
    alginas(alignof(std::uint64_t)) Foo f{ 0x1, 0x2};

    //pretty sure this is UB, but seems to be common practice
    auto* p = reinterpret_cast<uint64_t*>(&f);
    std::cout << *p << std::endl;  

    auto v = std::bit_cast<uint64_t>(f);
    std::cout << v << std::endl;
}

MikeGitb avatar Apr 20 '22 18:04 MikeGitb

Right. But that doesn't make much sense for Niels' examples, where he wants to read the char values making up the object representation of some other objects. You could read each byte as a char using bit_cast but that's not very useful when you want to process N bytes at a time.

jwakely avatar Apr 20 '22 18:04 jwakely

Right. But that doesn't make much sense for Niels' examples,

Absolutely. My point is that bit_cast isn't a replacement for reinterpret_cast any more than std::unique_ptr is a replacement for raw pointers. It can be used to avoid a specific subset of usecases for reinterpret_cast. And any Guideline should explain that Otherwise people get confused just like we see here or the people that initially thought they should replace all their raw pointers with either unique or shared pointer.

MikeGitb avatar Apr 21 '22 07:04 MikeGitb

On another, but related topic: Is my understanding correct that recent changes to the standard (don't remember the name unfortunately and whether it was c++20, c++23 or a DR) now officially permit to cast a pointer to a range of raw bytes (e.g. received over the network or mmap) to a pointer to a c++ object and dereference this pointer?

MikeGitb avatar Apr 21 '22 07:04 MikeGitb

@MikeGitb This might be the paper you are thinking of: https://github.com/cplusplus/papers/issues/592

sweemer avatar Apr 21 '22 12:04 sweemer

@sweemer : Thanks, but I don't think that is it.

From crossreading that paper talks about viewing the storage of an Foo object as a sequence of char/unsigned char/std::byte, which did and was supposed to work for ages, but the actual wording in the standard didn't permit it.

What I'm talking aobut is the reverse case: I'm getting a pointer to a char/std::byte buffer that has been filled with bytes that contain a valid representation of a Foo object and I'd like to access that buffer through a Foo* pointer. However, as no actual c++ object has ever been created in that buffer this is UB. While this too works in practice (and has to work in order vor c- code to be compiled as c++ code), as long as alignment requirements are observed, I think there was a much bigger debate around whether that should be officielly blessed by the standard.

MikeGitb avatar Apr 22 '22 07:04 MikeGitb

What I'm talking aobut is the reverse case: I'm getting a pointer to a char/std::byte buffer that has been filled with bytes that contain a valid representation of a Foo object and I'd like to access that buffer through a Foo* pointer. However, as no actual c++ object has ever been created in that buffer this is UB. While this too works in practice (and has to work in order vor c- code to be compiled as c++ code), as long as alignment requirements are observed, I think there was a much bigger debate around whether that should be officielly blessed by the standard.

Given the Foo (an implicit-lifetime type) shown in your example (https://github.com/isocpp/CppCoreGuidelines/issues/1517#issuecomment-1104282681), there can be a Foo object created within that buffer when the buffer is created, due to implicit object creation (introduced as DR by P0593R6).

frederick-vs-ja avatar Apr 25 '22 09:04 frederick-vs-ja

On another, but related topic: Is my understanding correct that recent changes to the standard (don't remember the name unfortunately and whether it was c++20, c++23 or a DR) now officially permit to cast a pointer to a range of raw bytes (e.g. received over the network or mmap) to a pointer to a c++ object and dereference this pointer?

I guess this usage sometimes requires std::launder after reinterpret_cast (see P0137R1) currently, although AFAIK no compiler thinks std::launder is in effect in this case.

frederick-vs-ja avatar Apr 25 '22 10:04 frederick-vs-ja

Given the Foo (an implicit-lifetime type) shown in your example (https://github.com/isocpp/CppCoreGuidelines/issues/1517#issuecomment-1104282681), there can be a Foo object created within that buffer when the buffer is created, due to implicit object creation (introduced as DR by P0593R6).

Thats the paper I had in mind. Thank you very much.

MikeGitb avatar Apr 25 '22 13:04 MikeGitb

@jwakely Can you please confirm that the line of code that does a reinterpret_cast in the following example has undefined behavior?

void f( uint32_t u ) {
  int i1 = *reinterpret_cast< int32_t * >(&u); // Undefined behavior, right?
  int i2 = std::bit_cast< int32_t>(u); // Proper bit_cast use case?
}

Assuming that a bitwise conversion is intended, I guess that would be a proper use case for bit_cast. Right? I just proposed this to the ITK library, pull request https://github.com/InsightSoftwareConsortium/ITK/pull/3402

N-Dekker avatar May 01 '22 12:05 N-Dekker

The reinterpret_cast itself is ok, but dereferencing the result of the cast is undefined.

The bit_cast version is ok, but redundant. For two's complement integers the implicit conversion from uint32_t to int32_t preserves the original bit pattern, and C++ now requires two's complement integers (and all known implementations already use them anyway).

jwakely avatar May 01 '22 17:05 jwakely

For two's complement integers the implicit conversion from uint32_t to int32_t preserves the original bit pattern, and C++ now requires two's complement integers

@jwakely Ah, thanks Jonathan. I see now that with p0907R2 - Signed Integers are Two’s Complement, in case of out of range, the result is no longer implementation-defined. That's also new to me.

So in this particular case (from uint32_t to int32_t), one might say "Favor bit_cast over reinterpret_cast", but then both implicit conversion and static_cast<int32_t> would still be more favorable than bit_cast. 🤔

With respect to my ITK pull request , I'm not entirely sure if all compilers supported by ITK already implement integer conversion exactly like that (without any disagreeing implementation-defined behavior), as ITK still supports C++14 compilation by GCC 5.1, LLVM Clang 3.4, and VS2017 (v141). So I think ITK might still use its own hand-written bit_cast replica for a while. But of course, that's beyond this CppCoreGuidelines issue.

N-Dekker avatar May 01 '22 21:05 N-Dekker

GCC, clang and msvc always implemented that rule on all supported architectures. That's why changing the standard was completely uncontroversial. It used to be implementation defined to allow for other representations, but no C++ compiler ever used anything except two's complement.

jwakely avatar May 02 '22 07:05 jwakely

@N-Dekker I think *reinterpret_cast< int32_t * >(&u) is well-defined according to [basic.lval]/(11.2):

a type that is the signed or unsigned type corresponding to the dynamic type of the object, or

But I also suggest you to use static_cast in this case.

frederick-vs-ja avatar May 02 '22 13:05 frederick-vs-ja

@N-Dekker @jwakely I agree with @frederick-vs-ja that the int32_t i1 = *reinterpret_cast< int32_t * >(&u); (I changed the int to int32_t to eliminate the assignment undefined behavior (caused by integer overflow) which is irrelevant to this topic.) behavior should be well-defined. However, if it were float i1 = *reinterpret_cast< float * >(&u);, then the behavior is undefined.

leimao avatar Dec 29 '22 23:12 leimao