encoding_c icon indicating copy to clipboard operation
encoding_c copied to clipboard

The define ENCODING_RS_ENCODING is unusual way of wrapping

Open dimztimz opened this issue 4 years ago • 3 comments

Noticed that this library does some tricks that are unidiomatic for C++. I'll post a minimal example.

// library.h
#ifdef __cplusplus
#define ENCODING_RS_ENCODING CppEncoding
class CppEncoding {
//no data members
public:
    CppEncoding() = delete; // this object can not be created by value
};
#else
#define ENCODING_RS_ENCODING  struct CEncoding
struct CEncoding;
#endif

ENCODING_RS_ENCODING * library_function(const char * name);

The effect is that if library_function() is called from C code it returns pointer to CEncoding, and from C++ code it returns pointer to CppEncoding. This is unusual and it gets very close to the land of undefined behaviour (breaking strict aliasing), although probably is not as long as you don't dereference those pointers.

The greater concern to me is that the default C++ wrapper exposes pointer semantics, and not value semantics. The way I would write the C++ wrapper is the usual way by wrapping the (pointer to) C-struct inside the C++ class.

// C part
struct CEncoding;
struct CEncoding * library_function(const char * name);

// C++ part
class CppEncoding {
    struct CEncoding * enc = nullptr;
public:
    CppEncoding() = default;
    CppEncoding(const char * name) { ... }
};

Why was this done the way it is? Maybe to avoid name clashes? Would you consider to update to the idiomatic C++?

dimztimz avatar Jun 05 '20 21:06 dimztimz

Yeah, this gets messy if the same app 1) uses it both from C and C++ and 2) it passes what's returned from Rust to C further from C to C++.

although probably is not as long as you don't dereference those pointers

The dereferences happen on the Rust side, so the dereferencing isn't a problem.

The greater concern to me is that the default C++ wrapper exposes pointer semantics, and not value semantics.

Exposing value semantics without actually wrapping a pointer would require passing size and alignment information from the Rust compiler to the C++ compiler, which would mean using a nightly-only Rust feature and parsing its output in the C++ build system.

Exposing value semantics in the way you suggest seems like making CppEncoding a special-purpose unique_ptr for CEncoding.

Why was this done the way it is?

It was done this way to allow the pointer to be a pointer to a namespaced thing in C++ in a C++ codebase 1) that doesn't care about also using the raw C API directly and 2) doesn't treat heap-allocated objects pointed to by unique pointer as unidiomatic and to allow the pointer to be directly a pointer to the underlying Rust object.

I'd like to understand more about your use case that ends up wishing to call the API both via the raw C API and the C++ wrapper.

hsivonen avatar Jun 06 '20 19:06 hsivonen

Exposing value semantics without actually wrapping a pointer would require passing size and alignment information from the Rust compiler to the C++ compiler, which would mean using a nightly-only Rust feature and parsing its output in the C++ build system.

I understand, but I don't require that.

Exposing value semantics in the way you suggest seems like making CppEncoding a special-purpose unique_ptr for CEncoding.

You may say that, it is a good point of view, but I would say that that CppEncoding is a proper C++ class that follows the RAII pattern, like any other class in the C++ standard library, e.g. std::string and std::vector. Such classes are expected by C++ developers. In my viewpoint I wouldn't mention unique_ptr. Instead I'm mentioning RAII because that exists before unique_ptr exists. Thus for me vector is a better example than unique_ptr.

Now, the struct/class Encoding does not need RAII because it doesn't need to be destroyed (from C/C++ side), but Encoder and Decoder need destroying. I'll use Decoder as my example.

To achieve RAII with the Decoder you jump through hoops and do confusing acrobatics. You have:

  1. Deleted constructor of Decoder
  2. Empty destructor of Decoder
  3. Overloaded operator delete where you do the destruction!!!
  4. Force the client to use the Decoder pointer via std::unique_ptr.

That way, you still can circumvent RAII by calling unique_ptr::release().

dimztimz avatar Jun 07 '20 18:06 dimztimz

Oops. Sorry, my previous reply was bogus. Let's try again. In my previous reply, I what thinking of Encoder and Decoder, which are returned by value in Rust and are inside a unique_ptr in C++.

The reason why Encoding is behind a plain pointer is that it is statically allocated and must not be copied or destroyed, but I still want it to act is if it was a C++ class.

For Decoder/Encoder, the key point is that unless the C++ build system obtains size and alignment information from the Rust compiler such that C++ code actually ends up moving Rust-initialized Decoder/Encoder by value, Decoder and Encoder need to be allocated on the heap on the Rust side in order not to require size/alignment information on the C++ side.

Once they are heap-allocated, wrapping them in C++ such that the wrappers would be smart pointers in disguise would be a lie. Letting these look like heap-allocated objects managed by std::unique_ptr is honest about the point of these being heap-allocated.

That way, you still can circumvent RAII by calling unique_ptr::release().

In C++, you can always circumvent things if you try hard enough.

hsivonen avatar Jun 08 '20 10:06 hsivonen