carbon-lang

C++ interop: Add support for C++ Primitive Types

Open ivanaivanovska opened this issue 7 months ago • 7 comments

Use the "Carbon: C++ Interop - Primitive Types" doc as a reference for the type mappings.

  • Signed integer types

| Carbon type | C++ type | Status | PRs |
|---|---|---|---|
| i8 | signed char / int8_t | TODO | |
| i16 | short / int16_t | DONE | https://github.com/carbon-language/carbon-lang/pull/5393 |
| i32 | int / int32_t | DONE | https://github.com/carbon-language/carbon-lang/pull/5197; https://github.com/carbon-language/carbon-lang/pull/5392 |
| Core.Cpp.long | long | TODO | |
| Core.Cpp.long_long | long long | TODO | |
| i64 | int64_t | TODO | |

  • Unsigned integer types

| Carbon type | C++ type | Status |
|---|---|---|
| u8 | unsigned char / uint8_t | TODO |
| u16 | unsigned short / uint16_t | TODO |
| u32 | unsigned int / uint32_t | TODO |
| Core.Cpp.unsigned_long | unsigned long | TODO |
| Core.Cpp.unsigned_long_long | unsigned long long | TODO |
| u64 | uint64_t | TODO |

  • Floating-point types

| Carbon type | C++ type | Status |
|---|---|---|
| f32 | float | TODO |
| f64 | double | TODO |
| TBD | long double | TODO |
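The floating-point rows assume that C++ float and double have the same formats as Carbon's f32 and f64, which this sketch takes to be IEEE 754 binary32 and binary64, while long double has no single portable format. A minimal set of compile-time checks for those assumptions (the constant names are hypothetical, not Carbon toolchain code):

```cpp
#include <limits>

// Hypothetical checks for the floating-point rows, assuming Carbon's
// f32/f64 are IEEE 754 binary32/binary64.
// binary32 has a 24-bit significand (including the implicit bit).
constexpr bool kFloatMatchesF32 =
    std::numeric_limits<float>::is_iec559 &&
    std::numeric_limits<float>::digits == 24;

// binary64 has a 53-bit significand.
constexpr bool kDoubleMatchesF64 =
    std::numeric_limits<double>::is_iec559 &&
    std::numeric_limits<double>::digits == 53;

// long double varies by target, e.g. 53 bits (same as double, as with MSVC),
// 64 bits (x87 extended precision, as on x86-64 Linux), or 113 bits
// (binary128, as on AArch64 Linux).
constexpr int kLongDoubleSignificandBits =
    std::numeric_limits<long double>::digits;

static_assert(kFloatMatchesF32, "float -> f32 requires IEEE binary32");
static_assert(kDoubleMatchesF64, "double -> f64 requires IEEE binary64");
```

This variation in long double is one concrete reason a fixed Carbon type for it is hard to pick.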


ivanaivanovska, Apr 08 '25

Note: for now, maybe just map the char types to i8/u8. We don't have a byte type yet, and we may want a dedicated char type.
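For illustration, a minimal C++ sketch of one possible reading of this interim rule: each 1-byte char type maps to i8 or u8 by its signedness. `InterimCarbonNameForChar` is a hypothetical helper, and the sketch assumes 8-bit bytes. Plain char is a distinct C++ type from both signed char and unsigned char, and its signedness is implementation-defined, so under this rule plain char shares a Carbon type with one of them:

```cpp
#include <climits>
#include <string_view>
#include <type_traits>

static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");

// Plain char is its own type, distinct from both explicitly signed variants.
static_assert(!std::is_same_v<char, signed char>);
static_assert(!std::is_same_v<char, unsigned char>);

// Hypothetical interim rule: a 1-byte char type maps to the Carbon integer
// of the same width and signedness.
template <typename T>
constexpr std::string_view InterimCarbonNameForChar() {
  static_assert(sizeof(T) == 1, "only 1-byte char types are handled");
  return std::is_signed_v<T> ? "i8" : "u8";
}
```

For example, plain char is signed on x86-64 Linux (so it would map to i8) and unsigned on AArch64 Linux (so it would map to u8).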

For long, I'd also suggest forcing i64/u64 for now. We don't have Windows testing yet, but we need a clearer decision about how to handle cross-platform differences (for example, long is 32 bits on 64-bit Windows but 64 bits on 64-bit Linux). It's possible we'll actually want a platform-dependent type for long ("Alternative: Provide variable size types" in the doc), and instead encourage int64_t for platform-independent use.
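To make the cross-platform concern concrete: C++ only guarantees that long is at least 32 bits. It is 64 bits on LP64 targets such as 64-bit Linux and macOS, but 32 bits on LLP64 targets such as 64-bit Windows. A minimal sketch that checks, per target, whether forcing long to i64 would at least be size-exact (the constant names are hypothetical):

```cpp
#include <climits>
#include <cstdint>

// Width of C++ `long` on the current target.
constexpr int kLongBits = static_cast<int>(sizeof(long) * CHAR_BIT);

// Forcing long -> i64 is size-exact only on LP64-style targets; on LLP64
// targets it would give `long` fields and parameters the wrong size.
constexpr bool kForcingI64IsSizeExact = sizeof(long) == sizeof(std::int64_t);

// The C++ standard guarantees at least 32 bits for long.
static_assert(kLongBits >= 32);
```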

jonmeow, Apr 08 '25

For integer types, discussion with @chandlerc and @zygoloid seems to have leaned toward the following (for now, at least):

  • i8 == int8_t, which should be the same as signed char
    • And so on for i16, int16_t/short, and i32/int32_t/int
  • i64 == int64_t, which should be either long or long long, platform-dependent
  • There will be Carbon names for both long and long long, although the exact names aren't settled (e.g., Core.Cpp.long, Core.Cpp.long_long, and Core.Cpp.unsigned_long_long are possible choices).
    • One of these will be a platform-dependent alias to i64. The other will be either 32-bit or 64-bit, and will use an appropriate int type that is equivalent to (but distinct from) i32/i64.
  • The same applies correspondingly to unsigned types.

When transforming Carbon types to C++, always use this mapping. When transforming C++ types to Carbon, fit into these buckets as much as possible.
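A minimal sketch of the C++-to-Carbon direction for signed types, written as a constexpr classifier over C++ types. `CarbonNameFor` is a hypothetical helper that returns the Carbon spelling as a string; the Core.Cpp names are the tentative ones from the list above, not settled API. Fixed-width aliases are checked first, so whichever of long/long long is int64_t lands in the i64 bucket:

```cpp
#include <cstdint>
#include <string_view>
#include <type_traits>

// Hypothetical: the Carbon name a C++ signed integer type would get under
// the mapping above. Fixed-width buckets win; the remaining one of
// long/long long gets its tentative Core.Cpp name.
template <typename T>
constexpr std::string_view CarbonNameFor() {
  if constexpr (std::is_same_v<T, std::int8_t>) return "i8";
  else if constexpr (std::is_same_v<T, std::int16_t>) return "i16";
  else if constexpr (std::is_same_v<T, std::int32_t>) return "i32";
  else if constexpr (std::is_same_v<T, std::int64_t>) return "i64";
  else if constexpr (std::is_same_v<T, long>) return "Core.Cpp.long";
  else if constexpr (std::is_same_v<T, long long>) return "Core.Cpp.long_long";
  else return "<unmapped>";
}
```

On a typical 64-bit Linux target, int64_t is long, so `CarbonNameFor<long>()` yields "i64" and `CarbonNameFor<long long>()` yields "Core.Cpp.long_long". On 64-bit Windows, int64_t is long long, so long long yields "i64" and the 32-bit long yields "Core.Cpp.long".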

jonmeow, Apr 08 '25

Had some further discussion with @chandlerc about this, largely reaffirming the approach that @jonmeow described previously. Some new observations:

  • We should make sure that the platform-dependent type that is not an alias for iN is nonetheless compatible with the corresponding iN type, for example by defining the non-alias type as an extending adapter of the alias type. In particular, this would allow us to type-pun between them and for example cast from Core.Cpp.long_long* to i64* even when long_long is not an alias for i64.
  • Once we reach the point of using C++ overload resolution for function calls in interop, we can consider allowing an implicit conversion sequence to be formed from (e.g.) one of the two 64-bit integer types to the other.
  • We will probably want to add a new type to Carbon at some point that is a machine-word-sized integer type (Core.Int(N) where N is pointer-width). We should check whether that type will always be the same type as ptrdiff_t / ssize_t under this model. We suspect it will, but there might be surprises lurking here.
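Carbon's extending adapters have no direct C++ counterpart, and plain C++ does not let you pun long and long long (accessing one through a pointer to the other violates strict aliasing). As a rough analogy only, a standard-layout wrapper around std::int64_t gives a distinct type that can nonetheless be viewed as an int64_t, which is the kind of Core.Cpp.long_long* to i64* cast described in the first bullet. `AdaptedI64` and `AsI64` are hypothetical names:

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical C++ analogue of an extending adapter: a distinct type whose
// only member is the adapted type, so size, alignment, and layout match.
struct AdaptedI64 {
  std::int64_t value;
};

static_assert(std::is_standard_layout_v<AdaptedI64>);
static_assert(sizeof(AdaptedI64) == sizeof(std::int64_t));
static_assert(alignof(AdaptedI64) == alignof(std::int64_t));

// A standard-layout object is pointer-interconvertible with its first
// non-static data member, so this cast yields a valid pointer to `value`.
inline std::int64_t* AsI64(AdaptedI64* p) {
  return reinterpret_cast<std::int64_t*>(p);
}

// Writes through the punned pointer and checks the adapter sees the change.
inline void PunRoundTrip() {
  AdaptedI64 x{41};
  *AsI64(&x) += 1;
  assert(x.value == 42);
}
```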

zygoloid, Apr 09 '25

@jonmeow @zygoloid Thanks a lot for your input! I've started drafting a new design doc reflecting these discussions: "Carbon: C++ Interop - Primitive Types". It's still a WIP, but any early feedback is welcome. Thanks.

ivanaivanovska, Apr 10 '25

Hey, I would like to work on this issue.

rahilsaini-git, Apr 19 '25

The primitive types mapping proposal (https://github.com/carbon-language/carbon-lang/pull/5448) is now open for review.

ivanaivanovska, May 26 '25

Hey, I would like to work on this issue.

Thanks @rahiladmin for your interest in contributing to this issue.

While the proposal (https://github.com/carbon-language/carbon-lang/pull/5448) is still in review, the mapping of signed char/int8_t -> i8 can be implemented next, and you could take that if you're still interested. You can use the implementation for short / int16_t -> i16 (PR https://github.com/carbon-language/carbon-lang/pull/5393) as an example. If you work on this, please add me and @bricknerb as reviewers and let us know. Thanks!

ivanaivanovska, May 26 '25

I just want to note explicitly that Crubit (comprehensive C++/Rust interop with goals similar to Carbon's) has corresponding goals for the type mapping, and I think we're converging on a substantially similar mapping with similar justifications, described at a high level here: https://crates.io/crates/ffi_11, and at a low level here: https://docs.rs/ffi_11/latest/ffi_11/

Things that are the same:

Since Rust (and Carbon) have sized fundamental types, while C++ has fundamental types whose sizes are platform-dependent, we both define a set of type aliases for the C++ fundamental types. The type mapping is characterized by two constraints:

  1. An alias names a language-native fundamental type if and only if the corresponding sized alias in C++ names the corresponding C++ fundamental type. For example, ffi_11::c_long and Cpp.long are the language-native i64 if and only if, in C++, int64_t is an alias for long.

  2. If two types are different in C++, they are different in Rust, and preferably vice versa.
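In C++ terms, both constraints hinge on type identity rather than size. A minimal sketch, with hypothetical constant names: long and long long are always distinct types, and which of them int64_t names is a per-target fact that decides constraint 1.

```cpp
#include <cstdint>
#include <type_traits>

// Constraint 2 starts here: these pairs are distinct C++ types even on
// targets where they have the same size and representation.
static_assert(!std::is_same_v<long, long long>);
static_assert(!std::is_same_v<unsigned long, unsigned long long>);

// Constraint 1: the alias for `long` is the native 64-bit type exactly when
// int64_t names `long` on this target, and likewise for `long long`.
constexpr bool kLongAliasIsNativeI64 = std::is_same_v<std::int64_t, long>;
constexpr bool kLongLongAliasIsNativeI64 = std::is_same_v<std::int64_t, long long>;
```

On mainstream targets int64_t names exactly one of the two, so exactly one alias becomes the native 64-bit type; if both did, two distinct C++ types would collapse into one, breaking constraint 2.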

Things that are different:

only-nameable guarantee

A C++ builtin integer type that is not the same as intN_t or uintN_t for any N will be nameable in Carbon only as Cpp.builtin_type.

Rust has fundamental types beyond the sized iN and uN types, such as usize, so we extend this slightly. For example, if C++ uint64_t is unsigned long and C++ size_t is unsigned long long, then it would make perfect sense to have c_ulong be u64 and c_ulonglong be the Rust usize type. I think that is what people will expect.
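The extended rule can be sketched as a classifier that tries the sized types first and only then size_t. `RustNameForUnsigned` is a hypothetical helper written in C++ for illustration; the returned strings are Rust type names, and the fallback argument stands in for the only-nameable alias:

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <type_traits>

// Hypothetical: prefer a sized Rust type, then `usize` if the C++ type is
// size_t, and otherwise fall back to the only-nameable alias.
template <typename T>
constexpr std::string_view RustNameForUnsigned(
    [[maybe_unused]] std::string_view fallback) {
  if constexpr (std::is_same_v<T, std::uint8_t>) return "u8";
  else if constexpr (std::is_same_v<T, std::uint16_t>) return "u16";
  else if constexpr (std::is_same_v<T, std::uint32_t>) return "u32";
  else if constexpr (std::is_same_v<T, std::uint64_t>) return "u64";
  else if constexpr (std::is_same_v<T, std::size_t>) return "usize";
  else return fallback;
}
```

On the hypothetical platform from the comment (uint64_t is unsigned long, size_t is unsigned long long), `RustNameForUnsigned<unsigned long>("c_ulong")` gives "u64" and `RustNameForUnsigned<unsigned long long>("c_ulonglong")` gives "usize".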

void*

I believe this proposal defers making a decision about void* for reasons of bijectivity. If two types are the same in C++, they should ideally be the same in Carbon (or in Rust). So it would make sense for void pointers to become pointers to unit, yet that is also surprising, so the decision naturally gets put off.

At least in Rust, we have no choice: void* cannot become, for example, &mut (), because of provenance rules: () is a zero-sized type and grants no permission to read or write any bytes. You cannot cast a pointer to a smaller type into a pointer to a larger type and then read or write through it without UB. So the void type used in void* must be an "unsized" extern type with no provenance-known size, at least if it is to be used in provenance-aware places such as references.

However, an unsized type cannot be returned by value, so the void return type cannot be unsized; it must be a sized (preferably zero-sized) type. The unit type is conventional in Rust.

So our hand is forced: the void in void Foo(); cannot be the same as the void in void* Foo();. The former should probably be (), the latter must be an "unsized" extern type.

I suspect something similar could be true of Carbon's language rules, but if not, it might be worth matching up with related projects anyway.

missing types

Crubit lucks out here: Rust already knows what it wants for character types, and the C++ char type isn't it, so we just define new types for char and the charN_t types. Similarly, nullptr_t is easy enough to implement, and it only exists for overload selection. And we don't have std::byte yet, but I'd imagine it's just an empty enum.

We did not try to implement any of the extended floating-point types, bit-precise integers, or long double.

It might be worth noting explicitly here: you'd be tempted to map char32_t to Rust's char type, but they have different ranges of valid values, so this doesn't work. (A Rust char must hold a Unicode scalar value, and holding any other value is UB in Rust; a C++ char32_t may hold any value in its range without UB.) Something similar could happen with Carbon's char type.
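The mismatch is about valid values, not size: both types are 32 bits wide, but a Rust char must be a Unicode scalar value. A minimal sketch of the check a char32_t to char conversion would need at the boundary (`IsUnicodeScalarValue` is a hypothetical helper):

```cpp
// A Unicode scalar value is any code point in [0, 0x10FFFF] except the
// surrogate range [0xD800, 0xDFFF]. A C++ char32_t has no such restriction.
constexpr bool IsUnicodeScalarValue(char32_t c) {
  return c <= 0x10FFFF && !(c >= 0xD800 && c <= 0xDFFF);
}

static_assert(IsUnicodeScalarValue(U'A'));
static_assert(!IsUnicodeScalarValue(char32_t{0xD800}));    // lone surrogate
static_assert(!IsUnicodeScalarValue(char32_t{0x110000}));  // beyond Unicode
```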

ssbr, Jul 16 '25

So our hand is forced: the void in void Foo(); cannot be the same as the void in void* Foo();. The former should probably be (), the latter must be an "unsized" extern type.

I suspect something similar could be true of Carbon's language rules, but if not, it might be worth matching up with related projects anyway.

We discussed this week having a Cpp.void type (which is always an incomplete type), since we will need it to express void template arguments to C++, at which point it made sense to use it for void* as well. Because it is incomplete, it doesn't have a defined size and can't be dereferenced.
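A rough C++ analogue of an always-incomplete Cpp.void is a class that is declared but never defined. This sketch (the names `CppVoid`, `FromVoidPtr`, and `ToVoidPtr` are hypothetical) shows that pointers to such a type can still be converted to and from void*, while dereferencing them or taking `sizeof(CppVoid)` would not compile:

```cpp
#include <cassert>

// Declared, never defined: CppVoid stays an incomplete type.
struct CppVoid;

// void* -> CppVoid*: static_cast from void* to a pointer to an object type
// (incomplete class types included) is allowed.
inline CppVoid* FromVoidPtr(void* p) { return static_cast<CppVoid*>(p); }

// CppVoid* -> void*: an implicit conversion, as for any object pointer.
inline void* ToVoidPtr(CppVoid* p) { return p; }

// Checks that converting back to void* yields the original address, which
// holds on mainstream compilers.
inline void CheckRoundTrip(void* p) { assert(ToVoidPtr(FromVoidPtr(p)) == p); }
```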

danakj, Jul 17 '25