carbon-lang
C++ interop: Add support for C++ Primitive Types
Use Carbon: C++ Interop - Primitive Types doc as a reference for the type mappings.
- Signed integer types
| Carbon type | C++ type | Status | PRs |
|---|---|---|---|
| i8 | signed char / int8_t | TODO | |
| i16 | short / int16_t | DONE | https://github.com/carbon-language/carbon-lang/pull/5393 |
| i32 | int / int32_t | DONE | https://github.com/carbon-language/carbon-lang/pull/5197; https://github.com/carbon-language/carbon-lang/pull/5392 |
| Core.Cpp.long | long | TODO | |
| Core.Cpp.long_long | long long | TODO | |
| i64 | int64_t | TODO | |
- Unsigned integer types
| Carbon type | C++ type | Status |
|---|---|---|
| u8 | unsigned char / uint8_t | TODO |
| u16 | unsigned short / uint16_t | TODO |
| u32 | unsigned int / uint32_t | TODO |
| Core.Cpp.unsigned_long | unsigned long | TODO |
| Core.Cpp.unsigned_long_long | unsigned long long | TODO |
| u64 | uint64_t | TODO |
- Floating-point types
| Carbon type | C++ type | Status |
|---|---|---|
| f32 | float | TODO |
| f64 | double | TODO |
| TBD | long double | TODO |
Note, for now maybe just i8/u8 for char types. We don't have a byte type yet, and we may want a char type.
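For context on the char question, here is a small C++17 sketch of how the three character types relate to int8_t/uint8_t on the C++ side. The constant names are made up for illustration, and the values are platform-dependent by design:

```cpp
#include <cstdint>
#include <type_traits>

// char, signed char, and unsigned char are three distinct types in C++,
// regardless of whether plain char happens to be signed on this platform.
static_assert(!std::is_same_v<char, signed char>);
static_assert(!std::is_same_v<char, unsigned char>);

// Hypothetical names: whether the fixed-width aliases resolve to the
// explicitly signed/unsigned character types on this platform.
constexpr bool kInt8IsSignedChar = std::is_same_v<std::int8_t, signed char>;
constexpr bool kUint8IsUnsignedChar =
    std::is_same_v<std::uint8_t, unsigned char>;

// Implementation-defined: typically signed on x86 Linux and unsigned on
// many ARM ABIs.
constexpr bool kPlainCharIsSigned = std::is_signed_v<char>;
```

If i8/u8 cover signed char/unsigned char, plain char is still a third type that needs some Carbon spelling later.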
For long, I'll also suggest forcing i64/u64 for now. We don't have Windows testing yet, but we need a clearer decision about how to handle cross-platform. It's possible we'll actually want a platform-dependent type for long ("Alternative: Provide variable size types" in the doc), and instead encourage int64_t for platform-independent use.
For integer types, discussion with @chandlerc and @zygoloid seems to have leaned (for now, at least):
- i8 == int8_t, which should be the same as signed char
  - And so on for i16/int16_t/short, and i32/int32_t/int
- i64 == int64_t, which should be either long or long long, platform-dependent
  - There will be some Carbon name for both long and long long, although that name isn't clear (e.g., Core.Cpp.long and Core.Cpp.long_long, Core.Cpp.unsigned_long_long as a possible choice).
    - One of these will be a platform-dependent alias to i64. The other will be either 32-bit or 64-bit, and use an appropriate int type that is equivalent (but different) from i32/i64.
- Corresponding for unsigned types
When transforming Carbon types to C++, always use this mapping. When transforming C++ types to Carbon, fit into these buckets as much as possible.
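To make the platform dependence concrete, a minimal C++17 sketch that reports which builtin type int64_t resolves to. The k-prefixed constants are hypothetical names, and the sketch assumes a mainstream platform where int64_t exists and is one of the two:

```cpp
#include <climits>
#include <cstdint>
#include <type_traits>

// long and long long are always distinct types, even when they have the
// same size, so int64_t can be an alias for at most one of them.
static_assert(!std::is_same_v<long, long long>);

// Hypothetical names: which builtin type int64_t resolves to here.
// Typically long on LP64 Linux and long long on LLP64 Windows.
constexpr bool kInt64IsLong = std::is_same_v<std::int64_t, long>;
constexpr bool kInt64IsLongLong = std::is_same_v<std::int64_t, long long>;

// Width of long in bits: 64 under LP64, 32 under LLP64.
constexpr int kLongBits = sizeof(long) * CHAR_BIT;
```

Under the mapping above, whichever constant is true picks the builtin type that becomes the alias to i64, and the other builtin type gets its own distinct Carbon type.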
Had some further discussion with @chandlerc about this, largely reaffirming the approach that @jonmeow described previously. Some new observations:
- We should make sure that the platform-dependent type that is not an alias for iN is nonetheless compatible with the corresponding iN type, for example by defining the non-alias type as an extending adapter of the alias type. In particular, this would allow us to type-pun between them and for example cast from Core.Cpp.long_long* to i64* even when long_long is not an alias for i64.
- Once we reach the point of using C++ overload resolution for function calls in interop, we can consider allowing forming an implicit conversion sequence from (e.g.) one of the two 64-bit integer types to the other one.
- We will probably want to add a new type to Carbon at some point that is a machine-word-sized integer type (Core.Int(N) where N is pointer-width). We should check whether that type will always be the same type as ptrdiff_t/ssize_t under this model. We suspect it will, but there might be surprises lurking here.
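One way to run the ptrdiff_t check on a given toolchain is a compile-time comparison like the C++17 sketch below. FixedInt, PointerWidthInt, and the k-prefixed constants are hypothetical names, and the sketch assumes 32-bit or 64-bit pointers:

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical helper: selects the fixed-width signed type for a bit width.
template <int Bits> struct FixedInt;
template <> struct FixedInt<32> { using type = std::int32_t; };
template <> struct FixedInt<64> { using type = std::int64_t; };

constexpr int kPointerBits = sizeof(void*) * CHAR_BIT;

// The C++ type that a pointer-width iN would map to under this model.
using PointerWidthInt = FixedInt<kPointerBits>::type;

// Same size does not imply same type: these compare type identity.
constexpr bool kPtrdiffMatches =
    std::is_same_v<std::ptrdiff_t, PointerWidthInt>;
constexpr bool kIntptrMatches =
    std::is_same_v<std::intptr_t, PointerWidthInt>;
```

As far as I know, 64-bit macOS is a case where kPtrdiffMatches comes out false, because ptrdiff_t is long there while int64_t is long long. That may be one of the surprises worth checking for.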
Thanks a lot for your input! @jonmeow @zygoloid I started drafting a new design doc reflecting on these discussions in: Carbon: C++ Interop - Primitive Types. It’s still a WIP, but any early feedback is welcome. Thanks.
Hey, I would like to work on this issue.
The primitive types mapping proposal (https://github.com/carbon-language/carbon-lang/pull/5448) is now open for review.
Hey, I would like to work on this issue.
Thanks @rahiladmin for your interest in participating in this issue.
While the proposal (https://github.com/carbon-language/carbon-lang/pull/5448) is still in review, the mapping of signed char/int8_t -> i8 can be implemented next and you could take that if you’re still interested. You could use the implementation for short / int16_t -> i16 (PR https://github.com/carbon-language/carbon-lang/pull/5393) as an example. If you work on this, please add me and @bricknerb as reviewers and notify us of your work. Thanks!
Just want to explicitly note, Crubit (comprehensive C++/Rust interop with similar goals to Carbon) has corresponding goals with the type mapping, and I think we're reaching towards a substantially similar mapping with similar justifications, described at a high level here: https://crates.io/crates/ffi_11 -- and at a low level here: https://docs.rs/ffi_11/latest/ffi_11/
Things that are the same:
Since Rust (and Carbon) have sized fundamental types, and C++ has these size-unknown fundamental types, we both have a set of type aliases for C++ fundamental types. The type mapping is characterized by the two constraints:
- it points to a fundamental type if and only if the corresponding sized alias in C++ points to the corresponding C++ fundamental type. For example, ffi_11::c_long and Cpp.long are the language-native i64 if and only if, in C++, int64_t is an alias for long.
- if two types are different in C++, they are different in Rust, and preferably vice versa.
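The two constraints can be modeled on the C++ side as a compile-time lookup that checks the sized aliases first and falls back to a builtin-specific name. This is only a sketch: CarbonNameFor is a hypothetical helper, the returned strings are the candidate names from this thread rather than settled ones, and it needs C++17:

```cpp
#include <cstdint>
#include <string_view>
#include <type_traits>

// Hypothetical helper: the Carbon-side name a signed C++ integer type would
// get. Sized aliases are checked first, so a builtin type maps to iN exactly
// when intN_t is an alias for it on this platform.
template <typename T>
constexpr std::string_view CarbonNameFor() {
  if constexpr (std::is_same_v<T, std::int8_t>) {
    return "i8";
  } else if constexpr (std::is_same_v<T, std::int16_t>) {
    return "i16";
  } else if constexpr (std::is_same_v<T, std::int32_t>) {
    return "i32";
  } else if constexpr (std::is_same_v<T, std::int64_t>) {
    return "i64";
  } else if constexpr (std::is_same_v<T, long>) {
    return "Cpp.long";  // long is not any intN_t on this platform
  } else if constexpr (std::is_same_v<T, long long>) {
    return "Cpp.long_long";  // long long is not any intN_t here
  } else {
    return "unmapped";
  }
}

// Second constraint: distinct C++ types get distinct names.
static_assert(CarbonNameFor<long>() != CarbonNameFor<long long>());
```

For example, CarbonNameFor<long>() yields "i64" exactly when int64_t is an alias for long, and "Cpp.long" otherwise.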
Things that are different:
only-nameable guarantee
A C++ integer builtin type that is not the same as intN_t or uintN_t for any N, will be nameable in Carbon only as Cpp.builtin_type.
Rust has fundamental types that are not just the sized iN and uN types, such as usize, so we extend this slightly. So, for example, if C++ uint64_t is unsigned long, and C++ size_t is unsigned long long, then it would make perfect sense to have c_ulong be u64 and c_ulonglong be the Rust usize type. It is what people will expect, I think.
void*
I believe this proposal defers making a decision about void* for reasons of bijectivity. If two types are the same in C++, they should ideally be the same in Carbon, or in Rust. So it would make sense that void pointers become pointers to unit, yet at the same time it is surprising, so this naturally gets put off.
At least in Rust, we have no choice: void* cannot become, for example, &mut (), because of provenance rules: () is a zero-sized type, and grants no permission to write or read any bytes. You cannot cast a smaller type to a larger type and perform writes/reads without UB. So the void type used in void* must be an "unsized" extern type with no provenance-known size, at least if it is to be used in provenance-aware places such as references.
However, if it is an unsized type, it cannot be returned by value. So at the same time, void return values cannot be unsized, and must be a sized (preferably zero-sized) type. The unit type is conventional in Rust.
So our hand is forced: the void in void Foo(); cannot be the same as the void in void* Foo();. The former should probably be (), the latter must be an "unsized" extern type.
I suspect something similar could be true of Carbon's language rules, but if not, it might be worth matching up with related projects anyway.
missing types
Crubit lucks out here: Rust already knows what it wants for character types, and the C++ char type isn't it, so we just define new types for char and the charN_t types. Similarly, nullptr_t is easy enough to implement, and it only exists for overload selection. And we don't have std::byte yet, but I'd imagine it's just an empty enum.
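A few C++17 facts behind these choices, written as compile-time checks. Which is a hypothetical overload set added only to show nullptr_t taking part in overload selection:

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// std::byte is a scoped enum with no enumerators and an unsigned char
// underlying type.
static_assert(std::is_enum_v<std::byte>);
static_assert(
    std::is_same_v<std::underlying_type_t<std::byte>, unsigned char>);

// The charN_t types are distinct types, not aliases of the integer types
// they share a representation with.
static_assert(!std::is_same_v<char16_t, std::uint_least16_t>);
static_assert(!std::is_same_v<char32_t, std::uint_least32_t>);

// Hypothetical overload set: nullptr selects the nullptr_t overload because
// it is an exact match, ahead of the conversion to a pointer type.
constexpr int Which(std::nullptr_t) { return 0; }
constexpr int Which(const int*) { return 1; }
static_assert(Which(nullptr) == 0);
```

Because none of these are aliases of other builtin types, each needs its own type on the other side of the interop boundary.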
We did not try to implement any of the extended float things, or bit-precise integers for that matter, or long double.
It might be worth noting explicitly here: you'd be tempted to map char32_t to Rust's char type, but they have different ranges of valid values, so this doesn't work. (It's UB to represent an invalid scalar value in Rust, but not UB in C++.) Something similar could happen to Carbon's char type.
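The range mismatch can be written down as a predicate. A minimal C++ sketch, where IsUnicodeScalarValue is a hypothetical helper name: a Unicode scalar value is any code point up to 0x10FFFF outside the surrogate range 0xD800 to 0xDFFF, while char32_t can hold any value of its underlying unsigned type.

```cpp
// Returns true if `c` is a Unicode scalar value: a code point in
// [0, 0x10FFFF] that is not a surrogate in [0xD800, 0xDFFF].
constexpr bool IsUnicodeScalarValue(char32_t c) {
  return c <= 0x10FFFF && !(c >= 0xD800 && c <= 0xDFFF);
}

static_assert(IsUnicodeScalarValue(U'A'));
// Both values below are representable as char32_t in C++, but neither is a
// valid scalar value.
static_assert(!IsUnicodeScalarValue(char32_t{0xD800}));    // lone surrogate
static_assert(!IsUnicodeScalarValue(char32_t{0x110000}));  // past the maximum
```

A conversion from char32_t to a scalar-value-only character type would need a check like this, so the two cannot be the same type.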
> So our hand is forced: the void in void Foo(); cannot be the same as the void in void* Foo();. The former should probably be (), the latter must be an "unsized" extern type.
>
> I suspect something similar could be true of Carbon's language rules, but if not, it might be worth matching up with related projects anyway.
We discussed this week having a Cpp.void type (which is always an incomplete type) since we will need that to express void template parameters to C++, at which point it made sense to use that for void* as well. By being incomplete, it doesn't have a defined size and you can't deref it.
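For comparison, this is what "incomplete" buys on the C++ side, as a C++17 sketch. Opaque, IsComplete, and TypeTag are hypothetical names used only for illustration and are not part of any Carbon or C++ API:

```cpp
#include <type_traits>

// Declared but never defined, so the type stays incomplete. Pointers to it
// can be formed and passed around, but it has no size.
struct Opaque;
using OpaquePtr = Opaque*;

// Hypothetical trait: true when sizeof(T) is well-formed, i.e. T is complete.
template <typename T, typename = void>
struct IsComplete : std::false_type {};
template <typename T>
struct IsComplete<T, std::void_t<decltype(sizeof(T))>> : std::true_type {};

static_assert(IsComplete<int>::value);
static_assert(!IsComplete<Opaque>::value);

// void is also an incomplete type, yet it is a valid template argument.
template <typename T>
struct TypeTag {};
using VoidTag = TypeTag<void>;
static_assert(std::is_void_v<void>);
```

An always-incomplete Cpp.void would sit in the same position as Opaque here: usable behind a pointer and as a template argument, but with no size and nothing to dereference.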