Tracking Issue for the experimental `crabi` ABI
This is a tracking issue for the experimental crabi ABI; see https://github.com/rust-lang/rust/pull/105586 and https://github.com/rust-lang/compiler-team/issues/631.
The feature gate for the issue is #![feature(crabi)].
About tracking issues
Tracking issues are used to record the overall progress of implementation. They are also used as hubs connecting to other relevant issues, e.g., bugs or open design questions. A tracking issue is however not meant for large scale discussion, questions, or bug reports about a feature. Instead, open a dedicated issue for the specific matter and add the relevant feature gate label.
Steps
- [ ] Implement the experimental feature
- [ ] Write an RFC precisely specifying the ABI
- [ ] Adjust documentation (see instructions on rustc-dev-guide)
- [ ] Stabilization PR (see instructions on rustc-dev-guide)
Unresolved Questions
- Niches: should we support cases like `Option<bool>` without a separate discriminant, or should we (for simplicity) always pass a separate discriminant? Likely the latter. However, what about things like `Option<&T>` and `Option<NonZeroU32>`, for which Rust guarantees the representation of `None`? Those work with the C ABI, and they have to work with crABI, but can we make them work with crABI using the same encoding of `None`?
- What subset of lifetimes can, and should, we support? We can't enforce them cross-language, but they may be useful as an advisory/documentation mechanism. Or we could leave them out entirely.
- To what extent should crABI make any attempt to specify things that can't be enforced, rather than ignoring semantics entirely and only specifying how types get passed?
- How can we make it easy to support data structures without having to do translation from `repr(Rust)` to `repr(crabi)` and have parallel structures? Can we make that less painful to express, and ideally mostly free at runtime?
  - Related: how can we handle tuples? Do we need a way to express `repr(crabi)` tuples? How can we do that conveniently?
- Should we provide support for extensible enums, such that we don't assume the discriminant matches one of the known variants? Would doing so make using enums less ergonomic? Could we address that with language changes?
- For handling objects, could we avoid having to pass in-memory function pointers via a vtable, and instead reference specific symbols? This wouldn't work for generics, though. Can we do any better than a vtable?
- For ranges, should we provide a concrete range type or types, or should we defer that and handle ranges as opaque objects or traits?
- Do we get any value out of supporting `()`, other than completeness? Passing `()` by value should just be ignored as if it weren't specified. Do we want people using pointers to `()`, and do those have any advantage over pointers to void?
- Should we do anything special about `i128` and `u128`, or should we just push for getting those supported correctly in `extern "C"`?
- For generics, such as `Option<u64>` or `Result<u32, ConcreteError>` or `[u8; 16]`, does the rule "all generic parameters must be bound to concrete types in the function signature" suffice, or do we need a more complex rule than that?
- Unwinding: The default `extern "crabi"` should not support unwind, and most languages don't tend to have support for unwinding through C-ABI functions, but should we have a `crabi-unwind` variant? Would doing so provide value?
Implementation history
- Feature gate: https://github.com/rust-lang/rust/pull/105586
During discussion, we arrived at the conclusion that the notion of a "C ABI" is somewhat of an existential question, because the C Standard does not define a C ABI, and currently our extern "C" does not even support all Standard C types, like long double and _Complex. And the C language can add new types, of course (it in fact did in C23, non-optionally, a new type that all compilers will now be expected to support in order to claim they support C23). Thus saying "superset of a C ABI" is slightly dubious. It may prove beneficial for this experiment to hammer down what we mean, anyways, when we say "C ABI", extern "C", and so on, because there is a definable meaning that we in-practice work with.
Until then, it is another unanswered question to address.
@workingjubilee I already modified the proposal to just say "The crABI support for Rust will be a strict superset of the C ABI support for Rust."; that avoids implying support for things like long double or u128 or _Complex that we don't currently support.
I think at the very least Option<&T> should be able to represent None as a null pointer. If this ABI will support dynamic linking to something like libc for more modern languages (and if supporting that use case is not already a goal, I think it should be), the performance hit of the extra data for the discriminant in something as simple as a pointer is simply not acceptable, as it would, much like libc, be used by many programs on nearly every system.
@LilyIsTrans That is already a guaranteed repr of Rust for sized types: https://doc.rust-lang.org/std/option/index.html#representation
Yes, but it is not clear to me that that would imply it's necessarily going to be represented like that in crabi, which is an FFI and therefore presumably not necessarily subject to the normal rules of repr(Rust). My point is that if I call a dynamically linked extern "crabi" function, Option<&T> should still be guaranteed to have the same representation as &T would in that function's interface.
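To make the point above concrete, here is a minimal sketch of what Rust already guarantees for `extern "C"` today; whether `extern "crabi"` keeps the same encoding is exactly the open question. The function names (`read_or_default`, `none_as_raw_pointer`, `niche_sizes_match`) are hypothetical and exist only for illustration.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

/// Hypothetical exported function. With the `extern "C"` ABI, a caller that
/// passes a null pointer is observed here as `None`.
pub extern "C" fn read_or_default(value: Option<&i32>) -> i32 {
    match value {
        Some(v) => *v,
        None => -1,
    }
}

/// Returns the raw pointer that `None::<&i32>` is represented as.
pub fn none_as_raw_pointer() -> *const i32 {
    // SAFETY: `Option<&i32>` and `*const i32` have the same size, and the
    // standard library documents that `None` here transmutes to all zeros.
    unsafe { std::mem::transmute::<Option<&i32>, *const i32>(None) }
}

/// Niche-optimized options take no more space than their payload.
pub fn niche_sizes_match() -> bool {
    size_of::<Option<&i32>>() == size_of::<&i32>()
        && size_of::<Option<NonZeroU32>>() == size_of::<u32>()
}
```

A C caller would see `read_or_default` as taking a plain `const int32_t *`, with no extra discriminant argument.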
As for lifetimes, I think we can encode the certainties into them: if a variable is dropped during execution, the function consumes the parameter, and that fact can be used to check the code semantically at compile time. If it is unspecified, static analysis could try to “mod” the existing code with our copy of the function to check whether it works, by temporarily loading a “ground truth” version of the libraries for code-checking purposes. It can then just be assumed that this is the copy of the lib the program will always have around anyway.
"Do we get any value out of supporting (), other than completeness? Passing () by value should just be ignored as if it weren't specified. Do we want people using pointers to (), and do those have any advantage over pointers to void?"
Here's my take on this, () should be supported in the strict sense of a no-op, but should ABSOLUTELY NOT actually get carried over a language barrier. C++ and probably several other languages act with the assumption that size_t can never be zero, and so Unit would release absolute armageddon upon them.
Unit pointers are handy simply because C is a hellscape that requires pointers to "what is this again?". Unless the need for void pointers can be fixed, it would not be a good idea to eliminate unit pointers. Unit pointers should essentially be treated as a more rusty void pointer.
C++ and probably several other languages act with the assumption that size_t can never be zero, and so Unit would release absolute armageddon upon them.
You can just tell them to get good like they tell everyone who wants memory safety. "Skill issue", etc.
Regardless, there's more ZSTs than just unit; struct MyCustomError; seems like a pretty important thing to preserve. Any inter-language bans on ZSTs would be a pretty big issue. Even just unit would be an issue because, for reasons unknown, the url crate has several instances of pub fn ...(...) -> Result<(), ()>.
I know I'm not a relevant voice in this discussion but it does need to be mentioned that ZSTs aren't something you can ban without consequence like !.
Actually ! is just fundamentally conceptually incompatible with most languages so that probably does need an inter-language ban. (Maybe some kinda unsafe system for CrABI that marks ! as unsafe???)
Actually unsafe types would handle ZSTs pretty well. Is that on the table or am I just wildly off the mark? I hope that's on the table because if done well it's a perfect solution.
oh wow I was unaware that the url crate uses Result<(), ()>, ! needs a ban, but ZSTs are grounds where we must tread carefully.
also don't worry about not being a relevant voice, this is my first major contribution to talks on issues like this.
my suggestion was meant as a more extreme approach to raise the issue that ZSTs will cause havoc if shipped over a language border
addendum: crabi could also do what C++ does and represent ZSTs as empty chars
For those of us not in the know, would somebody mind expanding the abbreviation ZST?
Zero size type. In Rust, most or all types with only one valid value have size zero, like an empty struct, empty tuple, or an enum with a single member and no data.
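As a small illustration of that definition, the sketch below measures a few such types. The type and function names are made up for this example, and the exact size of `Result<(), ()>` is an observation about current rustc rather than a documented guarantee.

```rust
use std::mem::size_of;

/// A unit struct: exactly one valid value, so no bytes are needed to store it.
pub struct MyCustomError;

/// An enum with a single member and no data.
pub enum OnlyOne {
    Variant,
}

/// Sizes of several types that have exactly one valid value.
pub fn zst_sizes() -> [usize; 4] {
    [
        size_of::<()>(),
        size_of::<MyCustomError>(),
        size_of::<OnlyOne>(),
        size_of::<[u8; 0]>(),
    ]
}

/// `Result<(), ()>` has two valid values (`Ok(())` and `Err(())`), so it
/// cannot be zero-sized even though both payloads are.
pub fn result_unit_size() -> usize {
    size_of::<Result<(), ()>>()
}
```

So a signature like `-> Result<(), ()>` still carries information across a boundary, even though neither payload occupies any bytes.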
Actually unsafe types would handle ZSTs pretty well. Is that on the table or am I just wildly off the mark? I hope that's on the table because if done well it's a perfect solution.
the more I think about it, the more a way to assert that a symbol is only for rust -> rust usage makes sense, or rather an “if you’re using this symbol in other languages, you’re on your own”. this way we don’t have to throw out things like ZSTs and !, instead moving them into a group where using them from outside rust -> rust is a bad idea, but not completely forbidden
What subset of lifetimes can, and should, we support? We can't enforce them cross-language, but they may be useful as an advisory/documentation mechanism. Or we could leave them out entirely.
I think lifetimes as a documentation mechanism could be very useful, as they provide a reasonably ergonomic way of communicating pointer lifetime expectations and the relations between them.
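A rough sketch of what "lifetimes as documentation" already looks like with `extern "C"` today: the annotations are erased in the emitted symbol, but they tell a reader which input a returned pointer borrows from. `Pair`, `pair_first`, and `prefer_left` are hypothetical names.

```rust
#[repr(C)]
pub struct Pair {
    pub first: i32,
    pub second: i32,
}

/// `'a` documents that the returned pointer borrows from `pair` and must not
/// outlive it. A foreign caller sees only `const int32_t *f(const Pair *)`.
pub extern "C" fn pair_first<'a>(pair: &'a Pair) -> &'a i32 {
    &pair.first
}

/// Two lifetimes document that the result is tied to `preferred` only, so
/// `_other` may be freed as soon as the call returns.
pub extern "C" fn prefer_left<'a, 'b>(preferred: &'a i32, _other: &'b i32) -> &'a i32 {
    preferred
}
```

Nothing on the foreign side enforces these relationships; they are advisory there, though Rust callers of the same declarations still get them checked.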
Niches: should we support cases like Option<bool> without a separate discriminant, or should we (for simplicity) always pass a separate discriminant? Likely the latter. However, what about things like Option<&T> and Option<NonZeroU32>, for which Rust guarantees the representation of None? Those work with the C ABI, and they have to work with crABI, but can we make them work with crABI using the same encoding of None?
I think keeping the NPO is a victimless crime, as languages with nullable pointers (gross, I know) can then digest Option<&T> like a normal pointer.
As for cases like Option
Related: how can we handle tuples? Do we need a way to express repr(crabi) tuples? How can we do that conveniently?
I think the cleanest way to lower tuples into an FFI-friendly form is to turn (i32, String) into struct MyGoofyTuple {0: i32, 1: String}
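One way this lowering could be written by hand today is a `#[repr(C)]` tuple struct, whose fields are already named `0` and `1`. This is only a sketch: `I32U64Pair` and `pair_total` are hypothetical, and `u64` stands in for `String` because `String` has no defined C layout.

```rust
/// Hand-written stand-in for the tuple `(i32, u64)` with a defined C layout.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct I32U64Pair(pub i32, pub u64);

impl From<(i32, u64)> for I32U64Pair {
    fn from(t: (i32, u64)) -> Self {
        I32U64Pair(t.0, t.1)
    }
}

impl From<I32U64Pair> for (i32, u64) {
    fn from(p: I32U64Pair) -> Self {
        (p.0, p.1)
    }
}

/// Hypothetical exported function that takes the lowered tuple by value.
pub extern "C" fn pair_total(p: I32U64Pair) -> i64 {
    i64::from(p.0) + p.1 as i64
}
```

The conversions are field-by-field moves, but they are exactly the parallel-structure boilerplate the unresolved question asks about avoiding.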
Should we provide support for extensible enums, such that we don't assume the discriminant matches one of the known variants? Would doing so make using enums less ergonomic? Could we address that with language changes?
I think the best way to do this is for crabi to have some way to communicate an enum being #[non_exhaustive], but assume exhaustiveness if not specified
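For comparison, the sketch below shows how an extensible enum is often handled by hand across `extern "C"` today: the discriminant crosses the boundary as a plain integer and is decoded without assuming it matches a known variant. `Color`, `Decoded`, `decode_color`, and `color_is_known` are hypothetical names, not part of any crABI proposal.

```rust
/// Variants known to this side of the interface.
#[repr(u32)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Color {
    Red = 0,
    Green = 1,
    Blue = 2,
}

/// Result of decoding a discriminant that may come from a newer library.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decoded {
    Known(Color),
    Unknown(u32),
}

/// Decodes a raw discriminant without assuming it is one of the known variants.
pub fn decode_color(raw: u32) -> Decoded {
    match raw {
        0 => Decoded::Known(Color::Red),
        1 => Decoded::Known(Color::Green),
        2 => Decoded::Known(Color::Blue),
        other => Decoded::Unknown(other),
    }
}

/// Hypothetical exported entry point: the enum travels as a plain `u32`.
pub extern "C" fn color_is_known(raw: u32) -> bool {
    matches!(decode_color(raw), Decoded::Known(_))
}
```

The extra `Unknown` arm is the ergonomic cost the question refers to; assuming exhaustiveness by default would let most enums skip it.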
For ranges, should we provide a concrete range type or types, or should we defer that and handle ranges as opaque objects or traits?
A concrete range type would be nice, but not at all necessary, as they can probably be opaque without too much pain.
Should we do anything special about i128 and u128, or should we just push for getting those supported correctly in extern "C"?
I think the easiest way to handle i128 and u128 is to just get them working in extern "C". This way, they should work for everyone, even outside crabi
For generics, such as Option<u64> or Result<u32, ConcreteError> or [u8; 16], does the rule "all generic parameters must be bound to concrete types in the function signature" suffice, or do we need a more complex rule than that?
I don't see a problem with all generic parameters needing to be bound to concrete types, but I might be missing some nuance here.
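A minimal sketch of that rule as it can be practiced with `extern "C"` today: the generic helper stays internal, and each exported function fixes every parameter to a concrete type. All names here (`sum_all`, `sum_u8_16`, `nonzero_or`) are hypothetical.

```rust
use std::num::NonZeroU32;

/// Internal generic helper. It has no single symbol or calling convention,
/// so it cannot be exported as-is.
fn sum_all<T: Copy + Into<u64>, const N: usize>(items: &[T; N]) -> u64 {
    items.iter().map(|&x| Into::<u64>::into(x)).sum()
}

/// Exported wrapper: `T` and `N` are bound to `u8` and `16` in the signature.
pub extern "C" fn sum_u8_16(bytes: &[u8; 16]) -> u64 {
    sum_all(bytes)
}

/// Exported function using a concrete instantiation of `Option`.
pub extern "C" fn nonzero_or(value: Option<NonZeroU32>, fallback: u32) -> u32 {
    value.map_or(fallback, NonZeroU32::get)
}
```

Each concrete instantiation needs its own exported symbol, which may be part of the nuance the question is asking about.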
Worrying about how other languages will interoperate with crabi would seem to be significantly reducing the usefulness of crabi as a rust to rust ABI for compiled modules.
How about focusing on representing rust elements as fully as feasible with each release having a detailed interop feature set that an implementor can opt in piecemeal rather than having to support everything out the gate? Obviously such support would need to be exposed programmatically.
This should permit early adopters to implement what they need leaving additional features to be implemented as needed down the road.