RFC: Improve C types for cross-language LLVM CFI support
This RFC proposes improving C types to be able to identify C char and integer type uses at the time types are encoded for cross-language LLVM CFI support.
As the industry continues to explore Rust adoption, the absence of support for forward-edge control flow protection in the Rust compiler is a major security concern when migrating to Rust by gradually replacing C or C++ with Rust, where C- or C++- and Rust-compiled code share the same virtual address space. Support for forward-edge control flow protection therefore needs to be added to the Rust compiler and is a requirement for large-scale secure Rust adoption. For more information about LLVM CFI and cross-language LLVM CFI support, see the design document in the tracking issue rust-lang/rust#89653.
I don't understand this RFC at all. There seem to be several options for how to do whatever you think this does. An RFC should usually be more concrete about what it's proposing.
If you want to have a discussion about developing a design you could try the rust Zulip, perhaps the T-lang stream would be the appropriate place to start: https://rust-lang.zulipchat.com/
Sorry if it's unclear. Let me try to summarize:
LLVM uses type metadata to allow IR modules to aggregate pointers by their types. This type metadata is used by LLVM Control Flow Integrity to test whether a given pointer is associated with a type identifier (i.e., test type membership). Clang and the Rust compiler (for cross-language LLVM CFI support) use the Itanium C++ ABI mangling as type metadata identifiers for function pointers.
As a requirement for cross-language LLVM CFI support, the Rust compiler must be able to identify C char and integer type uses (which are type aliases) at the time types are encoded, to be able to encode these correctly using the Itanium C++ ABI mangling. However, at the time types are encoded, all type aliases are already resolved to their respective `ty::Ty` type representations (i.e., their respective Rust aliased types), making it currently not possible to identify C char and integer type uses from their resolved types.
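The alias erasure described above can be reproduced in plain Rust. A minimal sketch (the local `c_int` alias below stands in for the one in `core::ffi`):

```rust
// Minimal sketch: `c_int` is only a type alias, so once types are
// resolved the two signatures below are the very same Rust type, and an
// encoder working on resolved types cannot tell them apart.
#[allow(non_camel_case_types)]
type c_int = i32; // stand-in for the core::ffi alias on most platforms

extern "C" fn takes_c_int(_x: c_int) {}
extern "C" fn takes_i32(_x: i32) {}

fn main() {
    // Both items coerce to the same function-pointer type: the alias is erased.
    let fns: [extern "C" fn(i32); 2] = [takes_c_int, takes_i32];
    assert_eq!(fns.len(), 2);
}
```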
This RFC proposes improving C types to be able to identify C char and integer type uses at the time types are encoded for cross-language LLVM CFI support by either:

1. creating a new set of transitional C types in `core::ffi` as user-defined types using `repr(transparent)` to be used at the FFI boundary (i.e., for extern function types with the "C" calling convention) when cross-language CFI support is needed (and taking the opportunity to consolidate all C types in `core::ffi`).
2. changing the currently existing C types in `std::os::raw` to user-defined types using `repr(transparent)`.
3. changing C types to `ty::Foreign` and changing `ty::Foreign` to be able to represent them.
4. creating a new `ty::C` for representing C types.
Option (1) is opt in for when cross-language CFI support is needed, and requires the user to use the new set of transitional C types for extern function types with the "C" calling convention.
Options (2), (3), and (4) are backward-compatibility breaking changes and will require changes to existing code that uses C types.
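As a rough illustration of what options (1) and (2) would mean for users (the type name and field here are hypothetical, not proposed std API): a `repr(transparent)` newtype keeps the ABI of its inner type while being a distinct nominal type that the encoder can recognize.

```rust
// Hypothetical transitional C type, sketching options (1)/(2):
// repr(transparent) guarantees the same layout and ABI as the inner i32,
// but this `c_int` is a distinct nominal type rather than a type alias.
#[allow(non_camel_case_types)]
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct c_int(pub i32);

// An FFI-style signature written against the newtype.
extern "C" fn double_it(v: c_int) -> c_int {
    c_int(v.0 * 2)
}

fn main() {
    assert_eq!(double_it(c_int(21)), c_int(42));
    // Same size as the aliased type, as repr(transparent) requires.
    assert_eq!(std::mem::size_of::<c_int>(), std::mem::size_of::<i32>());
}
```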
There is also work in progress in rust-lang/rust#97974 for rust-lang/compiler-team#504 that may also be a potential alternative solution to this issue.
I'm not sure how viable that is, since it sounds like TBAA, and Rust explicitly doesn't have TBAA.
Uh, step 2 is... probably not possible. Even with an edition change, and added language support for new types of literals, it's probably simply too much breakage and mismatch that would occur between older code using the aliases and newer code using the newtypes.
Why can't Rust types just be mapped to the corresponding C types for CFI purposes, e.g. `i32` would be mapped to C `int`?
Is the issue with pairs such as {`char`, `signed char`} and {`long`, `long long`}, where they're distinct C types but may map to the same Rust type?
If so… is it really necessary for LLVM CFI to distinguish those types in the first place? They need to be distinguished in standard ABI mangling since you can have, e.g., `void foo(char)` and `void foo(signed char)` as two distinct function overloads. But there's no fundamental reason they need to be distinguished for CFI, and the security benefit of doing so seems extremely marginal.
This is a requirement for cross-language LLVM CFI support, which is forward-edge control flow protection for "mixed binaries" of C- or C++- and Rust-compiled code (i.e., for when C- or C++- and Rust-compiled code share the same virtual address space), because Clang distinguishes between those types by using the Itanium C++ ABI mangling as type metadata identifiers for function pointers.
I think we should provide as comprehensive protection as possible, as long as it doesn't introduce any constraints that the language itself doesn't have (besides the intended purpose). In the future, I'd like to go even further and implement something like:
Given three functions A, B, and C with the same return and parameter types, and an indirect call at location L to a function with the same return and parameter types as A, B, and C: if it's determined that it isn't possible for C to be called at location L, group C separately from A and B, allowing only members of the group A and B belong to to be called at location L.
But this is outside the scope of this RFC.
Clang distinguishes between those types by using the Itanium C++ ABI mangling as type metadata identifiers for function pointers.
It does today, but that could change. Though perhaps Android is relying on the existing CFI ABI being stable? I have no experience with it.
I think we should provide as comprehensive protection as possible, as long as it doesn't introduce any constraints that the language itself doesn't have (besides the intended purpose).
Well, that is what you are proposing: introducing a constraint into Rust that it distinguish pairs of types that it currently does not distinguish (and has no need to distinguish). Supporting CFI is highly desirable, but Rust having no input into what the CFI ABI looks like, despite this ABI only being developed recently, would be an unfortunate state of affairs.
Given three functions A, B, and C with the same return and parameter types, and an indirect call at location L to a function with the same return and parameter types as A, B, and C: if it's determined that it isn't possible for C to be called at location L, group C separately from A and B, allowing only members of the group A and B belong to to be called at location L.
Are you talking about something like C++ vtables, where you have more information than just the function signature, or are you talking about some sort of IR-level static analysis? Regardless, this sort of thing would likely be fine for Rust. Regarding C++ vtables, Rust doesn't provide built-in compatibility with them, nor is there much Rust code that generates or accesses them by hand (AFAIK), so any measures taken specifically for them wouldn't break much Rust code. Meanwhile, Rust's own trait object vtables have an intentionally unstable layout and could perhaps opt into a similar mechanism in the future. Regarding static analysis, that presumably wouldn't break anything as long as it didn't make any unwarranted C-specific assumptions.
Neither the current implementation nor what this RFC proposes introduces any constraints to the language (besides forward-edge control flow protection). The language already distinguishes between both its explicitly-sized integer types (i.e., i8, i16, i32, …) and C abstract integer types (i.e., c_char, c_short, c_long, …), whose actual sizes are implementation defined and may vary across different systems, and defines a mapping between them. The issue is that this mapping is done using type aliases, and type aliases don't live long enough through the compilation process to make it possible to identify C char and integer type uses from their resolved types--this is a known issue; see rust-lang/compiler-team#504.
It would definitely be a breaking change to the language, though not one that is likely to come up fully inside of Rust:
```rust
#[no_mangle]
extern "C" fn foo(x: c_int) { ... }

#[cfg(int_is_i32)]
extern {
    fn foo(x: i32);
}
```
is entirely correct today. More realistic versions would be being lazy in handwriting FFI and using `i32` instead of `c_int` and so on (maybe combined with the technically-UB use of `*mut u8` for `char*`).
Also I'm not sure about the existing C/LLVM CFI info - what gets generated for `int32_t`? That definitely needs to correspond to Rust `i32`, not just `c_int`.
To enable CFI and cross-language CFI to work properly, the correct/corresponding C types would need to (and arguably should anyway) be used for extern "C" function declarations, as is explicitly documented in the design document in the tracking issue rust-lang/rust#89653 and this RFC:
- be able to identify C char and integer type uses at the time types are encoded.
- not assume that C char and integer types and their respective Rust aliased types can be used interchangeably when forward-edge control flow protection is enabled, at least not at the FFI boundary.
But their resolved types could still be used as arguments at call sites without any problems.
Yes, my point is that this is absolutely a breaking change, and what it buys is "able to implement an existing API that is way more C++-centric than it should be".
Note: in the presence of cross-language LTO you already have to have the Rust side use the precisely correct definitions as the C side or it can trigger UB. https://github.com/rust-lang/unsafe-code-guidelines/issues/198 Maybe this is arguably a bug, but there it is.
While related, that still isn't `i32`-vs-`c_int`, just issues with types that are a little more special in Rust in general.
It would definitely be a breaking change to the language, though not one that is likely to come up fully inside of rust:
Or just

```rust
extern "C" { fn foo(x: c_int); }
…
foo(42i32);
```
Also I'm not sure about the existing C/LLVM CFI info - what gets generated for `int32_t`? That definitely needs to correspond to Rust `i32`, not just `c_int`.
In C++, `int32_t` is on most platforms a type alias for `int`, so it is presumably treated the same as `int`.
Note: in the presence of cross-language LTO you already have to have the Rust side use the precisely correct definitions as the C side or it can trigger UB. rust-lang/unsafe-code-guidelines#198 Maybe this is arguably a bug, but there it is.
However, this doesn't cause a problem for pairs like `long` and `long long` or `char` and `signed char`, because they are the same type at the LLVM IR level. The LLVM-IR-level type system is separate from the type annotations used for CFI support, which are applied by the frontend.
(As for `i32` versus `c_int`, I don't think that is a problem in the first place. There is no ambiguity, so there is no need to differentiate them, or benefit in doing so.)
Neither the current implementation nor what this RFC proposes introduces any constraints to the language (besides forward-edge control flow protection). The language already distinguishes between both its explicitly-sized integer types (i.e., i8, i16, i32, …) and C abstract integer types (i.e., c_char, c_short, c_long, …), whose actual sizes are implementation defined and may vary across different systems, and defines a mapping between them.
Rust code that assumes that e.g. `c_long` equals `i64` is not portable, but it's valid platform-specific code.
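For example, the following is a sketch of valid platform-specific code, gated to a target where the alias is known to resolve to `i64`:

```rust
use std::os::raw::c_long;

fn main() {
    // Valid only where c_long happens to be i64; the cfg makes the
    // platform assumption explicit instead of silently non-portable.
    #[cfg(all(target_os = "linux", target_pointer_width = "64"))]
    {
        let x: c_long = 5i64;
        assert_eq!(x + 1, 6);
    }
}
```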
The issue is that this mapping is done using type aliases, and type aliases don't live long enough through the compilation process to make it possible to identify C char and integer type uses from their resolved types--this is a known issue; see rust-lang/compiler-team#504.
The known issue is the compiler not maintaining information about type alias use in order to produce better diagnostics and documentation. In that case, the added complexity of distinguishing a type from its alias is only internal to the compiler and not present in the normative model of the type system. This is very different from changing the type system so that two types that used to be the same are now different. The added complexity is itself a small cost to the language, but more importantly, the backwards compatibility break is a huge cost.
Call it a 'constraint' or not, but it's highly undesirable.
But their resolved types could still be used as arguments at call sites without any problems.
If you're suggesting that arguments of type, e.g., `i8`, if passed as arguments to these functions, would automatically be coerced to a new type, e.g., a new `repr(transparent)` `c_char` type, this kind of coercion would be a new behavior that exists nowhere else in the language, further increasing the added language complexity.
In Rust we want to be able to have actual provenance tracking of various pointer values. In C, it is considered acceptable to cast these back and forth from integers, and so code will pass pointers as integers. In Rust, we increasingly think it is not. So Rust code that wants to have things like "accurately tracked pointers", which is very proximal to "control flow integrity", may decide to override the definition and link against a mildly fictional version of external C code, redeclaring a C `uintptr_t` as `*mut T` instead. In doing so, we can get much better sanitization up to the FFI boundary using the MIR interpreter.
This tends to work fine, surprisingly, because most platforms have C calling conventions that don't actually care about this detail. On ones that do, like CHERI / Morello, they also disagree with C at large and implicitly treat `uintptr_t` as a pointer, or in the case of Motorola 68000, the C implementation for that platform is itself prone to getting this wrong.
So "Rust code should use exactly the same C definitions" is not a very strong "should" to me. The C code should be correct. If the C code is wrong, then Rust programmers who know what they're doing (calling any `extern "C"` function is `unsafe`, so anyone who does the preceding is shouldering personal responsibility for any UB it induces, I am not a lawyer, consult your local doctor before writing C FFI, side effects may include headaches, dizziness, and nausea) should be at liberty to correct it.
We could, of course, simply write a C code shim to accomplish the same thing, but it seems a rather laborious effort to produce what is essentially no-op code.
override the definition and link against a mildly fictional version of external C code, redeclaring a C uintptr_t as *mut T instead
See https://github.com/rust-lang/rust/issues/95496 for more details -- some system C APIs pun integers and pointers, essentially expecting you to invoke them like so: `libc::prctl(PR_SET_NAME, name.as_ptr() as libc::c_ulong, 0, 0, 0)`. This completely scrambles the ability for tools to reason about pointer escapes, so we've been recommending overruling the random nonsense that these kinds of APIs insist on and making them more type-accurate.
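A sketch of that kind of type-accurate overruling for the `prctl` case above (Linux-only; the binding name and fixed-arity signature are assumptions made for illustration, not the real libc declaration, which is variadic):

```rust
#[cfg(target_os = "linux")]
mod accurate {
    extern "C" {
        // Deliberately redeclared so the thread-name argument stays a
        // pointer, preserving provenance, instead of being cast to c_ulong.
        #[link_name = "prctl"]
        pub fn prctl_set_name(
            option: i32,
            name: *const u8,
            a: u64,
            b: u64,
            c: u64,
        ) -> i32;
    }
    pub const PR_SET_NAME: i32 = 15; // from <linux/prctl.h>
}

fn main() {
    #[cfg(target_os = "linux")]
    unsafe {
        // Sets the current thread's name without an int/pointer pun.
        let name = b"worker\0";
        let rc = accurate::prctl_set_name(accurate::PR_SET_NAME, name.as_ptr(), 0, 0, 0);
        assert_eq!(rc, 0);
    }
}
```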
Similarly, when it comes to tools like bindgen, I've long argued that it's preferable to declare extern APIs in the most native Rust form allowed, because Rust is simply better at defining FFI interfaces than C or C++, and it doesn't make sense to throw out that information on both sides.
`extern fn(val1: Option<&u32>, val2: &mut u64)` is a much more useful signature than `extern fn(val1: *mut u32, val2: *mut u64)`.
Or just
`extern "C" { fn foo(x: c_int); } … foo(42i32);`
As long as the extern "C" function declaration continues having the correct/corresponding C types, this example would continue to work without any problems.
The known issue is the compiler not maintaining information about type alias use in order to produce better diagnostics and documentation.
These were cases where the issue manifested, not the issue itself. The issue is indeed that type alias information does not live long enough through the compilation process (i.e., the lack of a ty::Ty kind for type aliases and a DefId).
In that case, the added complexity of distinguishing a type from its alias is only internal to the compiler and not present in the normative model of the type system. This is very different from changing the type system so that two types that used to be the same are now different. The added complexity is itself a small cost to the language, but more importantly, the backwards compatibility break is a huge cost.
And they'll continue to do so. They will continue to be usable interchangeably anywhere else in Rust code but at the FFI boundary (i.e., at the extern "C" function declarations). I.e., an `extern "C" { fn foo(bar: i32); }` will be different from a `void foo(int bar)` at the FFI boundary (or when called back from C code if passed as a callback, and vice versa), when CFI is enabled.
If you're suggesting that arguments of type, e.g., `i8`, if passed as arguments to these functions, would automatically be coerced to a new type, e.g., a new `repr(transparent)` `c_char` type, this kind of coercion would be a new behavior that exists nowhere else in the language, further increasing the added language complexity.
Since they would need to be used at the FFI boundary only (i.e., at the extern "C" function declarations), there is no new behavior other than the already existing one for `repr(transparent)`. https://doc.rust-lang.org/nomicon/other-reprs.html#reprtransparent says:
Also, passing the struct through FFI where the inner field type is expected on the other side is guaranteed to work. In particular, this is necessary for struct Foo(f32) to always have the same ABI as f32.
This is for option (1), which would add new `repr(transparent)` types to be used at the FFI boundary (i.e., at the extern "C" function declarations) when cross-language CFI support is needed, and option (2), which would change the currently existing C types to `repr(transparent)` types. If the work in progress in https://github.com/rust-lang/rust/pull/97974 for https://github.com/rust-lang/compiler-team/issues/504 becomes the solution to this issue, these wouldn't even be necessary, and the currently existing C types would be handled similarly to how c_void and the () type are in the current implementation.
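The layout half of the `repr(transparent)` guarantee quoted from the nomicon is easy to check directly; a minimal sketch:

```rust
use std::mem::{align_of, size_of};

// struct Foo(f32) has exactly f32's size and alignment (and, per the
// quoted guarantee, the same ABI when passed through FFI).
#[repr(transparent)]
struct Foo(f32);

fn main() {
    assert_eq!(size_of::<Foo>(), size_of::<f32>());
    assert_eq!(align_of::<Foo>(), align_of::<f32>());
}
```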
If the work in progress in https://github.com/rust-lang/rust/pull/97974 for https://github.com/rust-lang/compiler-team/issues/504 becomes the solution to this issue, these wouldn't even be necessary, and the currently existing C types will be handled similarly to how c_void and the () type are in the current implementation.
That compiler change doesn't imply any user-facing changes, as the changes are for rustdoc support and getting more information, and rustdoc can run without doing codegen. So I don't expect it will, really. What you are proposing remains a user-facing change, even if it only triggers with CFI enabled: setting aside the deliberate disrespect for C's type definitions that we already express, having to define an interface using its type alias instead of the original type is neither necessary nor correct. And to Rust, `c_char` is an alias of either `i8` or `u8`. Even if it only breaks with CFI enabled, this causes a language split, where a previously irrelevant type alias is now load-bearing.
That's not acceptable.
I suspect that assigning different semantics based on whether a function signature used a type alias or the original type¹ (that the type alias is aliasing) is a complete non-starter, and some of the hypothetical discussion here might be a waste of everyone's time if that's the case. cc @rust-lang/types
¹ (two types, which even if the compiler may eventually be able to represent differently, exhibit complete type equality even in the absence of subtyping, i.e. in invariant positions)
Oh and it's possible to use generics with FFI:
```rust
unsafe extern "C" fn callback<T>(input: T, user_data: *mut c_void) {...}

let user_data = &mut my_data as *mut _ as *mut c_void;
foo_config_query_int("timeout", callback::<c_int>, user_data);
foo_config_query_int32("counter32", callback::<i32>, user_data);
foo_config_query_float("ratio", callback::<f32>, user_data);
```
`callback::<c_int>` and `callback::<i32>` are either the exact same type (today) or completely equal types (in the future, see ¹ above) - either way, it's impossible to tell them apart:
```rust
let callback_c_int_or_i32 = [callback::<c_int>, callback::<i32>];
assert_eq!(0, size_of_val(&callback_c_int_or_i32)); // No function pointers, ZST function types.
let fn_ptr: fn(_, _) = callback_c_int_or_i32[i]; // The result cannot soundly depend on `i`.
```
And because they're not individual declarations (where you could argue that there's some sneaky non-local-quasi-syntactic-sugar going on), but part of the typesystem, you simply cannot make two symbols out of them.
(i.e. this case I am certain of, unlike my "cc @rust-lang/types" above for the weaker one)
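A runnable variant of the argument above, with a local `c_int` alias standing in for the real one: because `c_int` is an alias of `i32`, the two instantiations are one and the same item, which the type checker can confirm.

```rust
#[allow(non_camel_case_types)]
type c_int = i32;

#[allow(improper_ctypes_definitions)]
extern "C" fn callback<T>(_input: T) {}

// Compiles only if both arguments have the identical type.
fn same_type<T>(_: T, _: T) {}

fn main() {
    // callback::<c_int> and callback::<i32> are the same monomorphization,
    // so their zero-sized fn item types are identical.
    same_type(callback::<c_int>, callback::<i32>);
    assert_eq!(std::mem::size_of_val(&callback::<i32>), 0);
}
```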
Personally I still don't understand how any of this could possibly have ever worked for anything other than C++ vtables and similarly restricted things. Is the answer "we patch our dependencies" and/or "we turn it off for large parts of the build"? Or maybe "there's a dozen indirect calls and/or they barely use any types"?
That compiler change doesn't imply any user-facing changes, as the changes are for rustdoc support and getting more information, and rustdoc can run without doing codegen. So I don't expect it will, really. What you are proposing remains a user-facing change, even if it only triggers with CFI enabled: setting aside the deliberate disrespect for C's type definitions that we already express, having to define an interface using its type alias instead of the original type is neither necessary nor correct. And to Rust, `c_char` is an alias of either `i8` or `u8`. Even if it only breaks with CFI enabled, this causes a language split, where a previously irrelevant type alias is now load-bearing.
The changes are not only for rustdoc support--they add a proper ty::Ty kind (i.e., TyAlias) and a DefId for type aliases, and they'll very likely become a solution to this issue (see https://github.com/rust-lang/rust/pull/97974/commits/11a9bbf1e52bb8a6a21e2a7436c954aed3761a5c#diff-0cc45ef317b8526bab4aad2b698ce2c63741ed8fe9c3d96497fc73a4fd17d419), at which point C char and integer type uses can be identified from their resolved types, and the currently existing C types could be handled similarly to how c_void and the () type are in the current implementation.
I'm afraid that if using the correct/corresponding C types for extern "C" function declarations[^1] is a blocker, when CFI is enabled and for cross-language CFI to work properly, cross-language CFI might not be possible (at least not without making changes to how Clang encodes C char and integer types for CFI).
[^1]: Which arguably should be used anyway, since their actual sizes are implementation defined and may vary across different systems, and the types are aliased correctly accordingly.
Oh and it's possible to use generics with FFI:
```rust
unsafe extern "C" fn callback<T>(input: T, user_data: *mut c_void) {...}

let user_data = &mut my_data as *mut _ as *mut c_void;
foo_config_query_int("timeout", callback::<c_int>, user_data);
foo_config_query_int32("counter32", callback::<i32>, user_data);
foo_config_query_float("ratio", callback::<f32>, user_data);
```

`callback::<c_int>` and `callback::<i32>` are either the exact same type (today) or completely equal types (in the future, see ¹ above) - either way, it's impossible to tell them apart:

```rust
let callback_c_int_or_i32 = [callback::<c_int>, callback::<i32>];
assert_eq!(0, size_of_val(&callback_c_int_or_i32)); // No function pointers, ZST function types.
let fn_ptr: fn(_, _) = callback_c_int_or_i32[i]; // The result cannot soundly depend on `i`.
```
The encoding is done after types are monomorphized (and these examples are supported). See https://github.com/rust-lang/rust/blob/master/src/test/codegen/sanitizer-cfi-emit-type-metadata-id-itanium-cxx-abi.rs for some examples of how function items are encoded.
Even if we could make everything work after resolving type generics and the like, that
```rust
#[cfg(target_arch = "x86_64")]
extern "C" {
    fn strncpy(dest: *mut i8, src: *const i8, count: usize) -> *mut i8;
}

#[cfg(target_arch = "aarch64")]
extern "C" {
    fn strncpy(dest: *mut u8, src: *const u8, count: usize) -> *mut u8;
}
```
works in user code is not a trivial feature of Rust. It is part of the language's definition, because `extern "C" fn` is not merely a syntax for FFI bindings against C code. It is Rust code. It declares its validity to Rust and is typechecked in Rust. To compile Rust code is to say it has a sensical execution flow. To then interrupt its execution by failing to deliver Rust control flow to the target symbol is to break working Rust code.
Our only job, mind, is to deliver the flow of execution to the extern symbol and to have a sufficiently matching calling convention such that the C function can do its job. If the implementation is internally malicious, we don't care. You called C code. What were you hoping for? A pat on the back? But it must not induce linker errors, and a CFI schema that Rust supports must not preempt us calling the symbol based on that definition if the call is valid in Rust. That types will remain valid is in fact part of the stability promise more so than any other behavioral promise.
Opting in to type system changes would cause deep splits between minor versions; it would also create a high maintenance burden in the compiler, since both older and newer versions would have to be supported.
We already have several mitigation measures, such as opt-out or temporary deprecation, that can be used to ease the transition around a soundness fix. Moreover, separating out new type rules so that they can be "opted into" can be very difficult and would complicate the compiler internally; it would also make it harder to reason about the type system as a whole.
We can argue whether or not this should work:
```rust
extern "C" fn some_fn(val1: Option<&u32>, val2: &mut u64);
```
Though it currently does, so it would require more than a small discussion to change that.
But to change the language such that
```rust
#[cfg(target_arch = "x86_64")]
type c_char = i8;
#[cfg(target_arch = "aarch64")]
type c_char = u8;
#[cfg(any(target_arch = "aarch64", target_arch = "x86_64"))]
type size_t = usize;

extern "C" {
    fn strncpy(dest: *mut c_char, src: *const c_char, count: size_t) -> *mut c_char;
}
```
is "more correct" in some way in order to supply a particular kind of integration with the LLVM toolchain's preferred C compilers, and thus everyone will be told they must write that, and not the other thing, is to induce a breaking change.
Which arguably should be used anyway, since their actual sizes are implementation defined and may vary across different systems, and the types are aliased correctly accordingly.
And according to Rust, this is incorrect, because we do not actually support arbitrary C implementations. We support the target platform's C dialect. That's what `extern "C"` and `repr(C)` actually mean. That's why we use MSVC and GCC to do linkage instead of doing it ourselves: because they know where their system libraries are and how to link Rust code against them.
Let me provide some background information from the design document, so the costs (and the risks) vs benefits of the options that are being proposed may be fully understood:
With the increasing popularity of Rust both as a general purpose programming language and as a replacement for C and C++ because of its memory and thread safety guarantees, an increasing number of companies and projects are adopting or migrating to Rust. One of the most common paths to migrate to Rust is to gradually replace C or C++ with Rust in a program written in C or C++.
Rust provides interoperability with foreign code written in C via Foreign Function Interface (FFI). However, foreign code does not provide the same memory and thread safety guarantees that Rust provides, and is susceptible to memory corruption and concurrency issues. Therefore, it is generally accepted that linking foreign C and C++ -compiled code into a program written in Rust may degrade the security of the program.^1^3
While it is also generally believed that replacing sensitive C or C++ with Rust in a program written in C or C++ improves the security of the program, Papaevripides and Athanasopoulos^4 demonstrated that it is not always the case, and linking foreign Rust-compiled code into a program written in C or C++ with modern exploit mitigations, such as control flow protection, may actually degrade the security of the program, mostly due to the absence of support for these exploit mitigations in the Rust compiler, mainly forward-edge control flow protection.
Since then, it has been discussed^5^7, and just recently it was formalized^8 as a new class of attack (i.e., cross-language attacks). And as companies continue to explore Rust adoption, it has been a major security concern (and a blocker) when migrating to Rust by gradually replacing C or C++ with Rust, where C- or C++- and Rust-compiled code share the same virtual address space. It was also one of the major reasons that initiatives such as Rust GCC--which I also fully support--were funded^5.
It is my view that, if the Rust compiler is providing interoperability with a language, and wants to provide a path towards Rust adoption by the industry, it must not degrade the security of the program it is being integrated into. Therefore, the cost of using the correct/corresponding C types (either as the aliases they currently are, as a new opt-in set of `repr(transparent)` types, or by changing the current aliases to `repr(transparent)` types) for extern "C" function declarations (and let me add: only when they're used for FFI), only when CFI is enabled, and only when cross-language CFI is needed, seems pretty reasonable to me.
This is all very compelling, and we do love security, but breaking old code is simply not acceptable. Not at all.
So, your proposals have to be some path forward that do not break any previous code. Unfortunately, we've got a lot of FFI stuff already designed into the language, and even a small change could end up breaking stuff. I won't speak for anyone else, but I'm not really seeing how to proceed given the absolute rule that we can't break old code.
For example, would just an entirely new set of types be acceptable? Under a `core::ffi::cfi` sub-module perhaps? Then code that wants to opt in to CFI can clearly and specifically do so by using alternative types.
For example, would just an entirely new set of types be acceptable? Under a `core::ffi::cfi` sub-module perhaps? Then code that wants to opt in to CFI can clearly and specifically do so by using alternative types.
This is what is proposed in option (1).
That's not how I initially understood the description of option (1), but if that's what you meant then sure.
There are other forward-edge control-flow assurance schema that do not require Clang++'s RTTI or a mapping to Clang++'s RTTI, and that have been integrated with the Rust compiler without any cost whatsoever to existing language semantics. The papers that you highlight did not include evaluating these other mitigation strategies by comparison, some of which actually have hardware assistance and thus can have considerably more durable backing than software-driven guarantees.
Further, no matter what, this still breaks
```rust
extern "C" fn some_fn(val1: Option<&u32>, val2: &mut u64);
```
Again, this is Rust code, not C code. That it will bind against an externally declared C interface doesn't matter, because the definition of the types are brought into and specified by Rust. Option 1 still canonizes these new FFI types, and it does so at the cost of intralanguage sanitizers, encouraging people to write type-lossy shims at the Rust side of the C-Rust boundary, making the FFI code less sound by extension. Encouraging people to write less-sound Rust code in order to integrate with C code, instead of devising a schema that can accompany the type mappings that Rust code will want to be able to use, seems like a bad choice.
Rust has already been adopted on a large scale without the Clang++-based security mechanisms you describe, partly because there are in fact other security mechanisms that the language can easily afford and that help address these concerns. Rust still supports shadow stacks, indirect branch targeting, Windows Control Flow Guard, et cetera, to various degrees of completeness.
Finally, several of the exploits in the "Exploiting Mixed Binaries" paper that you mention rely on C calling into Rust and thereby subverting the appearance of an `unsafe` block. At the risk of stating the obvious: it is Rust that requires `unsafe` blocks, not C. Of those, several that Rust is more prone to in the mixed-address-space case are mitigated instead by expanding support for SafeStack, which is backward-edge control, which I expect would also be very low-cost in terms of language semantics. Of others, I remember that we closed, over 2020 and 2021, a few odd gaps regarding closures and unsafety. I should check all of these examples to see which ones still compile. Some look to rely on unsound or incorrect Rust-qua-Rust code and other edge cases where `unsafe` is implied but not visually specified, so we can probably break them.
The paper itself explains that control-flow protection can be built at various levels of language semantics. Machine code is sufficient, but there can be benefits to using high-level type information. That type information can be used to build programs that have forward-edge integrity... if these languages actually agree on type definitions.
For C and Rust, despite the superficial similarities? We don't. Not any more than two pieces of assembly code that successfully call each other would agree.
Sorry to belabor the point, but does "option 1" create new types as an optional way to specify an external call just for people interested in CFI, and permanently leave the old types alone? Or does it create new types that are eventually forced on everyone? Because that's a big difference.