hyper icon indicating copy to clipboard operation
hyper copied to clipboard

C API ownership problems

Open nnethercote opened this issue 2 years ago • 11 comments
trafficstars

Version Latest

Platform All

Description The docs for hyper_clientconn_handshake say "Both the io and the options are consumed in this function call." But the code looks like this:

    fn hyper_clientconn_handshake(io: *mut hyper_io, options: *mut hyper_clientconn_options) -> *mut hyper_task {
        let options = non_null! { Box::from_raw(options) ?= ptr::null_mut() };
        let io = non_null! { Box::from_raw(io) ?= ptr::null_mut() };

If options is null the function won't take ownership of io, which has a high chance of causing a memory leak. Probably Box::from_raw should be called on both arguments before they are checked for null?

Similarly, the docs for hyper_executor_push say "The executor takes ownership of the task, it should not be accessed again unless returned back to the user with hyper_executor_poll." But the code does this:

    fn hyper_executor_push(exec: *const hyper_executor, task: *mut hyper_task) -> hyper_code {
        let exec = non_null!(&*exec ?= hyper_code::HYPERE_INVALID_ARG);
        let task = non_null!(Box::from_raw(task) ?= hyper_code::HYPERE_INVALID_ARG);

If exec is null the function won't take ownership of task, which also is likely to result in memory leaks. In this case the two lines could be swapped to fix the problem.

I'm happy to file a PR fixing these if it seems like I've diagnosed the situation correctly. I looked through all the other C API functions and these were the only two cases like this I could find.

nnethercote avatar Aug 28 '23 05:08 nnethercote

Thanks for the write-up and taking a look. I see what you're saying. Which would be more expected? If the function is called with bad arguments, is it expected that it still took ownership of everything else that wasn't bad? I think this currently assumes that if the function returns an error-ish value, that the caller may want to do something else with the arguments. But if it's more idiomatic, we can just gobble everything up first.

I admit my experience with writing C APIs is limited (I've read and debugged plenty of C, but not really exposed a public API). So I welcome hearing your recommendations :)

seanmonstar avatar Aug 30 '23 13:08 seanmonstar

I agree with OP - caller can't know exactly where callee failed (whether on the first check or second) then can't know which one it should "own" so best behavior is to first own then check. Another option would be to "leak" in case of error, so it won't be consumed but I think first option is better.

aviramha avatar Aug 30 '23 16:08 aviramha

I also wasn't sure about this. The current behaviour of owning one and not owning the other is definitely bad. Own-on-failure and don't-own-on-failure are both reasonable, but I lean towards own-on-failure:

  • It matches the existing docs.
  • It is simpler to describe and implement. You do all the owning up front; if there's an error later in a function you don't have to worry about de-owning things.
    • hyper_clientconn_options_http2 is an interesting case, it's a rare function that can fail for a reason other than an invalid arg, and it currently takes ownership of (and drops) the opts in the error case.
  • It is potentially simpler for the C coder, because they don't have to free the input(s) on failure.
  • The only downside I can see is that it prevents the C coder from doing something else with the input on failure. But the curl code I've seen never does that, it always jumps straight to some error handler.

https://github.com/curl/curl/pull/11745 is relevant: there was a call to hyper_request_set_body(hyperreq, body), which always takes ownership of body. On failure the code then did hyper_body_free(body), which would case a double free. So I removed that hyper_body_free call. (In practice this double free would only happen if hyperreq was null.)

Another example from curl: Curl_http has lots of code like this:

  hyper_io *io = NULL;
  hyper_clientconn_options *options = NULL;
  ...
  io = hyper_io_new();
  if(!io) {
    failf(data, "Couldn't create hyper IO");
    result = CURLE_OUT_OF_MEMORY;
    goto error;
  }
  ...
  options = hyper_clientconn_options_new();
  if(!options) {
    failf(data, "Couldn't create hyper client options");
    result = CURLE_OUT_OF_MEMORY;
    goto error;
  }
  ...
  <more code here that can `goto error`>
  ...
  handshake = hyper_clientconn_handshake(io, options);
  if(!handshake) {
    failf(data, "Couldn't create hyper client handshake");
    result = CURLE_OUT_OF_MEMORY;
    goto error;
  } 
  io = NULL;
  options = NULL;
  ...
error:
  if(io)
    hyper_io_free(io);
    
  if(options)
    hyper_clientconn_options_free(options);

Look at where the null assignments are. They are appropriate if hyper_clientconn_handshake uses the don't-own-on-failure approach. For the own-on-failure approach the null assignments should be moved to before the if(!handshake). (Which, admittedly, is non-idiomatic C, to have something between the handshake assignment and the handshake check.) This pattern is repeated a lot in curl. In practice it doesn't matter right now because almost all the hyper C API functions can only fail if passed a null pointer, and curl does enough checking that this never happens. The examples in the hyper repo don't do much error checking, so they don't provide guidance.

If we do commit to own-on-failure, then (a) the docs should probably be clarified throughout, and (b) curl's code should be changed to move the null assignments earlier.

@bagder may have opinions here.

nnethercote avatar Aug 30 '23 22:08 nnethercote

My opinion is pretty simple and clear:

If there would be documentation for each C function that describes how they work, including when they fail, we would adapt to that no matter what the decision is to do about ownership. I think the lack of the first part is what makes the second part so complicated.

bagder avatar Sep 13 '23 12:09 bagder

For rustls-ffi, I wrote up what I think is the correct solution: https://github.com/rustls/rustls-ffi/issues/244.

Think of creation, giving away ownership, and freeing as a state machine. Most objects have a very nice simple state machine: new, then free. They must be exactly balanced; no use-after-free and no double free.

Some objects have a more complicated state machine: new, then one of (a) transfer ownership or (b) free. These still must be exactly balanced, even in error conditions.

This state machine is made of poison; if you fail to exactly balance, you get undefined behavior.

So it is worthwhile to spend a little memory in order to go back to the simple state machine. For every object where ownership can be transferred, represent it on the Rust side as a Box<Option<Foo>>. When ownership is transferred, leave behind a None. So the Box can still be freed safely (and indeed must be freed exactly once). This holds true regardless of whether there were any errors.

One way to think of this: since C doesn't have a language-level notion of ownership, it can't guarantee the "free or transfer" semantics that Rust can. That's the source of a lot of memory safety bugs. In bringing Rust code to C, we want to not just provide a safe internal implementation, but provide an interface that is as-safe-as-possible. In other words, an interface that is easy to code against correctly.

jsha avatar Oct 23 '23 15:10 jsha

Perhaps using Option<Box<Foo>> should be preferred to Box<Option<Foo>>. After all, Option<Box<Foo>> is the direct safe-Rust equivalent of a nullable smart pointer. Operating on a raw nullable pointer, in C with manual memory management, is analogous to handling an optional boxed value, in Rust with static memory management.

This approach provides an elegant parallel with respect to Rust semantics. It matches up with the Rust compiler's null-pointer optimization (NPO) which makes mem::size_of::<Option<Box<Foo>>>() == mem::size_of::<Box<Foo>>() because the box itself, being a pointer, can never be null. In fact, this will enable you to simply and safely transmute with zero runtime overhead, consistent ownership semantics, and beautiful code style:

fn pointer_to_box<V>(ptr: *mut V) -> Option<Box<V>> {
    // SAFETY
    // As explained above : )
    unsafe { mem::transmute(ptr) }
}

fn box_to_pointer<V>(boxed: Option<Box<P>>) -> *mut V {
    // SAFETY
    // And likewise
    unsafe { mem::transmute(boxed) }
}

fn hyper_clientconn_handshake(io: *mut hyper_io, options: *mut hyper_clientconn_options) -> *mut hyper_task {
    let options = pointer_to_box(options);
    let io = pointer_to_box(io);

    let task: hyper_task = { ... };
    box_to_pointer(Some(hyper_task))
}

The function arguments are consumed regardless of success or failure. One could imagine writing a macro to replace this boilerplate and generate automatic FFI bridges, whereby Rust code uses Option<Box<V>> and C code operates with raw pointers. This approach is invulnerable: casual mistakes in C code, like passing null pointers, cannot break the Rust implementations thanks to using Option.

A248 avatar Nov 30 '23 22:11 A248

@A248 The problem I describe solving in https://github.com/rustls/rustls-ffi/issues/244 is not that C code might pass a NULL pointer, it's that C code might pass a dangling pointer. That is, a pointer to an object that has been moved by Rust code.

For this purpose, Option<Box<Foo>> is not helpful, because it would have to store the "target was moved" information on the caller side. In other words, if the C code has foo *item, it would represent the None variant by setting item = NULL after it knows the Rust code consumed the item (moved it, or turned it into something else). Most C code does do that. But there are two problems:

  1. Control flows are complicated and sometimes the code to NULL out a pointer after use doesn't get called.
  2. The Rust code has to clearly communicate whether it moved the item or not.

Even if we carefully document whether items are moved on error or not, we still run into (1) because the C code has to check an error code before deciding whether to free and NULL the pointer or whether to just NULL it.

So, yes, Box<Option<Foo>> is a bummer because it doesn't take part in the NPO, but it is necessary to achieve the properties I want: Rust knows at runtime whether a given item has been moved, at least until the containing pointer has been freed.

jsha avatar Nov 30 '23 22:11 jsha

I think I see what you mean. However, I must admit I'm not too familiar with C. I do have one question, though, about your statement here:

because the C code has to check an error code before deciding whether to free and NULL the pointer or whether to just NULL it.

Why does the C code need to decide whether to free the pointer? As I understand it, if the Rust code uses Option<Box<Foo>>, the Rust compiler will drop the parameters in all circumstances. Thus, as soon as the Rust function returns, the arguments have been consumed and, if necessary, dropped and freed. Rust takes ownership so C doesn't have to worry about freeing the passed arguments anymore.

I guess, what I am wondering is, why can't the C code just hand over responsibility for freeing memory to the Rust compiler? Then, C code only has to NULL the pointer, or somewhat stop using it (I'm not really clear on what C programmers do after they have a pointer they know is freed).

Of course, I may be missing some finer semantics here relating to drops, frees, and returning memory to the allocator. So, please do correct me if I'm wrong. I come from higher-level languages and have worked my way down here...

A248 avatar Nov 30 '23 23:11 A248

Why does the C code need to decide whether to free the pointer?

Speaking in general, not about the specific case that prompted this discussion:

All objects have a lifecycle. They are created and then they are freed. In Rust, that lifecycle is mostly managed automatically. Objects are freed when they go out of scope without having ownership handed to some other scope.

In C, the lifecycle is managed manually. Every object must be freed once and exactly once, by calling free(3) or by calling an object-specific function like hyper_executor_free(). This is the source of a large proportion of memory safety vulnerabilities.

When writing an FFI API in Rust, the Rust code usually has to hand over ownership of certain values to the C code. That makes the C code responsible for freeing them. The Rust code doesn't necessarily know when the C code is done using them.

jsha avatar Dec 01 '23 00:12 jsha

Yes, I understand the general case that C must free memory manually. What I'm looking at, though, is what happens in these Hyper FFI functions when C code passes arguments into Rust. In particular, I believe we can arrange for Rust to free the passed arguments, rather C. That would remove any need for C code to decide whether to free the pointer.

Remember this question:

I also wasn't sure about this. The current behaviour of owning one and not owning the other is definitely bad. Own-on-failure and don't-own-on-failure are both reasonable, but I lean towards own-on-failure:

I agree with own-on-failure. My suggestion is that Rust should own all function arguments. Own, drop, and free them on failure and own, drop, and free them on success. C can relinquish ownership and have Rust handle freeing the memory. A straightforward way of achieving this objective presents itself in representing *mut T as Option<Box<T>>. In memory these are represented the exact same way; the only difference is that Rust will add static guarantees to the Option<Box<T>>. Those static guarantees will be correct if and only if the C code passed valid function arguments, which should be the case anyway.

If we're trying to make sure that the Rust implementation of a FFI function properly consume ownership of its arguments, the most immediately obvious step is to transform manually-managed *mut T pointers into idiomatic drop-checked values of Option<Box<T>>. They're equivalent at runtime, but semantically, the latter is far more in accordance with standard Rust practices, and makes it absolutely clear what the Rust code is doing with the function arguments. The idea is to convert the arguments first, then check preconditions regarding null pointers.

The current code uses Box::from_raw and checks nullability at the same time for each argument. This leads to varying behavior, in the case of failure, with respect to freeing the other function arguments. All I'm proposing is to bring the function arguments into Rust land first; afterward, to check preconditions. (Now that I think of it, if ptr.is_null() { None } else { Some(Box::from_raw(ptr)) } is probably a more future-proof method than transmuting and will probably be optimized to equivalence).

I can see that you're on a different train of thought, so I'll try to address the other reasoning behind your proposition.

In bringing Rust code to C, we want to not just provide a safe internal implementation, but provide an interface that is as-safe-as-possible. In other words, an interface that is easy to code against correctly.

In this case, I think you are trying to tackle a bigger problem than simply the varying ownership problem described in the beginning of this issue. C code is inherently unsafe. C code that interfaces with Hyper must pass non-dangling non-null pointers to the appropriate functions. This will always be a problem, but it seems that you want to make C code less error-prone. It's valiant, and we should try, though I think we're quite limited by the fundamentals of C as a language.

Let's say you want to use Box<Option<Foo>> to represent objects on the Rust side. If C is directly passing a *mut Box<Option<Foo>> as an argument, or some erased equivalent, then you still have to deal with the nullability of the outer pointer -- namely, the Box. When you create a Box<Option<Foo>> in Rust, the equivalent in C code must be Option<Box<Option<Foo>>>, because C does not have nullability guarantees. There's no way to work around the fact that C code might accidentally pass null pointers. The best that Hyper can do is guard against them in a safe and reliable manner that does not leak memory. Recognizing the optionality of the outer pointer is the only way to do that.

So the Box can still be freed safely (and indeed must be freed exactly once). This holds true regardless of whether there were any errors.

Just to clarify, a Box does not need to be freed explicitly -- that will happen automatically. Both Option<Box<Foo>> and Box<Option<Foo>> provide static memory management. The difference is that Option<Box<Foo>> is the direct analogue of a nullable but aligned *mut Foo with static cleanup, whereas Box<Option<Foo>> is an aligned and non-null pointer to a tagged union where the tag and Foo are inline.

Rust knows at runtime whether a given item has been moved, at least until the containing pointer has been freed.

I want to say I'm not really sure why we'd need Rust to track at runtime whether an item is removed. It should be C code that satisfies preconditions like the non-nullability of function arguments. Rust code only has to check those preconditions and then integrate the function arguments into happy safety land, where the compiler extends the safety of those preconditions into the indefinite monster of the remaining codebse. Well-formed arguments, once encoded in the type system, preclude the need for any kind of runtime ownership tracking. The compiler's guarantees are transitive.

A248 avatar Dec 01 '23 02:12 A248

I was still curious, so I looked into what others have said about this, because I'm still relatively new to this. The docs for Box seem to suggest using Option<Box<Foo>> for the reason I described: that the C pointer can be null. Even more interestingly, it seems that FFI functions can be declared as simply accepting Option<Box<Foo>>, meaning that manual conversions in source code are unnecessary. From https://doc.rust-lang.org/std/boxed/index.html#memory-layout:

/* C header */

/* Returns ownership to the caller */
struct Foo* foo_new(void);

/* Takes ownership from the caller; no-op when invoked with null */
void foo_delete(struct Foo*);

These two functions might be implemented in Rust as follows. Here, the struct Foo* type from C is translated to Box<Foo>, which captures the ownership constraints. Note also that the nullable argument to foo_delete is represented in Rust as Option<Box<Foo>>, since Box<Foo> cannot be null.

#[repr(C)]
pub struct Foo;

#[no_mangle]
pub extern "C" fn foo_new() -> Box<Foo> {
    Box::new(Foo)
}

#[no_mangle]
pub extern "C" fn foo_delete(_: Option<Box<Foo>>) {}

Notice how Rust can return a non-null Box<Foo> back to C land. However, because this is an externally expoed API, the Rust code cannot make assumptions about the argument to foo_delete and must check for nullability. The implementation of foo_delete is, correctly, an empty function, because it will drop and free the *mut Foo hiding behind the abstractions if and only if that *mut Foo is non-null.

So I think resolving this issue may be as simple as changing function arguments to Option<Box<Foo>>. Provided, of course, that the relevant argument types like hyper_io, hyper_clientconn_options, and company are instantiated in Rust land using the global allcoator -- which seems to hold true based on my reading of the Hyper C API part of the codebase.

A248 avatar Dec 01 '23 03:12 A248