curryrs icon indicating copy to clipboard operation
curryrs copied to clipboard

Exposing CString like this is prone to improper ownership management

Open lspitzner opened this issue 7 years ago • 7 comments

As far as i can tell, the description of how to handle complex data structures in #5 is proper. CString should count as "complex", as it wraps some byte array, yet it is exposed like other, more trivial types in the current interface. I.e. one might easily write

safe_ffi! (
  fn testString() -> CString {
    CString::new("hello").unwrap()
  }
);

which, under my current understanding, is rather unsafe.

The easiest safe interface I can currently think of involves two wrapper types: RustCString and HaskellCString, where the name indicates which language's allocator is used for the underlying data. Safety could be ensured by only exposing a native constructor for each type in the respective language. (Technically only one such type would suffice by making a static choice to do String allocations in one language's allocator only. But the implementations would be symmetrical anyways..)

lspitzner avatar Nov 15 '16 13:11 lspitzner

Hey thanks for dropping by with this issue! Are you saying we should create a wrapper around it to pass between the two languages and that the wrapper we make would abstract away those unsafe things? Possibly creating a From or Into trait in Rust to convert between say HaskellString to a Rust String? I think this would be feasible and probably better. Is there something about the CString that's inherently unsafe? It's possible to use byte arrays in Rust without there being any problem. Is there something I'm missing there?

mgattozzi avatar Nov 16 '16 03:11 mgattozzi

Firstly I should perhaps ask if it is correct that my code example effectively contains use-after-free? I am relatively new to rust, so it is far more likely that I am the one missing something :)

And to clarify, yes I meant creating wrappers around any potential memory leaks; no, nothing about CString is inherently unsafe as far as I can tell. Having tried some more stuff it now seems as if providing a completely safe abstraction (that would statically prevent memory leaks for string data - or any complex data for that matter) might not be possible - details below.

In that case, my suggestion would be to simply improve the documentation; something along putting CString in a separate module for "heap/allocation-based" types with proper usage instructions.

My closest attempt is this:

module RustString (RustString, unpackRustString) where

foreign import ccall "deallocate_string" rustDeallocateCString :: RustString -> IO ()

newtype RustString = RustString (CString)

unpackRustString :: RustString -> IO String
unpackRustString rcstr@(RustString cstr) = do
  !str <- peekCString cstr
  rustDeallocateCString rcstr
  return str

The idea being that the constructor of RustString is not exposed, so that the only possible source of such a value is as a return value from some rust function. And the only thing the user can do with the RustString is to unpack it, which contains the deallocation on the rust side.

The issue with this approach is that it is not possible to make RustString part of any FFI type on the haskell side without exposing its constructor, but then the user may construct values that violate the invariant of "points to memory allocated on the rust side".

The only way around this I can think of is introducing some kind of safe wrapper around any foreign declarations on the haskell side (similar to what the macros in rust do) so that the unpackRustString-part can be completely hidden behind the scenes. But that involves CPP or TH. Not sure if that is worth it.

lspitzner avatar Nov 16 '16 12:11 lspitzner

Here's the documentation on CString from the Rust docs:

CString is intended for working with traditional C-style strings (a sequence of non-null bytes terminated by a single null byte); the primary use case for these kinds of strings is interoperating with C-like code. Often you will need to transfer ownership to/from that external code. It is strongly recommended that you thoroughly read through the documentation of CString before use, as improper ownership management of CString instances can lead to invalid memory accesses, memory leaks, and other memory errors.

You're right that there might be some problems here if it's done improperly. Having some safe wrapper around it probably would be a good idea if done right. We do need a CString though for passing between the two. It might be best to have a way that converts say a String to a CString and then have a way to extract from a CString to say ByteString or Text in Haskell and vice versa. It would be nicer to pass a wrapped data structure but until I or someone figures out the best way to do it consistently it's all new territory.

It might be possible to create a class constraint of some sort so that the invariant isn't violated something like:

fooBar :: (RustFFI) => RustString -> IO String

There's a few possibilities. The goal is ease of use while hiding the underlying abstraction. We might not always be able too however and this might be one of those cases if we intend to make it safe.

mgattozzi avatar Nov 16 '16 14:11 mgattozzi

Sorry, my initial description was not very clear. There are two questions/issues. One is #5, and a safe string implementation is a special case of that. The other is the mundane issue that currently, the Str type synonym lacks some proper notice to the reader that strings "are complex", i.e. require some thoughts about ownership. I'd propose to make this issue only about the Str documentation.

(Of course it might make sense to track a safe string wrapper as a separate issue, too.)

lspitzner avatar Nov 22 '16 16:11 lspitzner

Ideally the user-facing curryrs types module shouldn't expose raw pointer types at all, and automatically generated bindings would wrap pointers in safe types. Without automatic binding generation, there will always be the unsafety of mismatched foreign declarations.

For the string representation, I think we shouldn't be using null-terminated C strings at all. Both Haskell and Rust use a length and pointer pair representation of (byte)strings.

nanotech avatar Nov 23 '16 21:11 nanotech

That's all fair, but as long as such high-level interface is not implemented, please properly document the low-level interface.

lspitzner avatar Nov 23 '16 21:11 lspitzner

@lspitzner would you like to take a crack at documenting this then? I'd gladly accept a pull request on this. Since it's the end of the semester for me I haven't had the time to work on side projects like this

mgattozzi avatar Nov 23 '16 22:11 mgattozzi