Optimize passing a Rust `std::string::String` to a Swift `String`
This issue was born from a discussion on the Rust subreddit.
Problem
Today, in order to go from a Rust std::string::String to a Swift String we need to:
- Box::into_raw(Box::new(string)) the Rust std::string::String to get a raw pointer to the String (one allocation) https://github.com/chinedufn/swift-bridge/blob/38699d43bf998ff8698ba03039663bc92a634bbc/crates/swift-bridge-ir/src/codegen/codegen_tests/string_codegen_tests.rs#L135
- Send the pointer to Swift
- Allocate a RustString(ptr: ptr) class instance https://github.com/chinedufn/swift-bridge/blob/38699d43bf998ff8698ba03039663bc92a634bbc/crates/swift-bridge-ir/src/codegen/codegen_tests/string_codegen_tests.rs#L142-L144
- Call rustString.toString(), which allocates a Swift String https://github.com/chinedufn/swift-bridge/blob/d4d97e462672aed7827d76abbccb9aab255c9e9d/src/std_bridge/string.swift#L27-L46
- The class RustString on the Swift side has a deinit method that calls into Rust to deallocate the Rust std::string::String
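The Rust half of the steps above can be sketched roughly as follows. This is only an illustration, not the code swift-bridge generates: the `example_*` function names and the `RustString` wrapper struct are hypothetical, and plain Rust functions stand in for the `extern "C"` symbols a real bridge would expose.

```rust
/// Hypothetical stand-in for the value that lives behind the pointer Swift holds.
pub struct RustString(pub String);

/// Step 1: box the String (one extra allocation) and leak the box as a raw pointer.
pub fn example_make_string() -> *mut RustString {
    Box::into_raw(Box::new(RustString(String::from("hello world"))))
}

/// Used by the Swift-side `toString()`: length of the UTF-8 bytes to copy.
///
/// Safety: `ptr` must come from `example_make_string` and must not have been freed.
pub unsafe fn example_string_len(ptr: *const RustString) -> usize {
    unsafe { (*ptr).0.len() }
}

/// Used by the Swift-side `toString()`: start of the UTF-8 bytes to copy.
///
/// Safety: same requirements as `example_string_len`.
pub unsafe fn example_string_ptr(ptr: *const RustString) -> *const u8 {
    unsafe { (*ptr).0.as_ptr() }
}

/// Step 5: what the Swift `deinit` would call to drop the Rust String.
///
/// Safety: `ptr` must come from `example_make_string` and be freed exactly once.
pub unsafe fn example_free_string(ptr: *mut RustString) {
    drop(unsafe { Box::from_raw(ptr) });
}
```

Counting allocations in this sketch: one for the String's bytes, one for the Box, plus the Swift class instance and the Swift String copy on the other side.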
When returning a Rust &mut String such as:
extern "Rust" {
    type Foo;
    fn get_mut_string(&self) -> &mut String;
}
we want a class RustStringRefMut on the Swift side that exposes methods to access / mutate the underlying Rust std::string::String.
However, when returning an owned String such as:
extern "Rust" {
    type Foo;
    fn get_owned_string(&self) -> String;
}
there is no reason to have the intermediary class RustString since we don't need Rust to manage the underlying std::string::String.
Instead we want to go directly from Rust std::string::String to Swift String.
Open Questions
This entire issue still needs some more thought and planning. We need to think through all of the implications.
Here is a list of things to think through that we can add to over time:
- [ ] If we called shrink_to_fit on the std::string::String before passing it over to Swift, we'd only need the pointer and length in order to de-allocate it later. If we use a CFAllocatorContext (illustrated in the comment below, see func rustDeallocate) to de-allocate the std::string::String whenever Swift no longer needs it, we'd have the pointer to the bytes, but how would we get the length? Or do we need another approach? We could have a global de-allocator on the Rust side that looks up lengths in something like a HashMap<String pointer, String length> in order to de-allocate Strings. But perhaps there's a simpler/smarter approach to all of this?
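To make the "global de-allocator with a pointer-to-length lookup" idea concrete, here is a minimal Rust sketch. Everything in it is an assumption for illustration: the `leak_string` and `dealloc_string` names are hypothetical, and it uses `String::into_boxed_str` rather than `shrink_to_fit`, since a `Box<str>` allocation is described by exactly its length.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// Global map from buffer address to buffer length (hypothetical registry).
fn registry() -> &'static Mutex<HashMap<usize, usize>> {
    static REGISTRY: OnceLock<Mutex<HashMap<usize, usize>>> = OnceLock::new();
    REGISTRY.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Leak the String's bytes and remember their length so that a
/// pointer-only callback can free them later.
pub fn leak_string(s: String) -> (*mut u8, usize) {
    let boxed: Box<str> = s.into_boxed_str();
    let len = boxed.len();
    let ptr = Box::into_raw(boxed) as *mut u8;
    // Empty strings own no heap memory, so there is nothing to track.
    if len > 0 {
        registry().lock().unwrap().insert(ptr as usize, len);
    }
    (ptr, len)
}

/// Free a buffer given only its pointer. Returns false if the pointer is unknown.
///
/// Safety: `ptr` must not be read or written after this call.
pub unsafe fn dealloc_string(ptr: *mut u8) -> bool {
    let looked_up = registry().lock().unwrap().remove(&(ptr as usize));
    let Some(len) = looked_up else {
        return false;
    };
    // Rebuild the Box<str> from pointer + recorded length and drop it.
    let slice = std::ptr::slice_from_raw_parts_mut(ptr, len);
    drop(unsafe { Box::from_raw(slice as *mut str) });
    true
}
```

The cost of this approach is a lock and a hash lookup per String, which is part of why a simpler scheme would be preferable if one exists.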
Here's an example of going from a buffer of UTF-8 bytes to a Swift String:
func testNewStringPassingApproach() throws {
    var string = makeString()
    string.withUTF8({ buffer in
        print(
            """
            String buffer before mutation \(buffer)
            """
        )
    })
    string += "."
    string.withUTF8({ buffer in
        print(
            """
            String buffer after mutation \(buffer)
            """
        )
    })
    print(string)
}
var RustStringDeallocator: CFAllocatorContext = CFAllocatorContext(
    version: 0,
    info: nil,
    retain: nil,
    release: nil,
    copyDescription: nil,
    allocate: nil,
    reallocate: nil,
    deallocate: rustDeallocate,
    preferredSize: nil
)
func rustDeallocate(_ ptr: UnsafeMutableRawPointer?, _ info: UnsafeMutableRawPointer?) {
    print(
        """
        Deallocating pointer \(ptr)
        """
    )
}
// https://gist.github.com/martinmroz/5905c65e129d22a1b56d84f08b35a0f4
func makeString() -> String {
    let buffer = UnsafeMutableBufferPointer<UInt8>.allocate(capacity: 11)
    let _ = buffer.initialize(from: "hello world".utf8)
    print(
        """
        Allocated String buffer \(buffer)
        """
    )
    let bytes = buffer.baseAddress!
    let numBytes = buffer.count
    let stringDeallocator = CFAllocatorCreate(kCFAllocatorDefault, &RustStringDeallocator)
    // https://developer.apple.com/documentation/corefoundation/1543597-cfstringcreatewithbytesnocopy
    let managedCFString = CFStringCreateWithBytesNoCopy(
        kCFAllocatorDefault,
        bytes,
        numBytes,
        CFStringBuiltInEncodings.UTF8.rawValue,
        false,
        // Should be kCFAllocatorNull for &str
        // TODO: take retained or unretained ?
        stringDeallocator?.takeUnretainedValue()
    )
    let cfStringPtr = CFStringGetCharactersPtr(managedCFString!)
    print("CFString pointer: \(cfStringPtr)")
    var managedString: String = managedCFString! as String
    managedString.withUTF8({ buffer in
        print(
            """
            String initial buffer \(buffer)
            """
        )
    })
    return managedString
}
Output
Allocated String buffer UnsafeMutableBufferPointer(start: 0x00007ffaa2c06d00, count: 11)
CFString pointer: nil
String initial buffer UnsafeBufferPointer(start: 0x00007ffee42548d0, count: 11)
Deallocating pointer Optional(0x00007ffaa2c06d00)
String buffer before mutation UnsafeBufferPointer(start: 0x00007ffee4254b80, count: 11)
String buffer after mutation UnsafeBufferPointer(start: 0x00007ffee4254b60, count: 12)
hello world.
It looks like the bytes get copied to a new address somewhere between our raw UTF-8 byte buffer and our Swift String.
Not sure why. Need to look into whether or not it's possible to remove that copy.
Hmm, it seems like there isn't a zero-copy way to construct a Swift String today: https://forums.swift.org/t/does-string-bytesnocopy-copy-bytes/51643/3
So this all needs some more design thinking. For example, if you're reading a 1 GB file into a String and then passing that to Swift, you probably don't want any avoidable copies.
We need to think through the cases where we'd want to immediately copy bytes from a Rust std::string::String to a Swift String, and the cases where we'd instead want an intermediary RustString Swift class. Do we need to be able to annotate functions to indicate how owned Strings should be passed? If so, what would the default be? Things like that need to be answered.
I have a simple suggestion.
When we want Rust to keep ownership, annotate the function like this:
extern "Rust" {
    #[swift_bridge(ownership = "rust")]
    fn get_value() -> String;
}
/// Generated Swift code
func get_value() -> RustString {
    // ...
}
When we want Swift to take ownership, annotate the function like this:
extern "Rust" {
    #[swift_bridge(ownership = "swift")]
    fn get_value() -> String;
}
/// Generated Swift code
func get_value() -> String {
    // ...
}
By default, I think Swift should take ownership, because a swift-bridge beginner (such as me) expects to be able to use String directly.
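As a rough illustration of how the suggested attribute could be resolved during code generation, here is a small Rust sketch. The `ownership` attribute does not exist in swift-bridge; `StringOwnership`, `resolve_ownership`, and `swift_return_type` are hypothetical names made up for this example, with Swift ownership as the default as proposed above.

```rust
use std::str::FromStr;

/// Hypothetical: which side owns the bytes of a returned String.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum StringOwnership {
    /// Swift receives a `RustString` wrapper class.
    Rust,
    /// Swift receives a native `String` (the proposed default).
    #[default]
    Swift,
}

impl FromStr for StringOwnership {
    type Err = String;

    /// Parse the value of the hypothetical `ownership = "..."` attribute.
    fn from_str(value: &str) -> Result<Self, Self::Err> {
        match value {
            "rust" => Ok(Self::Rust),
            "swift" => Ok(Self::Swift),
            other => Err(format!("unknown ownership value: {other:?}")),
        }
    }
}

/// Fall back to the default when the function carries no annotation.
pub fn resolve_ownership(attribute_value: Option<&str>) -> Result<StringOwnership, String> {
    match attribute_value {
        Some(value) => value.parse(),
        None => Ok(StringOwnership::default()),
    }
}

/// The Swift return type the generated function would use.
pub fn swift_return_type(ownership: StringOwnership) -> &'static str {
    match ownership {
        StringOwnership::Rust => "RustString",
        StringOwnership::Swift => "String",
    }
}
```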
When we pass an owned Rust String over to Swift, we don't want Rust to keep ownership of it.
We'd much prefer for it to always immediately become a Swift String once we pass it over.
The only reason we don't do that today is that it would involve copying all of the Rust String's bytes to a new Swift String's memory address. We don't want to do any copying without the user explicitly calling .toString().
If Swift had a zero-copy way to construct a Swift String, we would immediately turn the Rust String into a Swift String and delete the RustString type.
Here's how things would ideally work.
#[swift_bridge::bridge]
mod ffi {
    extern "Rust" {
        // On the Rust side when this is called we would immediately
        // `std::mem::forget` the String.
        // !! THIS IS NOT HOW THINGS WORK TODAY !!
        fn make_string() -> String;
    }
}
// Swift
// Generated code
func make_string() -> String {
    // In here the automatically generated code will construct
    // a Swift String that points to the same memory address that
    // the Rust String did.
    // This means that we've created a Swift String from a Rust String
    // without any copying.
    // !! THIS IS NOT HOW THINGS WORK TODAY !!
}
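On the Rust side, "forgetting" the String while keeping its buffer alive could look something like the sketch below. It is only an assumption about what such generated code might do: `RawStringParts`, `string_into_raw_parts`, and `string_from_raw_parts` are hypothetical names, and the sketch uses `ManuallyDrop` (which has the same effect as `std::mem::forget` here) so that the pointer, length, and capacity can be read before ownership is given up.

```rust
use std::mem::ManuallyDrop;

/// Hypothetical C-compatible description of a String's heap buffer.
#[repr(C)]
pub struct RawStringParts {
    pub ptr: *mut u8,
    pub len: usize,
    pub cap: usize,
}

/// Give up ownership of the String without freeing its bytes.
/// No bytes are copied; the caller now owns the buffer.
pub fn string_into_raw_parts(s: String) -> RawStringParts {
    let mut s = ManuallyDrop::new(s);
    RawStringParts {
        ptr: s.as_mut_ptr(),
        len: s.len(),
        cap: s.capacity(),
    }
}

/// Take ownership back, for example to free the buffer once the
/// other side is done with it.
///
/// Safety: `parts` must come from `string_into_raw_parts` and must be
/// passed back at most once.
pub unsafe fn string_from_raw_parts(parts: RawStringParts) -> String {
    unsafe { String::from_raw_parts(parts.ptr, parts.len, parts.cap) }
}
```

Rebuilding the String needs the capacity as well as the pointer and length, which is the same bookkeeping problem raised in the open questions above.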
So, before exploring any annotation-based approaches, I think we'd want to research whether it is, or ever will be, possible to construct a Swift String without copying.
If it is or ever will be possible, then that would be a much better approach than annotating functions.