autocxx Possible way to support field accesses with offset known only by C++

Follow-up to https://github.com/google/autocxx/issues/19#issuecomment-706634698. Here is a proof of concept (playground).

// Suppose we have no idea what the true size/alignment of std::string is but
// want a Rust struct which behaves like:
//
//     struct S {
//         std::string i;
//         std::string j;
//         uint32_t k;
//     };

use std::fmt::{self, Debug};

#[repr(C)]
pub struct CxxString([u8; 0]);

#[repr(C)]
pub struct S {
    pub i: CxxString,
    _rest: (),
}

impl Debug for S {
    fn fmt(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
        formatter
            .debug_struct("S")
            .field("i", &self.i)
            .field("j", &self.j)
            .field("k", &self.k)
            .finish()
    }
}

#[repr(C)]
pub struct _Has_j {
    pub j: CxxString,
    _rest: (),
}

#[repr(C)]
pub struct _Has_k {
    pub k: u32,
    _rest: (),
}

impl std::ops::Deref for S {
    type Target = _Has_j;
    fn deref(&self) -> &Self::Target {
        unsafe {
            &*(self as *const S)
                .cast::<u8>()
                .offset(foreign::_S_i_to_j())
                .cast::<_Has_j>()
        }
    }
}

impl std::ops::Deref for _Has_j {
    type Target = _Has_k;
    fn deref(&self) -> &Self::Target {
        unsafe {
            &*(self as *const _Has_j)
                .cast::<u8>()
                .offset(foreign::_S_j_to_k())
                .cast::<_Has_k>()
        }
    }
}

impl Debug for CxxString {
    fn fmt(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
        formatter.write_str("\"TODO\"")
    }
}

// Implemented in C++.
mod foreign {
    pub extern "C" fn _S_i_to_j() -> isize {
        // return offsetof(S, j) - offsetof(S, i);
        32
    }
    pub extern "C" fn _S_j_to_k() -> isize {
        // return offsetof(S, k) - offsetof(S, j);
        32
    }
}

pub fn print_k_get_j(s: &S) -> &CxxString {
    println!("{}", s.k);
    &s.j
}

Oct 11 '20 01:10 dtolnay

Obviously we'd be hoping for good LTO... 🙁

Oct 11 '20 01:10 dtolnay

Here is a different way that doesn't rely on daisy-chaining the Deref impls, so it should always perform the same as accessor methods when cross-language LTO is not available. (playground)

use std::fmt::{self, Debug};
use std::marker::PhantomData;
use std::ops::Deref;

#[repr(C)]
pub struct CxxString([u8; 0]);

#[repr(C)]
pub struct S {
    pub i: CxxString,
    pub j: Get<S_j, CxxString>,
    pub k: Get<S_k, u32>,
    _private: (),
}

impl Debug for S {
    fn fmt(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
        formatter
            .debug_struct("S")
            .field("i", &self.i)
            .field("j", &self.j)
            .field("k", &self.k)
            .finish()
    }
}

pub struct Get<A: Accessor, T = <A as Accessor>::Field> {
    _accessor: PhantomData<A>,
    _value: PhantomData<T>,
}

pub trait Accessor: Sized {
    type Field;
    unsafe fn get(this: &Get<Self>) -> &Self::Field;
}

impl<A: Accessor> Deref for Get<A> {
    type Target = A::Field;
    fn deref(&self) -> &Self::Target {
        unsafe { A::get(self) }
    }
}

impl<A> Debug for Get<A>
where
    A: Accessor,
    A::Field: Debug,
{
    fn fmt(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
        Debug::fmt(&**self, formatter)
    }
}

#[doc(hidden)]
pub enum S_j {}
#[doc(hidden)]
pub enum S_k {}

impl Accessor for S_j {
    type Field = CxxString;
    unsafe fn get(this: &Get<Self>) -> &CxxString {
        &*(this as *const Get<Self>)
            .cast::<u8>()
            .offset(foreign::S_j())
            .cast()
    }
}

impl Accessor for S_k {
    type Field = u32;
    unsafe fn get(this: &Get<Self>) -> &u32 {
        &*(this as *const Get<Self>)
            .cast::<u8>()
            .offset(foreign::S_k())
            .cast()
    }
}

impl Debug for CxxString {
    fn fmt(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
        formatter.write_str("\"TODO\"")
    }
}

// Implemented in C++.
mod foreign {
    pub extern "C" fn S_j() -> isize {
        // return offsetof(S, j) - sizeof(std::declval<S>().i);
        32
    }
    pub extern "C" fn S_k() -> isize {
        // return offsetof(S, k) - sizeof(std::declval<S>().i);
        64
    }
}

pub fn demo_get_j(s: &S) -> &CxxString {
    &s.j
}
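As a sanity check on the byte-offset arithmetic used in the `get` functions above, here is a minimal sketch against a layout that Rust does know. The `Known` struct, its 32-byte arrays, and the helpers `field_at`, `offset_of_k` and `read_k` are assumptions made up for illustration; they are not part of the proof of concept and say nothing about the real layout of std::string.

```rust
use std::ptr;

// Assumed stand-in layout: two 32-byte blobs followed by a u32.
#[repr(C)]
pub struct Known {
    pub i: [u8; 32],
    pub j: [u8; 32],
    pub k: u32,
}

/// Same cast-to-u8, offset, cast-back arithmetic as in `Accessor::get`,
/// expressed on raw pointers.
///
/// # Safety
/// `base.offset(offset)` must point at a valid, aligned `T` inside the same
/// allocation as `base`.
pub unsafe fn field_at<T>(base: *const u8, offset: isize) -> *const T {
    unsafe { base.offset(offset).cast::<T>() }
}

// Plays the role of the foreign function: reports where `k` lives
// relative to the start of the struct.
pub fn offset_of_k(s: &Known) -> isize {
    let base = (s as *const Known).cast::<u8>();
    let field = ptr::addr_of!(s.k).cast::<u8>();
    // Both pointers are derived from the same object, so this is allowed.
    unsafe { field.offset_from(base) }
}

// Reads `k` purely through the computed offset, never naming the field.
pub fn read_k(s: &Known) -> u32 {
    let base = (s as *const Known).cast::<u8>();
    unsafe { *field_at::<u32>(base, offset_of_k(s)) }
}
```

Unlike the zero-sized `S` above, `Known` really owns the bytes being read, so this version can be exercised directly without a C++ object behind the reference.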

Oct 11 '20 02:10 dtolnay

Thank you for these. Very interesting.

My previous "plan" was:

Genuine accessor methods in C++.
Relying on the existence of cross-language LTO to make performance theoretically unaffected — though I recognize that cross-language LTO sounds hard to get working in practice.
To try to provide consistency in field access between POD and non-POD types, I was thinking of an autocxx_get! macro which would do the right thing in each case (direct field access for POD types; nasty-but-hopefully-inlined C++ accessor methods for others). I would probably need another layer of pure-Rust accessors generated for each field with a consistent interface, given the lack of global knowledge available to macros. However, I'm wondering if there's a compromise where I can do something funky with Deref or similar to hide the accessor methods, whilst retaining them (quite close to your second example).
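A rough sketch of what that macro plus the extra layer of generated accessors might look like. Everything here is hypothetical: the `Pod` and `NonPod` types, the convention of naming each accessor after its field, and the pure-Rust `foreign::non_pod_get_k` standing in for a real C++ accessor. No such macro exists in autocxx as far as this thread states.

```rust
// POD case: Rust knows the layout, so the generated accessor is a plain
// field read.
#[repr(C)]
pub struct Pod {
    pub k: u32,
}

impl Pod {
    #[inline]
    pub fn k(&self) -> &u32 {
        &self.k
    }
}

// Non-POD case: the field is only reachable through an accessor that would
// be implemented in C++. The Box is a stand-in for C++-owned storage.
pub struct NonPod {
    inner: Box<u32>,
}

mod foreign {
    // Stand-in for a C++ accessor method; written in Rust for this sketch.
    pub fn non_pod_get_k(this: &super::NonPod) -> &u32 {
        &*this.inner
    }
}

impl NonPod {
    pub fn new(k: u32) -> Self {
        NonPod { inner: Box::new(k) }
    }

    #[inline]
    pub fn k(&self) -> &u32 {
        foreign::non_pod_get_k(self)
    }
}

// The macro needs no knowledge of which kind of type it is handed: it
// only relies on an accessor method named after the field.
macro_rules! autocxx_get {
    ($obj:expr, $field:ident) => {
        $obj.$field()
    };
}
```

With this shape, `autocxx_get!(pod, k)` and `autocxx_get!(non_pod, k)` both yield a `&u32`, and the choice between direct access and a foreign call stays hidden in the generated accessor.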

I will think about the offsetof approach. I hadn't thought of that before. One advantage of that would be if the C++ implementation is constexpr and that constant-ness makes its way successfully through the cross-language LTO in order to simplify the Rust field access machine code. But then again, if LTO is good enough for that anyway, then C++ field accessor functions should also be inlined and disappear.

Another possible advantage of the offsetof approach is that such offsets could conceivably be cached somewhere on the Rust side (in the absence of LTO.) It seems doubtful that it would be worth the overhead, but maybe.
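The caching idea could be sketched roughly as follows, assuming a Rust toolchain with `std::sync::OnceLock`. The function `foreign_s_k_offset` is a hypothetical pure-Rust stand-in for a C++ function returning the offset, and the call counter exists only to show that the foreign side is consulted once.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Counts calls into the stand-in foreign function (demonstration only).
static FOREIGN_CALLS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for a C++ function returning offsetof(S, k).
fn foreign_s_k_offset() -> isize {
    FOREIGN_CALLS.fetch_add(1, Ordering::Relaxed);
    64
}

// Returns the offset, crossing the language boundary only on first use.
pub fn cached_s_k_offset() -> isize {
    static OFFSET: OnceLock<isize> = OnceLock::new();
    *OFFSET.get_or_init(foreign_s_k_offset)
}

// Reports how many times the foreign function has actually run.
pub fn foreign_calls() -> usize {
    FOREIGN_CALLS.load(Ordering::Relaxed)
}
```

Each later access still pays for an atomic load and a branch, which is the overhead that may or may not be worth it compared with simply calling the foreign function every time.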

At the moment I think I am likely to explore genuine C++ accessor methods first because I'm not sure that there aren't other problems that I haven't thought of (bitfields? some kind of crazy template specialization thing? operator overloading?) and doing the actual field access in C++ seems less likely to run into unexpected corner cases than trying to compute offsets. But that's just because I am paranoid, and maybe I'll decide an offsetof approach is better as I inch towards this.

A lot of this autocxx work is predicated on an assumption that many C++ types are well-encapsulated and most accesses are done by methods anyway, so field access hasn't got to the top of my list of priorities yet.

Oct 11 '20 02:10 adetaylor
