rs-matter
In place initialization of the `Matter` object, the buffers and subscriptions
(UPDATED: The section "Why is this PR still a draft" with recent progress.)
What follows below is a "mini-RFC" of sorts, justifying why we need this change, how it is implemented, next steps and so on.
What is the problem?
Initialize the Matter object, as well as other large objects (like the IM handler's PooledBuffers), without risking stack blow-ups.
Status Quo
This problem is already solved actually!
The reason why we can initialize the Matter structure (and other structures - but from now on I'll talk only about Matter, as everything applicable to Matter is also valid for the other large non-Future structures we have, like PooledBuffers and Subscriptions) without risking stack memory blow-ups is that it has a const constructor. In other words, Matter::new(...) can be called from a const context.
How does that help?
- You can simply use `static_cell`'s `ConstStaticCell` and do:

```rust
static MATTER: ConstStaticCell<Matter> = ConstStaticCell::new(Matter::new(...));
```
... what the above does is reserve a place for the Matter object in the data section of the executable; the linker startup code will then initialize the Matter structure (and the other structures in the data section) - upon program startup - with its value. This is possible because Matter::new is const and so is ConstStaticCell::new. Therefore, the object layout of ConstStaticCell<Matter> is generated at compile time, saved (in the form of a byte sequence) in the data ELF section, and then upon program startup, the whole data section is copied from the ELF (or from flash) into RAM by the startup code using a simple memcpy (or a memcpy-like) routine.
- You can also do all sorts of acrobatics at runtime, which the compiler usually optimizes. Like (excuse any typos):
```rust
const MATTER: Matter = Matter::new(...);

let boxed_matter = unsafe {
    let mut boxed_matter: Box<MaybeUninit<Matter>> = Box::new_uninit();

    // This is optimized by the compiler to a `memcpy` similar to the one used when
    // the `data` section is initialized by the linker, because we are copying a
    // `const` value into the `*mut Matter` raw pointer.
    // Be careful to inline this code and not hide the `MATTER` const behind a
    // function though, as this optimization might not trigger then.
    boxed_matter.as_mut_ptr().write(MATTER);

    boxed_matter.assume_init()
};

// `boxed_matter` is now initialized, without any stack blow-ups
// ...
```
Why are we solving (again) an already solved problem then?
Using const-initializers comes with one disadvantage: flash size. Since the default value of the Matter object is not all-zeroes (or uninitialized data) - and trust me, it is very, very difficult to come up with a default Matter object state which is all-zeroes and/or MaybeUninit (*) - ultimately, the Matter object is placed in the data section instead of in bss, which means the initial layout of the object has to be burned into flash.
Depending on the size of all buffers, the number of sessions, the number of supported fabrics and so on, the total size of the Matter stack is anywhere between 30KB and 70KB.
Now, this is nothing for embedded Linux scenarios, and is probably not that much for Espressif MCUs which have 4MB flash. But it might be something for other MCUs which generally have less flash size.
(*) Coming up with an "all-zeroes" default object state in Rust is very, very difficult.

Stuff which is a useful default and which is "all-zeroes":
- `false` in `bool`
- 0 in `u8`, `u16`, `u32`, etc.
- `None` in `Option<NonZeroU8>`, `Option<NonZeroU16>`, etc.
Stuff which is a useful default and which is NOT "all-zeroes":
- `""` in `&str`: `&str` is a fat pointer and is NOT modeled - like in C++ - as `(nullptr, 0)`, but as a (dangling, well-aligned, non-null pointer, 0) pair - i.e. `(1, 0)` for `u8` data - so it ends up having a binary value of `0x0000000100000000` (on 32-bit archs)
- `b""` (the empty slice) in `&[u8]` - ditto as for `&str`
- `None` for `Option<Foo>`, where `Option<Foo>` is not special-cased by the compiler as in e.g. `Option<NonZeroU8>`. Especially with `Option<Foo>` where `Foo` is an enum, the compiler often plays tricks where it models `None` as an extra value in `Foo`'s enum tag, and this extra value in the tag is almost never 0
The More Important Reason why we are solving it again
Putting aside flash size concerns, we have an existing, unsolved case of stack blow-ups (and excessive stack usage at runtime):
We are very often allocating large objects on-stack and then moving them into their final destination (which is usually a heapless::Vec or an array inside the Matter object).
Cases where we do this:
- `Fabric` (1.7KB) - first allocated on-stack, then moved into the vector by `vec.push(fabric)`
- `Session` (~1KB currently, can be reduced) - ditto
- `ACL` - ditto
- A lot of other places, like temporary arrays of ~1KB inside the Secure Channel implementation
But let's concentrate on Fabric and Session for now.
In order to avoid their temporary stack allocation (and subsequent move into the vec via push), we need to somehow implement a "placement-new" for heapless::Vec, and somehow initialize the Fabric or Session in-place, directly in the memory of the heapless::Vec instance. Something like:
```rust
vec.push_in_place(|slot: &mut MaybeUninit<Fabric>| {
    let slot_ptr = slot.as_mut_ptr();

    unsafe {
        addr_of_mut!((*slot_ptr).node_id).write(node_id);
        addr_of_mut!((*slot_ptr).fabric_id).write(fabric_id);
        addr_of_mut!((*slot_ptr).fabric_idx).write(fabric_idx);
        // ... and so on
    }
});
```
Two problems with the above:
- We need a custom `heapless::Vec`, as the upstream one does not have (yet) a `push_in_place` method
- Using `push_in_place` requires a lot of care by the caller, as it is full of unsafe code. And it is very, very verbose!
Can we do better?
Yes. With pinned-init.
(BTW: We'll put aside the pinned- aspect of that crate. All our objects are `Unpin` and fortunately, `pinned-init` supports unpinned objects just fine - contrary to its name - without exposing the user to the notion of pinning.)
So what does pinned-init do? Grossly oversimplifying, but it allows us to turn:
```rust
vec.push_in_place(|slot: &mut MaybeUninit<Fabric>| {
    let slot_ptr = slot.as_mut_ptr();

    unsafe {
        addr_of_mut!((*slot_ptr).node_id).write(node_id);
        addr_of_mut!((*slot_ptr).fabric_id).write(fabric_id);
        addr_of_mut!((*slot_ptr).fabric_idx).write(fabric_idx);
        // ... and so on
    }
});
```
into:
```rust
vec.push_in_place(Fabric::init());
```
where Fabric::init() is almost like Fabric::new() except with a slightly different signature and function body:
```rust
impl Fabric {
    fn init(node_id: u64, fabric_id: u64, fab_idx: NonZeroU8) -> impl Init<Self> {
        init!(Self {
            node_id,
            fabric_id,
            fab_idx,
            // Notice the weird `<-` syntax here and how we compose our `init`
            // with `Vec`'s `init`!
            icac <- rs_matter::utils::vec::Vec::init(),
        })
    }
}
```
The above syntax - thanks to the init! macro provided by pinned-init - allows us to use safe, readable, composable syntax to in-place initialize structures recursively! Or rather, and more correctly - to express an in-place initializer/constructor that does so.
Almost like in C++!
The similarity with regular new() -> Self is on purpose of course.
Can't we apply the const fn trick to Fabric and Session?
Yes, we can.
Assuming we implement const fn Fabric::new() -> Self as a parameter-less initial state of the Fabric object, we can try the following:
```rust
const INITIAL_FABRIC: Fabric = Fabric::new();

fabric_mgr.fabrics_vec.push(INITIAL_FABRIC)?;

let fabric = fabric_mgr.fabrics_vec.last_mut().unwrap();

// Now modify `fabric` members as we see fit
fabric.node_id = ...
```
BUT: The problem is `fabric_mgr.fabrics_vec.push(INITIAL_FABRIC)?;`. Even though `INITIAL_FABRIC` is a const, we have no guarantee that the compiler will (a) inline the call to `fabrics_vec.push(...)` and (b) use a `memcpy` to initialize the new fabric at the end of the vector with the `INITIAL_FABRIC` const. It might or might not do that, depending on the optimization settings.
In contrast, `pinned-init` always initializes in place, even at opt-level 0.
pinned-init background
This crate (or rather, a copy of it) is used in the RfL (Rust for Linux) project, and is therefore merged into the Linux kernel mainline. So despite its low star count on GitHub, I think the approach of the crate is very solid and credible.
Changes delta
The changes suggested here are incremental, and most importantly - additive.
- All `const fn` constructors are preserved, so folks can continue using `Matter::new(...)` in a `const` context
- `Matter` (and all other structures recursively downwards) just gets an extra `init` method next to its `new` constructor function, which is an almost verbatim copy of the existing `new`, yet with the large members of the concrete struct using the `<-` syntax of the `pinned_init::init!` macro, which delegates to the placement-new `init` methods of the inner structures
Not so ideal stuff:
- I had to copy `RefCell` from Rust `core` into `rs_matter::utils::cell::RefCell`, as the `core` `RefCell` does not have an `init` placement-new method. However, this is temporary. We'll have to get rid of `RefCell` anyway in future, in favor of a real mutex which is either a no-op for single-threaded scenarios (the default), or a real blocking one. This future mutex - or the beginnings of it - is available under the new `blmutex` module, which is introduced by this PR as well
- I had to copy `heapless::Vec` as `rs_matter::utils::vec::Vec` and extend it to (a) have an `init` in-place constructor and (b) have the `push_in_place` method discussed above. Maybe in future, if the `pinned-init` crate gains traction in the Rust embedded ecosystem (and Embassy in particular), we might be able to merge the `Vec` changes upstream into `heapless` (and merge our `blmutex` changes upstream too). But I find neither of these such a big issue.
Alternatives
Option 1: Do nothing
We can continue relying on const fn for the initialization of the Matter object (and pay with increased flash size). For Fabric and Session, we can either do the const trick described above (and rely on the compiler's opt settings to do their job), or we can still fork heapless::Vec and introduce push_in_place - but without the convenience of pinned_init::init!, we'll have to use a lot of unsafe to in-place initialize the members of the Vec (Fabric and Session, and possibly others in future).
Option 2: &mut-borrow large heapless::Vec instances
(This is what I tried originally.)
We can make the big heapless::Vec instances used in Matter no longer owned by Matter, but rather - borrowed in Matter by &mut references.
I.e. FabricMgr would become FabricMgr<'v> because it would contain a &'v mut heapless::Vec<Fabric> rather than owning that vector (or array) as it is now. Consequently, Matter<'a> would become Matter<'a, 'v>, Exchange<'a> would become Exchange<'a, 'v> and so on down the line. Why the existing covariant 'a lifetime cannot be merged with the new 'v lifetime is explained below.
Advantages:
- Simple and obvious change. Solves the "flash size" problem, because these large, initially-empty `Vec`s are the primary contributors to the flash size increase, as they constitute the majority of the `Matter` object's "data", even if they initially contain... nothing
Disadvantages:
- `Matter` becomes even less ergonomic: now the user has to separately inject another set of buffers besides `PooledBuffers`. I.e. this change increases complexity. With the alternative suggested here, we can in fact merge `PooledBuffers` back into `Matter` at some point in future, which would reduce complexity
- Even worse: the `'v` lifetime from above for the `&'v mut heapless::Vec` external buffers is invariant (as it is for any `&mut` ref). So we can't merge it with the existing, nice covariant `'a` lifetime the `Matter` object has. No matter what we do - I even tried with object pools and whatnot - ultimately, these are either `&mut`, or `RefCell<&Pool>`, or `SomeInteriorMutabilityConstructLikeMutex<&Pool>` non-mutable references, and because cells with interior mutability always result in invariant lifetimes, just like `&mut`, we always end up with `'v` being invariant and thus unmergeable with `'a`
- We can of course hard-code both or either of `'a` or `'v` to be `'static`, but this way we sacrifice the flexibility of `Matter` on platforms like embedded Linux, where none of the problems discussed here are applicable (stacks on Linux are 2MB by default) and where the user might want to allocate `Matter` and `BasicInfoConfig` and `DevAttDataFetcher` and its mDNS impl - and even the buffers - on-stack. That would be impossible with `'static`.
Why is this PR still a draft?
~~pinned-init is currently still utilizing core::alloc::AllocError which is not in Rust stable yet. (and for no_std this is the only nightly feature it needs).~~
~~It seems the author is open to changing the crate in a way where it no longer unconditionally depends on core::alloc::AllocError. This error type is anyway only used in the InPlaceInit trait, which is a decorator of Arc, Rc and Box (i.e. the alloc module) which we don't use/need, so perhaps we can just put - in pinned-init - the definition of InPlaceInit (and the usage of core::alloc::AllocError) behind the alloc feature flag?~~
UPDATE: Problem solved. The relevant PR addressing ^^^ got merged yesterday and we are now using it via the newly released pinned-init V0.0.8.
I also need to thoroughly test the in-place initialization - ideally on an MCU. To happen next week.
@kedars @andy31415 Not ready for merging yet, ~~because the current version of pinned-init it depends on requires nightly. Hopefully we can lift this restriction from pinned-init soon~~ because I still need to test the changes end-to-end. But it would be great if you could look into the "mini-RFC" in the PR description.
Change is completely additive and incremental, but still important to understand and form an opinion on.
Not so important, but I attached two screenshots:
- `MatterStack` from `rs-matter-stack` initialized using `const fn` - you can see 50KB in the `data` segment
- `MatterStack` initialized with the newly introduced `init` infrastructure - the same 50KB, but now in `.bss` (as the data being initialized starts its life as `MaybeUninit`, and only when we in-place initialize it with `.uninit().init_with(MatterStack::init(...))` does it become real)! :)
@kedars @andreilitvin In case you are wondering why there is no activity on this PR:
The reason is - before marking it "ready for merge", I need to prototype the follow-up PR, which would (hopefully) bear the fruits of this one.
In a way, prototyping the subsequent PR would thus de-risk and prove the usefulness of the changes introduced by this one.
Aside from re-working how Fabrics and ACLs are initialized and built up during commissioning (an exercise primarily aiming at memory savings - particularly stack memory), I'm also working on improving the TLV framework in two major aspects:
- Enhancing `FromTLV` to support - in addition to the existing `from_tlv` method - a new `init_from_tlv` method which allows in-place initialization of large structures, a la C++. Which is what the current PR is all about. For example, this means we can in-place initialize the `Fabric` object from its TLV-serialized buffer without materializing it on-stack first and then moving it into the `Vec` of fabrics.
- Reducing the size of our friend `TLVElement` (the representation of partially-parsed TLV data) from 40 bytes (currently, on x64) to 16 bytes (on x64). (Similar reduction ratio for 32-bit MCU archs.) This one is interesting actually. While it might seem that I'm trying to cut the lawn with nail clippers, this is only at first sight. The problem is, we are full of TLV-borne structures (IM `ReadReq`, `SubscribeReq`, `WriteReq`, and quite a few in SC too - `Cert`, to name the worst one) which are already full of partially-parsed TLV data in the form of `TLVElement` or `TLVArray`. So for those (which are created on the stack and moved around quite a bit - and oh well, this is unavoidable, as this is their primary use case!) we might see a 2x reduction if we can reduce the `TLVElement` and `TLVArray` sizes. Hence why this exercise is important. `Cert`, for example, currently weighs more than 400 bytes on x64. By removing the static `Vec<T, N>` instances in it and then reducing the `TLVElement` sizes, we can go down to something like 112 bytes on x64, which is quite an improvement.
I think this is ready for merge (and it has grown a bit after I did the modules' reshuffling in the utils namespace suggested by @andy31415. Not that it was a bad idea - quite the contrary.)
The fruits of this efforts will be shown with subsequent PRs of course. For now, the new init-in-place machinery is mostly latent, but its time is soon to come.
I took the liberty of squashing my changes into a single commit, or else I - for the life of me - couldn't rebase my forthcoming TLV-related changes on top.
Hopefully no big deal, as I doubt anybody would review the changes in this PR commit by commit (and it would not make sense to).
@kedars This PR also contains the small changes from #202 and #203 so we can also just merge this one, and close the other two.
@kedars @andy31415 Can somebody spend some time reviewing this?
It is a precondition and a show-stopper for the following PRs down in my pipeline:
- Fabrics & ACLs rewrite/improvements. Results in ~8K memory reduction (500 bytes per session, for our default of up to 16 sessions)
- TLV rewrite/improvements. Yet to assess the memory savings, but they should be significant, as `TLVElement` is now just a fat pointer, and we now have TLV structures which are re-created completely on the fly. I.e. the `Cert` struct going down from 400 bytes to 16 bytes. Similar savings for a lot of structs in the IM layer too (`ReadReq`, `WriteReq`, `InvokeReq`, etc.)
... Also a lot of temporary on-stack buffers with large sizes in the SC layer (800 bytes each) are now eliminated, but that depends a bit on the new TLV framework...