rs-matter
In place initialization of the `Matter` object, the buffers and subscriptions
(UPDATED: The section "Why is this PR still a draft" with recent progress.)
What follows below is a "mini-RFC" of sorts, justifying why we need this change, how it is implemented, next steps and so on.
What is the problem?
Initialize the Matter object, as well as other large objects (like the IM handler's PooledBuffers), without risking stack blow-ups.
Status Quo
This problem is already solved actually!
The reason why we can initialize the Matter structure (and other structures - but from now on I'll talk only about Matter, as everything applicable to Matter is also valid for the other large non-Future structures we have, like PooledBuffers and Subscriptions) without risking stack memory blow-ups is that it has a const constructor. In other words, Matter::new(...) can be called from a const context.
How does that help?
- You can simply use `static_cell`'s `ConstStaticCell` and do:

```rust
static MATTER: ConstStaticCell<Matter> = ConstStaticCell::new(Matter::new(...));
```
... what the above does is reserve a place for the Matter object in the data section of the executable; the linker startup code will then initialize the Matter structure (and the other structures in the data section) - upon program startup - with its value. This is possible because Matter::new is const and so is ConstStaticCell::new. Therefore, the object layout of ConstStaticCell<Matter> is generated at compile time, saved (in the form of a byte sequence) in the data ELF section, and then upon program startup, the whole data section is copied from the ELF (or from flash) into RAM by the startup code using a simple memcpy (or a memcpy-like) routine.
- You can also do all sorts of acrobatics at runtime, which the compiler usually optimizes. Like (excuse any typos):
```rust
const MATTER: Matter = Matter::new(...);

let boxed_matter = unsafe {
    let mut boxed_matter: Box<MaybeUninit<Matter>> = Box::new_uninit();

    // This is optimized by the compiler to a `memcpy` similar to the one used when
    // the `data` section is initialized by the linker, because we are copying a
    // `const` value into the `*mut Matter` raw pointer.
    // Be careful to inline this code and not hide the `MATTER` const behind a
    // function though, as this optimization might not trigger then.
    boxed_matter.as_mut_ptr().write(MATTER);

    boxed_matter.assume_init()
};

// `boxed_matter` is now initialized, without any stack blow-ups
// ...
```
Why are we solving (again) an already solved problem then?
Using const-initializers comes with one disadvantage: flash size. Since the default value of the Matter object is not all-zeroes (or uninitialized data) - and trust me, it is very, very difficult to come up with a default Matter object state which is all-zeroes and/or MaybeUninit (*) - ultimately, the Matter object is placed in the data section instead of in bss, which means the initial layout of the object has to be burned into flash.
Depending on the size of all buffers, the number of sessions, the number of supported fabrics and so on, the total size of the Matter stack is anywhere between 30KB and 70KB.
Now, this is nothing for embedded Linux scenarios, and is probably not that much for Espressif MCUs which have 4MB flash. But it might be something for other MCUs which generally have less flash size.
(*) Coming up with an "all-zeroes" default object state in Rust is very, very difficult.

Stuff which is a useful default and which is "all-zeroes":
- `false` in `bool`
- 0 in `u8`, `u16`, `u32`, etc.
- `None` in `Option<NonZeroU8>`, `Option<NonZeroU16>`, etc.
Stuff which is a useful default and which is NOT "all-zeroes":
- `""` in `&str`: `&str` is a fat pointer and is NOT modeled - like in C++ - as `(nullptr, 0)`, but as a (dangling, well-aligned, non-null pointer, 0) pair - i.e. `(1, 0)` for `u8` data - so it ends up having a binary value of `0x0000000100000000` (on 32-bit archs)
- `b""` (the empty slice) in `&[u8]` - ditto as for `&str`
- `None` for `Option<Foo>`, where `Option<Foo>` is not special-cased by the compiler as in e.g. `Option<NonZeroU8>`. Especially with `Option<Foo>` where `Foo` is an enum, the compiler often plays tricks where it models `None` as an extra value in `Foo`'s enum tag, and this extra value in the tag is almost never 0
The More Important Reason why we are solving it again
Putting aside flash size concerns, we have an existing, unsolved case of stack blow-ups (and excessive stack usage at runtime):
We are very often allocating large objects on-stack and then moving them into their final destination (which is usually a heapless::Vec or an array inside the Matter object).
Cases where we do this:
- `Fabric` (1.7KB) - first allocated on-stack, then moved into the vector by `vec.push(fabric)`
- `Session` (~1KB currently, can be reduced) - ditto
- `ACL` - ditto
- A lot of other places, like temporary arrays of ~1KB inside the Secure Channel implementation
But let's concentrate on Fabric and Session for now.
In order to avoid their temporary stack allocation (and subsequent move into the vec via push), we need to somehow implement a "placement-new" for heapless::Vec, and somehow initialize the Fabric or Session in-place, directly in the memory of the heapless::Vec instance. Something like:
```rust
vec.push_in_place(|slot: &mut MaybeUninit<Fabric>| {
    let slot_ptr = slot.as_mut_ptr();

    unsafe {
        addr_of_mut!((*slot_ptr).node_id).write(node_id);
        addr_of_mut!((*slot_ptr).fabric_id).write(fabric_id);
        addr_of_mut!((*slot_ptr).fabric_idx).write(fabric_idx);
        // ... and so on
    }
});
```
Two problems with the above:
- We need a custom `heapless::Vec`, as the upstream one does not have (yet) a `push_in_place` method
- Using `push_in_place` requires a lot of care by the caller, as it is full of unsafe code. And it is very, very verbose!
Can we do better?
Yes. With pinned-init.
(BTW: We'll put aside the pinned- aspect of that crate. All our objects are `Unpin` and fortunately, `pinned-init` supports unpinned objects just fine - contrary to its name - without exposing the user to the notion of pinning.)
So what does pinned-init do? Grossly oversimplifying, but it allows us to turn:
```rust
vec.push_in_place(|slot: &mut MaybeUninit<Fabric>| {
    let slot_ptr = slot.as_mut_ptr();

    unsafe {
        addr_of_mut!((*slot_ptr).node_id).write(node_id);
        addr_of_mut!((*slot_ptr).fabric_id).write(fabric_id);
        addr_of_mut!((*slot_ptr).fabric_idx).write(fabric_idx);
        // ... and so on
    }
});
```
into:
```rust
vec.push_in_place(Fabric::init());
```
where Fabric::init() is almost like Fabric::new() except with a slightly different signature and function body:
```rust
impl Fabric {
    fn init(node_id: u64, fabric_id: u64, fab_idx: NonZeroU8) -> impl Init<Self> {
        init!(Self {
            node_id,
            fabric_id,
            fab_idx,
            // Notice the weird `<-` syntax here and how we compose our `init`
            // with `Vec`'s `init`!
            icac <- rs_matter::utils::vec::Vec::init(),
        })
    }
}
```
The above syntax - thanks to the init! macro provided by pinned-init - allows us to use safe, readable, composable syntax to in-place initialize structures recursively! Or rather, and more correctly - to express an in-place initializer/constructor that does so.
Almost like in C++!
The similarity with regular new() -> Self is on purpose of course.
Can't we apply the const fn trick to Fabric and Session?
Yes, we can.
Assuming we implement const fn Fabric::new() -> Self as a parameter-less initial state of the Fabric object, we can try the following:
```rust
const INITIAL_FABRIC: Fabric = Fabric::new();

fabric_mgr.fabrics_vec.push(INITIAL_FABRIC)?;

let fabric = fabric_mgr.fabrics_vec.last_mut().unwrap();

// Now modify `fabric` members as we see fit
fabric.node_id = ...
```
BUT: The problem is `fabric_mgr.fabrics_vec.push(INITIAL_FABRIC)?;`. Even though `INITIAL_FABRIC` is a const, we have no guarantee that the compiler will (a) inline the call to `fabrics_vec.push(...)` and (b) use a `memcpy` to initialize the new fabric at the end of the vector with the `INITIAL_FABRIC` const. It might or might not do that, depending on the optimization settings.
In contrast, `pinned-init` always initializes in place, even at opt-level 0.
pinned-init background
This crate (or rather, a copy of it) is used in the RfL (Rust for Linux) project, and is therefore merged into the Linux kernel mainline. So despite its low star count on GitHub, I think the approach of the crate is very solid and credible.
Changes delta
The changes suggested here are incremental, and most importantly - additive.
- All `const fn` constructors are preserved, so folks can continue using `Matter::new(...)` in a `const` context
- `Matter` (and all other structures recursively downwards) just gets an extra `init` method next to its `new` constructor function, which is an almost verbatim copy of the existing `new`, yet with the large members of the concrete struct using the `<-` syntax of the `pinned_init::init!` macro, which delegates to the placement-new `init` methods of the inner structures
Not so ideal stuff:
- I had to copy `RefCell` from Rust `core` into `rs_matter::utils::cell::RefCell`, as the `core` `RefCell` does not have an `init` placement-new method. However, this is temporary. We'll have to get rid of `RefCell` anyway in future, in favor of a real mutex which is either a no-op for single-threaded scenarios (the default), or a real blocking one. This future mutex - or the beginnings of it - is available under the new `blmutex` module, which is introduced by this PR as well
- I had to copy `heapless::Vec` as `rs_matter::utils::vec::Vec` and extend it to (a) have an `init` in-place constructor and (b) have the `push_in_place` method discussed above. Maybe in future, if the `pinned-init` crate gains traction in the Rust embedded ecosystem (and Embassy in particular), we might be able to merge the `Vec` changes upstream into `heapless` (and merge our `blmutex` changes upstream too). But I find neither of these such a big issue.
Alternatives
Option 1: Do nothing
We can continue relying on const fn for the initialization of the Matter object (and pay with increased flash size). For Fabric and Session, we can either do the const trick described above (and rely on the compiler's opt settings to do their job), or we can still fork heapless::Vec and introduce push_in_place - but without the convenience of pinned_init::init!, we'll have to use a lot of unsafe to in-place initialize the members of the Vec (Fabric and Session, and possibly others in future).
Option 2: &mut-borrow large heapless::Vec instances
(This is what I tried originally.)
We can make the big heapless::Vec instances used in Matter no longer owned by Matter, but rather - borrowed in Matter by &mut references.
I.e. FabricMgr would become FabricMgr<'v> because it would contain a &'v mut heapless::Vec<Fabric> rather than owning that vector (or array) as it is now. Consequently, Matter<'a> would become Matter<'a, 'v>, Exchange<'a> would become Exchange<'a, 'v> and so on down the line. Why the existing covariant 'a lifetime cannot be merged with the new 'v lifetime is explained below.
Advantages:
- Simple and obvious change. Solves the "flash size" problem, because these large, initially-empty `Vec`s are the primary contributors to the flash size increase, as they constitute the majority of the `Matter` object's "data", even if they initially contain... nothing
Disadvantages:
- `Matter` becomes even less ergonomic: now the user has to separately inject another set of buffers besides `PooledBuffers`. I.e. this change increases complexity. With the alternative suggested here, we can in fact merge `PooledBuffers` back into `Matter` at some point in future, which would reduce complexity
- Even worse: the `'v` lifetime from above for the `&'v mut heapless::Vec` external buffers is invariant (as it is for any `&mut` ref). So we can't merge it with the existing, nice covariant `'a` lifetime the `Matter` object has. No matter what we do - I even tried with object pools and whatnot - ultimately, these are either `&mut`, or `RefCell<&Pool>`, or `SomeInteriorMutabilityConstructLikeMutex<&Pool>` non-mutable references, and because cells with interior mutability always result in invariant lifetimes, just like `&mut`, we always end up with `'v` being invariant and thus unmergeable with `'a`
- We can of course hard-code both or either of `'a` or `'v` to be `'static`, but this way we sacrifice the flexibility of `Matter` on platforms like embedded Linux, where none of the problems discussed here are applicable (stacks on Linux are 2MB by default) and where the user might want to allocate `Matter` and `BasicInfoConfig` and `DevAttDataFetcher` and its mDNS impl - and even the buffers - on-stack. That would be impossible with `'static`.
Why is this PR still a draft?
~~pinned-init is currently still utilizing core::alloc::AllocError which is not in Rust stable yet. (and for no_std this is the only nightly feature it needs).~~
~~It seems the author is open to changing the crate in a way where it no longer unconditionally depends on core::alloc::AllocError. This error type is anyway only used in the InPlaceInit trait, which is a decorator of Arc, Rc and Box (i.e. the alloc module) which we don't use/need, so perhaps we can just put - in pinned-init - the definition of InPlaceInit (and the usage of core::alloc::AllocError) behind the alloc feature flag?~~
UPDATE: Problem solved. The relevant PR addressing ^^^ got merged yesterday and we are now using it via the newly released pinned-init V0.0.8.
I also need to thoroughly test the in-place initialization - ideally on an MCU. To happen next week.
@kedars @andy31415 Not ready for merging yet, ~~because the current version of pinned-init it depends on requires nightly. Hopefully we can lift this restriction from pinned-init soon~~ because I still need to test the changes end-to-end. But it would be great if you could look into the "mini-RFC" in the PR description.
Change is completely additive and incremental, but still important to understand and form an opinion on.
Not so important, but I attached two screenshots:
- `MatterStack` from `rs-matter-stack` initialized using `const fn` - you can see 50KB in the `data` segment
- `MatterStack` initialized with the newly introduced `init` infrastructure - the same 50KB, but now in `.bss` (as the data being initialized starts its life as `MaybeUninit`, and only when we in-place initialize it with `.uninit().init_with(MatterStack::init(...))` does it become real)! :)
@kedars @andreilitvin In case you are wondering why there is no activity on this PR:
The reason is - before marking it "ready for merge", I need to prototype the follow-up PR, which would (hopefully) bear the fruits of this one.
In a way, prototyping the subsequent PR would thus de-risk and prove the usefulness of the changes introduced by this one.
Aside from re-working how Fabrics and ACLs are initialized and built up during commissioning (an exercise primarily aiming at memory savings - particularly stack memory), I'm also working on improving the TLV framework in two major aspects:
- Enhancing `FromTLV` to support - in addition to the existing `from_tlv` method - a new `init_from_tlv` method which allows in-place initialization of large structures, a la C++. Which is what the current PR is all about. For example, this means we can in-place initialize the `Fabric` object from its TLV-serialized buffer without materializing it on-stack first and then moving it into the `Vec` of fabrics.
- Reducing the size of our friend `TLVElement` (the representation of partially-parsed TLV data) from 40 bytes (currently, on x64) to 16 bytes (on x64). (Similar reduction ratio for 32-bit MCU archs.) This one is interesting actually. While it might seem that I'm trying to cut the lawn with nail clippers, this is only at first sight. The problem is, we are full of TLV-borne structures (IM `ReadReq`, `SubscribeReq`, `WriteReq`, and quite a few in SC too - `Cert`, to name the worst one) which are already full of partially-parsed TLV data in the form of `TLVElement` or `TLVArray`. So for those (which are created on the stack and moved around quite a bit - and oh well, this is unavoidable, as this is their primary use case!) we might see a 2x reduction if we can reduce the `TLVElement` and `TLVArray` sizes. Hence why this exercise is important. `Cert`, for example, currently weighs more than 400 bytes on x64. By removing the static `Vec<T, N>` instances in it and then reducing the `TLVElement` sizes, we can go down to something like 112 bytes on x64, which is quite an improvement.
I think this is ready for merge (and it has grown a bit after I did the modules' reshuffling in the utils namespace suggested by @andy31415. Not that it was a bad idea - quite the contrary.)
The fruits of this efforts will be shown with subsequent PRs of course. For now, the new init-in-place machinery is mostly latent, but its time is soon to come.
I took the liberty of squashing my changes into a single commit, or else I - for the life of me - couldn't rebase my forthcoming TLV-related changes on top.
Hopefully no big deal, as I doubt anybody would review the changes in this PR commit by commit (and it would not make sense to).
@kedars This PR also contains the small changes from #202 and #203 so we can also just merge this one, and close the other two.
@kedars @andy31415 Can somebody spend some time reviewing this?
It is a precondition and a show-stopper for the following PRs down in my pipeline:
- Fabrics & ACLs rewrite/improvements. Results in ~8K memory reduction (500 bytes per session, for our default of up to 16 sessions)
- TLV rewrite/improvements. Yet to assess the memory savings, but they should be significant, as `TLVElement` is now just a fat pointer, and we now have TLV structures which are re-created completely on the fly. I.e. the `Cert` struct going down from 400 bytes to 16 bytes. Similar savings for a lot of structs in the IM layer too (`ReadReq`, `WriteReq`, `InvokeReq`, etc.)
... Also a lot of temporary on-stack buffers with large sizes in the SC layer (800 bytes each) are now eliminated, but that depends a bit on the new TLV framework...