Make state_res::resolve allocate as little memory by clones as possible
I think it's possible to reduce the total amount of allocations that resolve has to do by a significant amount, by making sure that most intermediary values instead borrow from values in its own function body (such as seperate) to shuffle and attest values around.
A benefit to this would be that state_res::resolve both consumes not much more memory than the caller gives it, and it is faster as a result.
I don't think that there is much opportunity for improvement in this area anymore. Can we close this?
I believe https://github.com/ruma/ruma/pull/767#issuecomment-1012198072 is still relevant, I have it in my todo list to revisit that.
I've maybe found some easy optimizations;
- Change
StateMap'sStringkey toCow<str>, this allows the caller to "compress" the state event key.- However, playing around a bit in playground, it increases the size of the actual SEK (
(RoomEventType, String)) by one word (8 bytes on 64-bit platforms)
- However, playing around a bit in playground, it increases the size of the actual SEK (
- Change
auth_chain_setstoVec<Vec<E::Id>>, this loses the dedup guarantee, onlyget_auth_chain_diffuses it, and only to count which event IDs appear in some sets, and not in others (by comparingcount < num_sets)
However, these are not trivial, and I kinda need to know how sound they are, @jplatte @DevinR528 thoughts?
- Change
StateMap'sStringkey toCow<str>, this allows the caller to "compress" the state event key.
- However, playing around a bit in playground, it increases the size of the actual SEK (
(RoomEventType, String)) by one word (8 bytes on 64-bit platforms)
We can use a custom type instead of Cow which uses Box<str> over String, which would be three words like String. We could even go down to two words by copying the beef crate's unsafe trickery, but that should likely be a separate change.
- Change
auth_chain_setstoVec<Vec<E::Id>>, this loses the dedup guarantee, onlyget_auth_chain_diffuses it, and only to count which event IDs appear in some sets, and not in others (by comparingcount < num_sets)
Not sure about this one.
We could even go down to two words by copying the beef crate's unsafe trickery, but that should likely be a separate change.
I saw beef while looking at Cow's size, but I didn't want to suggest it because it'd be a separate type.
Regardless, I think we shouldn't copy, but maybe depend on it if we want to use less space.
We can use a custom type instead of
Cowwhich usesBox<str>overString
That's the wrong optimization i want to make here, I want to optimize the heap-allocated strings, allowing them to be borrowed from somewhere else, allowing the caller to dedup them and reduce heap pressure by avoiding clones before calling resolve
I checked again and all the optimizations that crate does are based on a capacity being present. They're thus not applicable here. Still we could use
enum MaybeOwnedStr<'a> {
Borrowed(&'a str),
Owned(Box<str>),
}
which is smaller than Cow<str> and the same size as String.
Though now that I think about it, since references and the Box pointer can't be null, this should still be possible as a two-words thing where the last bit determines whether it's owned or borrowed.
I might build a separate crate for that 🙂
FTR the reason why we cant just use &str and friends is because iterative_auth_check wants to push owned strings into StateMaps, so that rules out only borrowing.
- Change auth_chain_sets to Vec<Vec<E::Id>>, this loses the dedup guarantee, only get_auth_chain_diff uses it, and only to count which event IDs appear in some sets, and not in others (by comparing count < num_sets)
This would be fine as far as the algorithm goes. Is the problem that conduit has to re collect() into a HashSet for this?
Not neccecarily, but that HashSet has larger heap pressure than Vec would have, thus me trying to tackle it.
The main goal here is to reduce leaf dedup, and other structures along the way, while having minimal performance loss.
Essentially; Conduit would not have to clone a bunch of structures before passing it into resolve.