ruma icon indicating copy to clipboard operation
ruma copied to clipboard

Make state_res::resolve allocate as little memory by clones as possible

Open ShadowJonathan opened this issue 4 years ago • 11 comments

I think it's possible to reduce the total amount of allocations that resolve has to do by a significant amount, by making sure that most intermediary values instead borrow from values in its own function body (such as seperate) to shuffle and attest values around.

A benefit to this would be that state_res::resolve both consumes not much more memory than the caller gives it, and it is faster as a result.

ShadowJonathan avatar Nov 25 '21 16:11 ShadowJonathan

I don't think that there is much opportunity for improvement in this area anymore. Can we close this?

jplatte avatar Feb 03 '22 00:02 jplatte

I believe https://github.com/ruma/ruma/pull/767#issuecomment-1012198072 is still relevant, I have it in my todo list to revisit that.

ShadowJonathan avatar Feb 03 '22 09:02 ShadowJonathan

I've maybe found some easy optimizations;

  • Change StateMap's String key to Cow<str>, this allows the caller to "compress" the state event key.
    • However, playing around a bit in playground, it increases the size of the actual SEK ((RoomEventType, String)) by one word (8 bytes on 64-bit platforms)
  • Change auth_chain_sets to Vec<Vec<E::Id>>, this loses the dedup guarantee, only get_auth_chain_diff uses it, and only to count which event IDs appear in some sets, and not in others (by comparing count < num_sets)

However, these are not trivial, and I kinda need to know how sound they are, @jplatte @DevinR528 thoughts?

ShadowJonathan avatar Apr 04 '22 09:04 ShadowJonathan

  • Change StateMap's String key to Cow<str>, this allows the caller to "compress" the state event key.
    • However, playing around a bit in playground, it increases the size of the actual SEK ((RoomEventType, String)) by one word (8 bytes on 64-bit platforms)

We can use a custom type instead of Cow which uses Box<str> over String, which would be three words like String. We could even go down to two words by copying the beef crate's unsafe trickery, but that should likely be a separate change.

  • Change auth_chain_sets to Vec<Vec<E::Id>>, this loses the dedup guarantee, only get_auth_chain_diff uses it, and only to count which event IDs appear in some sets, and not in others (by comparing count < num_sets)

Not sure about this one.

jplatte avatar Apr 04 '22 10:04 jplatte

We could even go down to two words by copying the beef crate's unsafe trickery, but that should likely be a separate change.

I saw beef while looking at Cow's size, but I didn't want to suggest it because it'd be a separate type.

Regardless, I think we shouldn't copy, but maybe depend on it if we want to use less space.

ShadowJonathan avatar Apr 04 '22 10:04 ShadowJonathan

We can use a custom type instead of Cow which uses Box<str> over String

That's the wrong optimization i want to make here, I want to optimize the heap-allocated strings, allowing them to be borrowed from somewhere else, allowing the caller to dedup them and reduce heap pressure by avoiding clones before calling resolve

ShadowJonathan avatar Apr 04 '22 10:04 ShadowJonathan

I checked again and all the optimizations that crate does are based on a capacity being present. They're thus not applicable here. Still we could use

enum MaybeOwnedStr<'a> {
    Borrowed(&'a str),
    Owned(Box<str>),
}

which is smaller than Cow<str> and the same size as String.

jplatte avatar Apr 04 '22 10:04 jplatte

Though now that I think about it, since references and the Box pointer can't be null, this should still be possible as a two-words thing where the last bit determines whether it's owned or borrowed.

I might build a separate crate for that 🙂

jplatte avatar Apr 04 '22 10:04 jplatte

FTR the reason why we cant just use &str and friends is because iterative_auth_check wants to push owned strings into StateMaps, so that rules out only borrowing.

ShadowJonathan avatar Apr 04 '22 10:04 ShadowJonathan

  • Change auth_chain_sets to Vec<Vec<E::Id>>, this loses the dedup guarantee, only get_auth_chain_diff uses it, and only to count which event IDs appear in some sets, and not in others (by comparing count < num_sets)

This would be fine as far as the algorithm goes. Is the problem that conduit has to re collect() into a HashSet for this?

DevinR528 avatar Apr 04 '22 14:04 DevinR528

Not neccecarily, but that HashSet has larger heap pressure than Vec would have, thus me trying to tackle it.

The main goal here is to reduce leaf dedup, and other structures along the way, while having minimal performance loss.

Essentially; Conduit would not have to clone a bunch of structures before passing it into resolve.

ShadowJonathan avatar Apr 04 '22 14:04 ShadowJonathan