mogwai
mogwai copied to clipboard
Reduced string allocation
Would there be any interest in replacing a lot of the String
usages with SmolStr
instead, it would relieve a lot of memory pressure wins as a ton of strings (for example, attributes, styles, etc...) in mogwai(-dom) are less than 24 utf-8 bytes in length?
For note, on the wasm side it will only save memory for things about 10 in length (because 32-bits, still worth it I say), I'm more curious on saving it on the server when I SSR it for hydration purposes, generating the SSR is dominating the time in the servers route processor.
A quick test shows that though memory pressure is reduced, the extra conditional is adding enough overhead that offsets the performance boost from that for initial load, it only 'gains' for stored values over the longer time, needs more testing...
I'm definitely glad you're looking into it! As it stands mogwai performs worst in the memory domain. I welcome any improvements, so long as they don't hamper ergonomics.
Very unscientific, but using crates/mogwai-benches (not keyed, just normal, for maximum allocations happening) in this repo in the normal main
branch and in my string_overhaul_testing
(stot
below for brevity, using the smartstring library) there are these differences I was able to see, each run was done in a new incognito instance to keep things clear:
File Size:
- main: 301KB
- stot: 305KB
So it's about 4kb larger to add a more efficient string library.
Browser run times (ran multiple times for each in different orders, each of the given type were within ~1% of all the other runs of the same type so seemed consistent):
main:
stot:
The difference in times shows that stot is consistently slightly slower than main, but not by much. In my own project I was showing a greater amount of difference at ~2% slower for stot but I didn't have as much string manipulation as it seems this test has so I'm unsure...
Memory usage, as taken by a heap dump from within chrome after the mogwai-bench run has completed:
- main: 87.3MB
- stot: 84.6MB
So stot
does save a little bit of heap allocation amounts at the end of this benchmark, not a huge amount though but a few megs is still not trivial.
Performing records within chrome of the allocations that happen shows that all memory that wasm requests is allocated very quickly and then reused for the rest of the tests, so the rest is just using the wasm internal memory allocator, which is just the stock allocator in this library it seems. Let's try adding wee_alloc
, which is common to use in wasm due to smaller filesize in exchange for slightly slower allocation speed and more inefficient allocations:
-
main file size shrunk to 295KB, with the heap snapshot now being 189MB, result of the mogwai-bench is:
-
stat main file size shrunk to 299KB, with the heap snapshot now being 192MB (unsure why bigger, allocation page ordering perhaps?), result of the mogwai-bench is:
So yeah, as stated, unsure if it's really worth it, it is a measurable change but unsure if it's worth the added complexity, even if it would help with the allocation overhead in SSR that I'm measuring in my app.
I wonder if just using a Cow<'static, str>
would be sufficient to save a lot of this as 'most' of the strings are indeed just 'static str
's... If I get time today I'll try to make a new branch to try that out.
Thank you for the benchmarking! It's too bad it isn't an obvious easy win.
I wonder if just using a Cow<'static, str>...
That sounds like a great idea. I'm looking forward to any findings :)
Didn't get time yesterday, busy at work, just looking into it this morning and I notice that non-static str's are able to be casted in (they are converted to Strings); since rust doesn't have specialization then that conversion would have to force 'static for strings, if non-static then they'd have to .to_string or whatever on it instead, so that would be a breaking change little though it may be... (which I personally think is a good change, better then hiding an allocation after all). I'm not even sure if that's even really used though, it doesn't seem to be in my code, I use either String or &'static str for mine...
Okay, a basic conversion to using Cow<'static, str>
, surprisingly didn't have to change any tests or anything like that. The run of the mogwai-benches:
A heap snapshot took 84.2MB in chrome. Generated filesize is 304kb, so a little bit larger than just the basic mainline, but I don't think that's because of the Cow but rather I added some more From's for a few different things(these need to be cleaned up, lol).
Hmm, well no I take that back, I had to change one doctest, I had to remove a .into()
, and that was all, so I guess it simplified it more? (I also changed a let (mut tx, rx) = broadcast::bounded(1.try_into().unwrap());
in a doctest to let (mut tx, rx) = broadcast::bounded(1);
because it was bugging me, lol, there are more of those that need to be changed too though...) There were some other changes too because the cargo formatter changed some lines, unsure why they weren't formatted to begin with (different rust formatter version?).
The benchmark does do a lot of to_string
ing though, fixing some of that would probably help it be better on that as well, but not changing it up yet.
So yeah, again it seems within error bounds for speed, but it does reduce memory usage a bit, still unsure if its worth it though.
It did reduce my SSR generation time by ~8-9% though, which is worth it for me anyway for sure. ^.^
I only added Cow in one 'area' (the view), haven't looked into elsewhere yet so might have more speedups...
EDIT: I wonder if anyone's made a combined Cow<'static, ...> and smartstring/smolstr-like thing yet...
It definitely seems that something like this is good to put on the roadmap.
For reference with my SSR benchmarks here, here is a test with leptos of a trivial counter example (for simple) and that same counter put in a single div 10000 times (for many), and timed the various parts, these are the times I got (using the cow version of mogwai here, the mainline version is basically the same but about 8-9% slower):
simple_counter/mogwai time: [6.4647 µs 6.4842 µs 6.5034 µs]
simple_counter/leptos time: [7.1320 µs 7.1526 µs 7.1758 µs]
many_counter/mogwai time: [123.57 ms 124.15 ms 124.77 ms]
many_counter/leptos time: [77.771 ms 78.192 ms 78.643 ms]
parts_many-mogwai/view time: [31.956 ms 32.027 ms 32.099 ms]
parts_many-mogwai/generation
time: [58.429 ms 58.926 ms 59.566 ms]
parts_many-mogwai/render
time: [29.603 ms 29.684 ms 29.768 ms]
parts_many-leptos/view time: [26.477 ms 26.657 ms 26.841 ms]
parts_many-leptos/render
time: [49.994 ms 50.909 ms 52.136 ms]
This is the mogwai part of timing the 'parts':
{
let mut g = c.benchmark_group("parts_many-mogwai");
g.bench_function("view", |b| {
b.to_async(&async_runtime)
.iter(|| async move { mogwai_many_counter_view() })
});
g.bench_function("generation", |b| {
b.to_async(&async_runtime).iter_custom(|iters| async move {
let mut elapsed = Duration::default();
for _ in 0..iters {
let view = mogwai_many_counter_view();
let start = Instant::now();
black_box(SsrDom::try_from(view).unwrap());
elapsed += start.elapsed();
}
elapsed
})
});
g.bench_function("render", |b| {
b.to_async(&async_runtime).iter_custom(|iters| async move {
let view = mogwai_many_counter_view();
let ssr = SsrDom::try_from(view).unwrap();
let start = Instant::now();
for _ in 0..iters {
black_box(ssr.html_string().await);
}
start.elapsed()
})
});
}
And this is the mogwai simple counter:
pub fn simple_counter() -> ViewBuilder {
let output_clicked = Output::<()>::default();
let mut c = 0u32;
html! {
<div
style="float:left;padding:1em;border-radius:.5em;border:1px solid #ddd;background:#f7f7f7;cursor:pointer"
on:click=output_clicked.sink().contra_map(|_: JsDomEvent| ())
>
<p>
{(
"Hello world!",
output_clicked.stream().map(move |()| {
c += 1;
match c {
0 => CowString::from("Hello world!"),
1 => CowString::from("Caught 1 click, click again 😀"),
clicks => format!("Caught {} clicks", clicks).into(),
}
})
)}
</p>
</div>
}
}
Kind of wish a view could be rendered via a reference instead of it taking ownership of it in mogwai, would be convenient to static generate a set of them for faster rendering and manual patching for realtime updates without needing to rerun the view each time.
@OvermindDL1 - I'm back on this - well the rendering from a reference portion. What type exactly are you referring to? Because SsrDom
's html_string
method takes &self
.