mogwai icon indicating copy to clipboard operation
mogwai copied to clipboard

Reduced string allocation

Open OvermindDL1 opened this issue 1 year ago • 13 comments

Would there be any interest in replacing a lot of the String usages with SmolStr instead, it would relieve a lot of memory pressure wins as a ton of strings (for example, attributes, styles, etc...) in mogwai(-dom) are less than 24 utf-8 bytes in length?

OvermindDL1 avatar May 18 '23 22:05 OvermindDL1

For note, on the wasm side it will only save memory for things about 10 in length (because 32-bits, still worth it I say), I'm more curious on saving it on the server when I SSR it for hydration purposes, generating the SSR is dominating the time in the servers route processor.

OvermindDL1 avatar May 18 '23 22:05 OvermindDL1

A quick test shows that though memory pressure is reduced, the extra conditional is adding enough overhead that offsets the performance boost from that for initial load, it only 'gains' for stored values over the longer time, needs more testing...

OvermindDL1 avatar May 19 '23 23:05 OvermindDL1

I'm definitely glad you're looking into it! As it stands mogwai performs worst in the memory domain. I welcome any improvements, so long as they don't hamper ergonomics.

schell avatar May 22 '23 02:05 schell

Very unscientific, but using crates/mogwai-benches (not keyed, just normal, for maximum allocations happening) in this repo in the normal main branch and in my string_overhaul_testing (stot below for brevity, using the smartstring library) there are these differences I was able to see, each run was done in a new incognito instance to keep things clear:


File Size:

  • main: 301KB
  • stot: 305KB

So it's about 4kb larger to add a more efficient string library.


Browser run times (ran multiple times for each in different orders, each of the given type were within ~1% of all the other runs of the same type so seemed consistent):

main: image

stot: image

The difference in times shows that stot is consistently slightly slower than main, but not by much. In my own project I was showing a greater amount of difference at ~2% slower for stot but I didn't have as much string manipulation as it seems this test has so I'm unsure...


Memory usage, as taken by a heap dump from within chrome after the mogwai-bench run has completed:

  • main: 87.3MB
  • stot: 84.6MB

So stot does save a little bit of heap allocation amounts at the end of this benchmark, not a huge amount though but a few megs is still not trivial.


Performing records within chrome of the allocations that happen shows that all memory that wasm requests is allocated very quickly and then reused for the rest of the tests, so the rest is just using the wasm internal memory allocator, which is just the stock allocator in this library it seems. Let's try adding wee_alloc, which is common to use in wasm due to smaller filesize in exchange for slightly slower allocation speed and more inefficient allocations:

  • main file size shrunk to 295KB, with the heap snapshot now being 189MB, result of the mogwai-bench is: image

  • stat main file size shrunk to 299KB, with the heap snapshot now being 192MB (unsure why bigger, allocation page ordering perhaps?), result of the mogwai-bench is: image


So yeah, as stated, unsure if it's really worth it, it is a measurable change but unsure if it's worth the added complexity, even if it would help with the allocation overhead in SSR that I'm measuring in my app.

OvermindDL1 avatar May 22 '23 15:05 OvermindDL1

I wonder if just using a Cow<'static, str> would be sufficient to save a lot of this as 'most' of the strings are indeed just 'static str's... If I get time today I'll try to make a new branch to try that out.

OvermindDL1 avatar May 22 '23 16:05 OvermindDL1

Thank you for the benchmarking! It's too bad it isn't an obvious easy win.

I wonder if just using a Cow<'static, str>...

That sounds like a great idea. I'm looking forward to any findings :)

schell avatar May 23 '23 02:05 schell

Didn't get time yesterday, busy at work, just looking into it this morning and I notice that non-static str's are able to be casted in (they are converted to Strings); since rust doesn't have specialization then that conversion would have to force 'static for strings, if non-static then they'd have to .to_string or whatever on it instead, so that would be a breaking change little though it may be... (which I personally think is a good change, better then hiding an allocation after all). I'm not even sure if that's even really used though, it doesn't seem to be in my code, I use either String or &'static str for mine...

OvermindDL1 avatar May 23 '23 15:05 OvermindDL1

Okay, a basic conversion to using Cow<'static, str>, surprisingly didn't have to change any tests or anything like that. The run of the mogwai-benches: image

A heap snapshot took 84.2MB in chrome. Generated filesize is 304kb, so a little bit larger than just the basic mainline, but I don't think that's because of the Cow but rather I added some more From's for a few different things(these need to be cleaned up, lol).

Hmm, well no I take that back, I had to change one doctest, I had to remove a .into(), and that was all, so I guess it simplified it more? (I also changed a let (mut tx, rx) = broadcast::bounded(1.try_into().unwrap()); in a doctest to let (mut tx, rx) = broadcast::bounded(1); because it was bugging me, lol, there are more of those that need to be changed too though...) There were some other changes too because the cargo formatter changed some lines, unsure why they weren't formatted to begin with (different rust formatter version?).

The benchmark does do a lot of to_stringing though, fixing some of that would probably help it be better on that as well, but not changing it up yet.

OvermindDL1 avatar May 23 '23 21:05 OvermindDL1

So yeah, again it seems within error bounds for speed, but it does reduce memory usage a bit, still unsure if its worth it though.

It did reduce my SSR generation time by ~8-9% though, which is worth it for me anyway for sure. ^.^

I only added Cow in one 'area' (the view), haven't looked into elsewhere yet so might have more speedups...

EDIT: I wonder if anyone's made a combined Cow<'static, ...> and smartstring/smolstr-like thing yet...

OvermindDL1 avatar May 23 '23 21:05 OvermindDL1

It definitely seems that something like this is good to put on the roadmap.

schell avatar May 24 '23 04:05 schell

For reference with my SSR benchmarks here, here is a test with leptos of a trivial counter example (for simple) and that same counter put in a single div 10000 times (for many), and timed the various parts, these are the times I got (using the cow version of mogwai here, the mainline version is basically the same but about 8-9% slower):

simple_counter/mogwai   time:   [6.4647 µs 6.4842 µs 6.5034 µs]
simple_counter/leptos   time:   [7.1320 µs 7.1526 µs 7.1758 µs]
many_counter/mogwai     time:   [123.57 ms 124.15 ms 124.77 ms]
many_counter/leptos     time:   [77.771 ms 78.192 ms 78.643 ms]

parts_many-mogwai/view  time:   [31.956 ms 32.027 ms 32.099 ms]
parts_many-mogwai/generation
                        time:   [58.429 ms 58.926 ms 59.566 ms]
parts_many-mogwai/render
                        time:   [29.603 ms 29.684 ms 29.768 ms]
parts_many-leptos/view  time:   [26.477 ms 26.657 ms 26.841 ms]
parts_many-leptos/render
                        time:   [49.994 ms 50.909 ms 52.136 ms]

This is the mogwai part of timing the 'parts':

	{
		let mut g = c.benchmark_group("parts_many-mogwai");
		g.bench_function("view", |b| {
			b.to_async(&async_runtime)
				.iter(|| async move { mogwai_many_counter_view() })
		});
		g.bench_function("generation", |b| {
			b.to_async(&async_runtime).iter_custom(|iters| async move {
				let mut elapsed = Duration::default();
				for _ in 0..iters {
					let view = mogwai_many_counter_view();
					let start = Instant::now();
					black_box(SsrDom::try_from(view).unwrap());
					elapsed += start.elapsed();
				}
				elapsed
			})
		});
		g.bench_function("render", |b| {
			b.to_async(&async_runtime).iter_custom(|iters| async move {
				let view = mogwai_many_counter_view();
				let ssr = SsrDom::try_from(view).unwrap();
				let start = Instant::now();
				for _ in 0..iters {
					black_box(ssr.html_string().await);
				}
				start.elapsed()
			})
		});
	}

And this is the mogwai simple counter:

pub fn simple_counter() -> ViewBuilder {
	let output_clicked = Output::<()>::default();
	let mut c = 0u32;

	html! {
		<div
			style="float:left;padding:1em;border-radius:.5em;border:1px solid #ddd;background:#f7f7f7;cursor:pointer"
			on:click=output_clicked.sink().contra_map(|_: JsDomEvent| ())
		>
			<p>
				{(
					"Hello world!",
					output_clicked.stream().map(move |()| {
						c += 1;
						match c {
							0 => CowString::from("Hello world!"),
							1 => CowString::from("Caught 1 click, click again 😀"),
							clicks => format!("Caught {} clicks", clicks).into(),
						}
					})
				)}
			</p>
		</div>
	}
}

OvermindDL1 avatar May 24 '23 16:05 OvermindDL1

Kind of wish a view could be rendered via a reference instead of it taking ownership of it in mogwai, would be convenient to static generate a set of them for faster rendering and manual patching for realtime updates without needing to rerun the view each time.

OvermindDL1 avatar May 24 '23 17:05 OvermindDL1

@OvermindDL1 - I'm back on this - well the rendering from a reference portion. What type exactly are you referring to? Because SsrDom's html_string method takes &self.

schell avatar Oct 29 '23 22:10 schell