
Reduce allocations during encoding

Open Spaceface16518 opened this issue 5 years ago • 0 comments

I didn't know whether to expect #10 to be accepted, but since it was, here are some similar optimizations (with more detailed explanations).

Summary

Using the iterator extend pattern can drastically reduce the number of allocations (as well as the total memory allocated!) when generating the hash for an image (in the encode function).

Status Quo

Previously, encode_base83_string allocated a new String on every call. The encoder calls it at least 3 times, so encoding incurred at least 3 heap allocations even though each string is only 1 to 4 characters long. On top of that, a further heap allocation was incurred for every encoded ac component.
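A minimal sketch of that status quo shape (names, signature, and digit math assumed for illustration; not the crate's exact code):

```rust
// Standard base83 alphabet used by blurhash (83 characters).
const CHARACTERS: &[u8; 83] =
    b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz#$%*+,-.:;=?@[]^_{|}~";

// Hypothetical pre-optimization version: every call returns a fresh
// String, so every encoded component costs a heap allocation even
// though the result is only 1 to 4 characters long.
fn encode_base83_string(value: u32, length: u32) -> String {
    let mut result = String::new(); // heap allocation on every call
    for i in 1..=length {
        let digit = (value / 83u32.pow(length - i)) % 83;
        result.push(CHARACTERS[digit as usize] as char);
    }
    result
}

fn main() {
    // A single base83 digit: 5 maps to '5' in the alphabet.
    assert_eq!(encode_base83_string(5, 1), "5");
    // 83 in base 83 is "10" (one 83, zero units).
    assert_eq!(encode_base83_string(83, 2), "10");
    println!("ok");
}
```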

Optimizations

  1. Preallocating the hash string so that it never has to be resized. Counting up the characters placed in the string yields this equation for the hash string size: 1 + 1 + 4 + 2(factors). Specifically, 1 for the size flag, 1 for the quantized maximum value (or a zero), 4 for the dc component, and 2 per factor (i.e. 2 * ac.len()).
  2. Extending the string with a char iterator rather than appending a String. Changing the return type of the internal function encode_base83_string to the opaque type impl Iterator<Item = char> means the characters of the hash string can be generated lazily, with no intermediate heap allocation per call. The iterator is then consumed by String::extend, which takes the iterator's size hint into account; since the size hint is exact (check Range::size_hint and Map::size_hint), the method always knows the exact size to reserve during the extension. But because we already preallocated, it never has to reserve anything. That means extend reduces to a safety size check (a single integer comparison) followed by a handful of push calls (usually just 1 or 2 anyway).
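The two optimizations together can be sketched like this (function names, component counts, and dummy values are assumed for illustration; not the crate's exact code):

```rust
const CHARACTERS: &[u8; 83] =
    b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz#$%*+,-.:;=?@[]^_{|}~";

// Returns a lazy char iterator instead of an allocated String. The
// size hint of the Range + Map chain is exact, so String::extend
// knows precisely how many characters are coming.
fn encode_base83(value: u32, length: u32) -> impl Iterator<Item = char> {
    (1..=length).map(move |i| {
        let digit = (value / 83u32.pow(length - i)) % 83;
        CHARACTERS[digit as usize] as char
    })
}

fn main() {
    let num_factors = 9; // e.g. a 3x3 component grid, assumed for illustration
    // Preallocate per the sizing equation: 1 + 1 + 4 + 2(factors).
    let mut hash = String::with_capacity(1 + 1 + 4 + 2 * num_factors);
    let capacity_before = hash.capacity();

    hash.extend(encode_base83(0, 1)); // size flag (dummy value)
    hash.extend(encode_base83(0, 1)); // quantized maximum value (dummy value)
    hash.extend(encode_base83(0, 4)); // dc component (dummy value)
    for _ in 0..num_factors {
        hash.extend(encode_base83(0, 2)); // one ac component each (dummy values)
    }

    // The buffer was never resized: one up-front allocation served the whole hash.
    assert_eq!(hash.capacity(), capacity_before);
    assert_eq!(hash.len(), 1 + 1 + 4 + 2 * num_factors);
    println!("ok");
}
```

Note that the iterator's exact size hint is what lets extend skip any growth logic: for a 4-character component, `encode_base83(0, 4).size_hint()` is `(4, Some(4))`.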

Implications and Benefits

All in all, we reduce the number of allocations from O(n) to O(1) (where n is the number of ac components). This is beneficial because the WASM memory model punishes frequent, small allocations. Additionally, using less memory when using wee_alloc is beneficial because wee_alloc's allocation is an O(n) operation. Moreover, wee_alloc has a two-word overhead for each allocation, meaning we have a constant four words of overhead rather than an unbounded amount.

Spaceface16518, Jan 22 '21 04:01