
support adding fewer than numItems via auto-trim()

leeoniya opened this issue 2 years ago • 10 comments

Fixes #43

hey @mourner,

this diff was the path of least resistance. i think this._levelBounds init can potentially be deferred until finish(), but that's likely minor, though every little bit counts when you're rebuilding the index during a canvas resize/drag :)

i'm not sure there's much reason to require trim() to be manually invoked, since it would just add noise to user code to avoid an automatically-avoidable error. you may also be passing the Flatbush instance outside the original context, in which case consumers need to know about trim() and also how to read back the expected length to do the pre-check before invoking .finish()... it feels awkward.

leeoniya avatar Aug 30 '22 14:08 leeoniya
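For context, a minimal sketch of the failure mode this PR targets. `IndexSketch` is an illustrative stand-in modeled on flatbush's add/finish API, not the real implementation:

```javascript
// Illustrative model of a fixed-capacity index like flatbush: finish()
// throws when fewer than numItems boxes were added, which is the
// "automatically-avoidable error" an auto-trim would sidestep.
class IndexSketch {
  constructor(numItems) {
    this.numItems = numItems;
    this._pos = 0; // write cursor into the coordinate array
    this._boxes = new Float64Array(numItems * 4); // minX, minY, maxX, maxY per item
  }
  add(minX, minY, maxX, maxY) {
    this._boxes.set([minX, minY, maxX, maxY], this._pos);
    this._pos += 4;
  }
  finish() {
    const added = this._pos / 4;
    if (added < this.numItems) {
      throw new Error(`Added ${added} items when expected ${this.numItems}.`);
    }
  }
}

const index = new IndexSketch(3);
index.add(0, 0, 1, 1);
index.add(2, 2, 3, 3); // only 2 of the promised 3 items added

let threw = false;
try { index.finish(); } catch (e) { threw = true; } // finish() rejects the short index
```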

ping @mourner :smile:

leeoniya avatar Sep 07 '22 05:09 leeoniya

re-ping @mourner :jack_o_lantern:

leeoniya avatar Oct 31 '22 21:10 leeoniya

@leeoniya sorry, having a lot on my plate here in Kyiv — not sure when I'll get to this. In the meantime, you could simply rewrite the trim so that it could be added on your app side rather than merged in.

mourner avatar Nov 05 '22 16:11 mourner

sorry! no problem, stay safe! :ukraine:

leeoniya avatar Nov 09 '22 21:11 leeoniya

i'm not sure there is much reason to require trim() be manually invoked as it would just add noise to user code simply to avoid an automatically-avoidable error?

@leeoniya Getting back to the PR and thinking about this more, I guess the reason why it doesn't feel right to trim automatically is because it makes it easy to shoot yourself in the foot. If we change the docs for numItems to really mean maxNumItems, people will start using Flatbush this way, not realizing that there's a significant difference in performance & memory footprint between e.g. adding numItems items and adding numItems - 1 items and trimming.

In the latter case, the code will duplicate the allocated memory and then dump the old copy, all under the hood. This is why I like things being explicit, so that users realize there's a cost to calling trim(). The behavior is less magic and more transparent, just the way I like it.

mourner avatar Apr 21 '23 10:04 mourner
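To make the cost described above concrete, a sketch (array sizes are illustrative) of what a trim amounts to: allocate a second, smaller typed array, copy the used prefix, and leave the original buffer for the garbage collector:

```javascript
// Trimming a 100k-capacity index down to 50k items: for a moment both
// buffers are live, so peak memory is the sum of the two.
const CAPACITY = 100_000;
const ACTUAL = 50_000;

let full = new Float64Array(CAPACITY * 4);    // 3.2 MB of box coordinates
const trimmed = new Float64Array(ACTUAL * 4); // +1.6 MB allocated on top
trimmed.set(full.subarray(0, ACTUAL * 4));    // copy the used prefix
full = null; // old buffer becomes garbage, reclaimed only at a later GC pass
```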

not realizing that there's a significant difference in performance & memory footprint

i ran a non-scientific test of this and the cost of trimming 100k estimated down to 50k-70k actual is ~0.5ms. the rest of that .finish() call takes 16ms when using new Flatbush(100_000, 512, Int16Array);. so the CPU overhead is 3%, and likely proportionally lower when the item counts are higher. in my case i'm way over a single frame budget anyways and i don't foresee people re-indexing 100k+ objects every 16ms while expecting smooth sailing.
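The shape of that back-of-envelope measurement, sketched without flatbush itself (the item counts mirror the numbers above; absolute timings vary by machine, so none are asserted here):

```javascript
// Time only the trim step (reallocate + copy) for a 100k-capacity
// Int16Array-backed index trimmed down to 60k actual items.
const EST = 100_000;
const ACTUAL = 60_000;
const boxes = new Int16Array(EST * 4); // 4 coordinates per item

const t0 = performance.now();
const trimmed = boxes.slice(0, ACTUAL * 4); // typed-array slice allocates + copies
const t1 = performance.now();

console.log(`trim copy took ${(t1 - t0).toFixed(3)} ms`);
```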

keeping it explicit is fine. in my order of preference:

  • add a trim: boolean arg to the Flatbush constructor, but it already has a lot of opts :(
  • index.finish(trim: false)
  • index.trim() + index.finish()

only the first option allows you to be explicit and not have to worry about calling finish() with insufficient items. the other two options have worse ergonomics because calling .finish() without remembering to indicate trim runs the risk of throwing an error that is completely avoidable.

leeoniya avatar Apr 23 '23 04:04 leeoniya
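A toy model of option 2 above, an explicit opt-in flag on finish(). The signature is hypothetical: it is one of the alternatives being discussed, not flatbush's actual API, and `IndexSketch` is an illustrative stand-in:

```javascript
// Sketch: finish(trim) keeps the error by default, but lets the caller
// explicitly opt into shrinking the index to the items actually added.
class IndexSketch {
  constructor(numItems) {
    this.numItems = numItems;
    this._pos = 0;
    this._boxes = new Float64Array(numItems * 4);
  }
  add(minX, minY, maxX, maxY) {
    this._boxes.set([minX, minY, maxX, maxY], this._pos);
    this._pos += 4;
  }
  finish(trim = false) {
    const added = this._pos / 4;
    if (added < this.numItems) {
      if (!trim) throw new Error(`Added ${added} items when expected ${this.numItems}.`);
      // caller opted in, so the reallocation cost is visible at the call site
      this.numItems = added;
      this._boxes = this._boxes.slice(0, this._pos);
    }
  }
}

const index = new IndexSketch(4);
index.add(0, 0, 1, 1);
index.add(5, 5, 6, 6);
index.finish(true); // shrinks capacity from 4 items to the 2 actually added
```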

@leeoniya the profiler won't show the full picture when it's about memory allocation. When a bunch of memory is no longer used, it will be marked for garbage collection, but not necessarily be freed at the time — it will batch those unused memory chunks until a GC pass, which may happen much later and cause a micro-stutter. So it may have an impact even if you don't see it in the profiler. In your case 0.5ms is likely just for the 50k allocation / memory copy, not for the GC (50k is a pretty small amount anyway).

Nowadays developers aren't used to paying attention to memory allocation, but it's something that feels important to me. Even if this doesn't make a difference in your particular application, it still doesn't feel right to do the trimming implicitly.

calling .finish() without remembering to indicate trim runs the risk of throwing an error that is completely avoidable.

I don't see any problems with seeing an error if it's easy to fix and make the code more intentional.

mourner avatar Apr 24 '23 16:04 mourner

Nowadays developers aren't used to paying attention to memory allocation, but it's something that feels important to me.

i spend a lot of time in mem and cpu profilers and have written some pretty fast libs, so i definitely do pay attention. in this case, however, .finish() will always dwarf any .trim() + follow-up GC costs if you have to re-index in a tight loop. GC of shallow / simple structs like a contiguous typed array will be very cheap.

i'm on a phone now, but happy to do a more thorough cpu & mem profile with some setInterval / 50ms and 10x larger datasets like 500k.

in either case, i'll update the pr to not do auto-trim during finish() without explicit opt-in.

leeoniya avatar Apr 24 '23 19:04 leeoniya

i definitely do pay attention

Sorry, I was referring to library users. Maybe you're right and I'm being irrational, but it still doesn't sit well with me that an index can suddenly allocate twice as much memory as expected based on non-obvious logic. Maybe that's some exaggerated sense of "purity" and subjective taste that makes me defensive about one of my favorite libraries, but I'd rather go with a gut feeling than endlessly argue.

mourner avatar Apr 24 '23 20:04 mourner

pushed some updates. WDYT?

leeoniya avatar Apr 25 '23 15:04 leeoniya