hnsw icon indicating copy to clipboard operation
hnsw copied to clipboard

memory overhead estimate?

Open josharian opened this issue 1 year ago • 5 comments

Hi! It'd be really useful if the readme included a rough estimate of the memory overhead, or other rough formula for expected steady state memory usage, in terms of the number of vectors and their shape. Thanks. :)

josharian avatar May 22 '24 16:05 josharian

Let me know if this helps.

ammario avatar May 22 '24 18:05 ammario

Yes, thank you!

josharian avatar May 22 '24 19:05 josharian

Actually, I have a follow-up. :)

One of the factors is listed as the size of the IDs. But IDs are strings, which means that there's also a 2 word (16 bytes on arm64/amd64) overhead per string, which will typically dwarf the actual string contents. Given that, I wonder whether it'd be better to have IDs be (u)int64s instead. It'd provide lower, constant overhead, and also make map operations faster.

josharian avatar May 23 '24 23:05 josharian

Good point on the additional words.

I went with strings to provide broad compatibility with existing IDs. I don't want most users maintaining some mapping between strings and uint64s to use the graph, and, I find large stringy IDs such as UUIDs are more popular these days than numerics.

Another idea is accepting a second generic parameters, K comparable. I don't like how it complicates graph instantiation and gives the user yet another parameter to think about, but perhaps the flexibility is worth it.

ammario avatar May 26 '24 19:05 ammario

I don't like how it complicates graph instantiation

Seconded.

But OTOH this is exactly the sort of scenario in which generics shine.

josharian avatar May 28 '24 00:05 josharian

How important is this level of performance to your use case?

ammario avatar May 28 '24 16:05 ammario

I’m skirting the edge of what I can do now, at 500gb of RAM. So for me, it’d be pretty useful; I could use uint32s and scale up. But I don’t know whether I am a typical user. :)

josharian avatar May 28 '24 16:05 josharian

Very neat that you're using it at that scale! I have yet to test with graphs that large. Let me know what you think about the API here.

ammario avatar May 31 '24 19:05 ammario

new api looks great. closing this again. :)

josharian avatar Jun 25 '24 01:06 josharian