featurebase
add documentation about memory usage
When Pilosa has just freshly imported data and is not serving any queries, it is possible to get a pretty accurate upper bound on its memory usage with a simple calculation.
The actual roaring bitmap data is mmapped and off-heap, so heap usage is dominated by the Container structs and the structures that track them. See the snippet of the inuse_space profile below:
File: pilosa
Type: inuse_space
Showing nodes accounting for 23.73GB, 99.41% of 23.87GB total
Dropped 71 nodes (cum <= 0.12GB)
flat flat% sum% cum cum%
19.16GB 80.25% 80.25% 19.16GB 80.25% github.com/pilosa/pilosa/roaring.NewContainer (inline)
4.58GB 19.16% 99.41% 4.58GB 19.16% github.com/pilosa/pilosa/enterprise/b.glob..func1
Multiply
- # of shards
- number of time views
- total # of rows in all fields
- 16 containers per row
- 80 bytes per container

to get the total number of bytes. That will be about 80% of total memory use (the actual Container structs). The rest is the tracking of which container has which key.
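The multiplication above can be sketched in Go as follows. This is a rough illustration, not code from Pilosa: the `Layout` type and both function names are hypothetical, the constants are the figures quoted in this issue, and the sketch assumes an index with no time quantum counts as one view.

```go
package main

import "fmt"

// Figures quoted in this issue; they are approximations, not values
// read from the Pilosa source.
const (
	containersPerRow      = 16 // containers per row
	bytesPerContainer     = 80 // approximate size of one Container struct
	containerSharePercent = 80 // Container structs are ~80% of heap in the profile
)

// Layout is a hypothetical description of a freshly imported index.
type Layout struct {
	Shards    uint64 // number of shards
	TimeViews uint64 // number of time views (assumed to be 1 with no time quantum)
	Rows      uint64 // total number of rows across all fields
}

// ContainerBytes returns the upper bound on bytes held by Container structs.
func ContainerBytes(l Layout) uint64 {
	return l.Shards * l.TimeViews * l.Rows * containersPerRow * bytesPerContainer
}

// EstimatedHeapBytes scales the container bytes up to the whole heap,
// treating the remaining ~20% as container-key tracking.
func EstimatedHeapBytes(l Layout) uint64 {
	return ContainerBytes(l) * 100 / containerSharePercent
}

func main() {
	l := Layout{Shards: 100, TimeViews: 1, Rows: 10000}
	fmt.Printf("container structs: %d bytes\n", ContainerBytes(l))
	fmt.Printf("estimated heap:    %d bytes\n", EstimatedHeapBytes(l))
}
```

The result is an upper bound because it assumes every row has all 16 containers populated in every shard and view. It does not include the query, row cache, or key translation costs.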
We also need to talk about how queries and the row cache affect memory. Mmapped data should get evicted, but it still affects total memory usage.
We can also see that cutting the size of the Container struct would be pretty worthwhile.
Set field caches and key translation also need to be accounted for.