Tupl

Support cached columns

Open • broneill opened this issue 1 year ago • 1 comment

Constructing strings from UTF-8 is expensive. Create an annotation that allows a column's values to be cached, using either "soft" or "weak" references, with soft as the default. Document that caching is best suited to columns with low cardinality, due to potential GC overhead.
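As a rough illustration of what such an annotation might look like, here is a minimal sketch. The names `Cached`, `Mode`, and the `Customer` row interface are hypothetical and are not part of any existing API. The sketch also assumes that columns are declared as accessor methods on a row interface.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical annotation: marks a string column whose decoded values may be cached.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Cached {
    // Kind of reference the cache would use to hold the decoded string.
    enum Mode { SOFT, WEAK }

    // Soft is the default, as proposed in this issue.
    Mode value() default Mode.SOFT;
}

// Hypothetical row definition, for illustration only.
interface Customer {
    // Low cardinality column: a good caching candidate.
    @Cached
    String country();

    // Explicitly request weak references instead of the default.
    @Cached(Cached.Mode.WEAK)
    String currency();

    // High cardinality column: deliberately left uncached.
    String email();
}
```

A code generator or reflective reader could then look up the annotation on each accessor and route decoding through the cache only for annotated columns.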

The cache itself can be simple -- it has no max capacity and it doesn't perform any LRU reordering. A single global cache should work fine, and it needs to support high concurrency.

broneill, Jun 10 '23 16:06

In addition to referring to strings, the cache entries also need to refer to the UTF-8 encoded bytes. This is necessary for making quick comparisons, but it also means that the cache occupies much more memory than might be expected. All the more reason to document that the caching feature should only be used for columns with low cardinality.
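The design described in this issue could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual implementation. The class name `Utf8StringCache`, its `get` signature, and the use of a `ReferenceQueue` to drop cleared entries are all assumptions made for the example. Each entry keeps a private copy of the UTF-8 bytes as the map key, so a lookup can hash and compare raw bytes without decoding them, and the string itself is held through a soft or weak reference.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a single global cache: no capacity limit, no LRU reordering.
final class Utf8StringCache {
    private static final ConcurrentHashMap<Key, Reference<String>> MAP = new ConcurrentHashMap<>();
    private static final ReferenceQueue<String> QUEUE = new ReferenceQueue<>();

    /** Map key over a byte range; hashing and equality look only at the UTF-8 bytes. */
    private static final class Key {
        final byte[] bytes;
        final int off, len, hash;

        Key(byte[] bytes, int off, int len) {
            this.bytes = bytes;
            this.off = off;
            this.len = len;
            int h = 1;
            for (int i = off; i < off + len; i++) h = 31 * h + bytes[i];
            this.hash = h;
        }

        @Override public int hashCode() { return hash; }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            // Range comparison (Java 9+): no string is constructed here.
            return Arrays.equals(bytes, off, off + len, k.bytes, k.off, k.off + k.len);
        }
    }

    // Reference subclasses remember their key so that cleared entries can be removed.
    private static final class SoftValue extends SoftReference<String> {
        final Key key;
        SoftValue(Key key, String s) { super(s, QUEUE); this.key = key; }
    }

    private static final class WeakValue extends WeakReference<String> {
        final Key key;
        WeakValue(Key key, String s) { super(s, QUEUE); this.key = key; }
    }

    /** Returns the string for buf[off, off + len), decoding only on a cache miss. */
    static String get(byte[] buf, int off, int len, boolean weak) {
        // The probe key wraps the caller's buffer directly, so a hit copies nothing.
        Reference<String> ref = MAP.get(new Key(buf, off, len));
        String s = (ref == null) ? null : ref.get();
        if (s != null) return s;

        expunge();
        // On a miss, the entry keeps its own copy of the bytes as well as the string.
        Key key = new Key(Arrays.copyOfRange(buf, off, off + len), 0, len);
        s = new String(key.bytes, StandardCharsets.UTF_8);
        // Racing threads may both store an entry; the last one wins, which is harmless.
        MAP.put(key, weak ? new WeakValue(key, s) : new SoftValue(key, s));
        return s;
    }

    /** Removes entries whose strings have been reclaimed by the garbage collector. */
    private static void expunge() {
        Reference<? extends String> r;
        while ((r = QUEUE.poll()) != null) {
            Key k = (r instanceof SoftValue) ? ((SoftValue) r).key : ((WeakValue) r).key;
            MAP.remove(k, r);
        }
    }

    static int size() { return MAP.size(); }
}
```

Because every live entry holds both the byte copy and the decoded string, the per-entry footprint is roughly the sum of the two, which is why a high-cardinality column would make the cache expensive.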

broneill, Jun 14 '23 00:06