vroom
Consider alternative index approaches
When dealing with very large numeric-only files, the index takes up roughly the same amount of memory as the actual data.
- We could investigate writing the index to disk and then mmapping it, which would dramatically reduce the memory requirements.
- We could keep the index in memory but compress it with something like lz4 and uncompress it as needed, provided we store some bookkeeping information on what row numbers the blocks correspond to.
- We could do both of the above: write the compressed index to disk and decompress it on demand.
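
To make the compression option concrete, here is a minimal sketch in Python rather than vroom's own code. Everything in it is an assumption for illustration: the `CompressedIndex` class and its `block_size` parameter are hypothetical, offsets are stored as unsigned 64-bit integers, and the standard library's `zlib` stands in for lz4. With fixed-size blocks the bookkeeping reduces to integer division, since entry `i` lives in block `i // block_size`.

```python
import struct
import zlib


class CompressedIndex:
    """Hypothetical index: offsets packed into fixed-size compressed blocks."""

    def __init__(self, offsets, block_size=1024):
        self.block_size = block_size
        self.length = len(offsets)
        self.blocks = []
        for start in range(0, len(offsets), block_size):
            chunk = offsets[start:start + block_size]
            # Pack the chunk as little-endian unsigned 64-bit integers.
            raw = struct.pack(f"<{len(chunk)}Q", *chunk)
            # zlib is a stand-in here; the issue suggests lz4.
            self.blocks.append(zlib.compress(raw))
        # Keep only the most recently used block decompressed.
        self._cached_id = None
        self._cached = ()

    def __getitem__(self, i):
        if not 0 <= i < self.length:
            raise IndexError(i)
        # Bookkeeping: fixed block size maps an entry to (block, position).
        block_id, within = divmod(i, self.block_size)
        if block_id != self._cached_id:
            raw = zlib.decompress(self.blocks[block_id])
            self._cached = struct.unpack(f"<{len(raw) // 8}Q", raw)
            self._cached_id = block_id
        return self._cached[within]
```

Sequential access decompresses each block once, while random access pays one decompression per block switch, so the block size trades memory savings against lookup latency.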
+1 to that. Maybe it would even be possible to set a hard memory limit so that the index cannot exhaust RAM (i.e. disk caching if needed)?
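
One possible reading of the disk-backed idea, sketched in Python rather than vroom's own code: write the offsets to a flat binary file and memory-map it, so the operating system pages entries in on demand and can evict them under memory pressure. The names `write_index`, `MappedIndex`, and `roundtrip` are hypothetical, and the 64-bit little-endian layout is an assumption. This does not enforce a hard cap by itself; it only makes the index memory reclaimable by the OS.

```python
import mmap
import os
import struct
import tempfile


def write_index(path, offsets):
    """Write offsets as little-endian unsigned 64-bit integers."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(offsets)}Q", *offsets))


class MappedIndex:
    """Hypothetical read-only view over an on-disk index file."""

    def __init__(self, path):
        self._file = open(path, "rb")
        # Length 0 maps the whole file; it must not be empty.
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self.length = len(self._map) // 8

    def __getitem__(self, i):
        if not 0 <= i < self.length:
            raise IndexError(i)
        # Only the page holding this entry needs to be resident.
        return struct.unpack_from("<Q", self._map, i * 8)[0]

    def close(self):
        self._map.close()
        self._file.close()


def roundtrip(offsets):
    """Write offsets to a temporary file and read them back via mmap."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "index.bin")
        write_index(path, offsets)
        index = MappedIndex(path)
        try:
            return [index[i] for i in range(index.length)]
        finally:
            index.close()
```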