kuzu
Optimize Catalog checkpoint to not rewrite the whole content each time
The problem
~~1. For any write to the catalog, we need to maintain both a read and a write version of the whole catalog, which unnecessarily duplicates the memory overhead.~~
2. A checkpoint of the catalog file triggers a rewrite of the whole file, which is also unnecessary in almost all cases.
~~3. The two-version design also exists in TablesStatistics, which basically duplicates the same logic without sharing the same architecture.~~
4. There is a lack of built-in dependency management in our current catalog. RelGroup is also modelled as a Table, which is not the correct level of abstraction, as it should be the parent of a group of rel tables. The same applies to RDF graphs.
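
To make problem 4 more concrete, here is a minimal sketch of a catalog in which a rel group is a parent entry that owns its rel tables, and every entry records what it depends on. All names here (`CatalogEntry`, `EntryKind`, `addRelGroup`, `canDrop`, and so on) are hypothetical and chosen for illustration; this is not the existing catalog API, only one possible shape for it.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; not the actual catalog classes.
using EntryID = uint64_t;

enum class EntryKind { NodeTable, RelTable, RelGroup };

struct CatalogEntry {
    EntryID id;
    EntryKind kind;
    std::string name;
    std::vector<EntryID> children;          // filled only for RelGroup entries
    std::unordered_set<EntryID> dependsOn;  // entries that must outlive this one
};

class Catalog {
public:
    EntryID addNodeTable(const std::string& name) {
        return add(EntryKind::NodeTable, name, {});
    }

    // A rel table depends on its source and destination node tables.
    EntryID addRelTable(const std::string& name, EntryID src, EntryID dst) {
        return add(EntryKind::RelTable, name, {src, dst});
    }

    // A rel group is a parent of rel tables rather than a table itself.
    EntryID addRelGroup(const std::string& name, const std::vector<EntryID>& relTables) {
        EntryID id = add(EntryKind::RelGroup, name, {});
        entries.at(id).children = relTables;
        return id;
    }

    bool contains(EntryID id) const { return entries.count(id) > 0; }

    // An entry can be dropped only if no other entry depends on it.
    bool canDrop(EntryID id) const {
        for (const auto& kv : entries) {
            if (kv.first != id && kv.second.dependsOn.count(id) > 0) {
                return false;
            }
        }
        return true;
    }

    // Dropping a parent cascades to the children it owns.
    bool drop(EntryID id) {
        auto it = entries.find(id);
        if (it == entries.end() || !canDrop(id)) {
            return false;
        }
        std::vector<EntryID> children = it->second.children;  // copy before erase
        entries.erase(it);
        for (EntryID child : children) {
            drop(child);
        }
        return true;
    }

private:
    EntryID add(EntryKind kind, const std::string& name, std::unordered_set<EntryID> deps) {
        EntryID id = nextID++;
        entries.emplace(id, CatalogEntry{id, kind, name, {}, std::move(deps)});
        return id;
    }

    std::unordered_map<EntryID, CatalogEntry> entries;
    EntryID nextID = 0;
};
```

With this shape, dropping a node table that a rel table still references is rejected, while dropping a rel group removes the rel tables it owns. Whether child rel tables should also be droppable on their own is a design decision this sketch leaves open.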
Edit note: problems 1 and 2 are no longer true after the MVCC rework.