Improve list scheduling order in a future version of the v2 file format
A single page of list offsets usually maps to many pages of list items, especially when the list items are large. Currently, we schedule all of those list item pages at the same priority, because figuring out the correct priority would be costly: it would require several binary searches into the offsets array during scheduling. As a result, we end up buffering a lot of data in RAM to load a list column.
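To illustrate where the cost comes from, here is a minimal sketch of deriving a priority for each item page from the offsets alone. It assumes Arrow-style offsets (`offsets[i]` is the index of the first item of row `i`, with one trailing entry for the total item count), which may not match the actual on-disk encoding. The names `top_level_row_for_item`, `item_page_priorities`, and `page_item_starts` are hypothetical and not part of lance.

```python
from bisect import bisect_right

def top_level_row_for_item(offsets, item_index):
    """Return the top-level row that contains the item at item_index.

    Assumes Arrow-style offsets: offsets[i] is the first item of row i,
    and offsets[-1] is the total number of items.
    """
    # The last offset <= item_index belongs to the row holding that item.
    return bisect_right(offsets, item_index) - 1

def item_page_priorities(offsets, page_item_starts):
    """One binary search per item page, all performed at scheduling time."""
    return [top_level_row_for_item(offsets, start) for start in page_item_starts]

# Four rows with lengths 3, 0, 7, 2 (hypothetical data).
offsets = [0, 3, 3, 10, 12]
# Item pages beginning at items 0, 4 and 8 (hypothetical page layout).
page_item_starts = [0, 4, 8]
priorities = item_page_priorities(offsets, page_item_starts)
```

Each item page needs its own search, and the offsets page has to be loaded and decoded before any of them can run, which is why this is awkward to do inside the scheduler.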
In the future we should record the top-level row for each page as we are writing data. This would make scheduling much simpler.
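As a rough sketch of the idea, suppose the writer stored the first top-level row alongside each page. The scheduler could then order pages by that value without touching the offsets. `PageMeta`, `first_top_level_row`, and `scheduling_order` are hypothetical names used for illustration only; they are not existing lance APIs or file format fields.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PageMeta:
    page_id: str
    # Hypothetical field: recorded by the writer when the page is flushed.
    first_top_level_row: int

def scheduling_order(pages):
    """Order pages by the top-level row they start at.

    sorted() is stable, so pages that start at the same row keep
    their original relative order.
    """
    return sorted(pages, key=lambda p: p.first_top_level_row)

# Hypothetical page metadata, listed in an arbitrary order.
pages = [
    PageMeta("items-2", 2),
    PageMeta("offsets-0", 0),
    PageMeta("items-0", 0),
    PageMeta("items-1", 2),
]
order = [p.page_id for p in scheduling_order(pages)]
```

With the row recorded at write time, the priority becomes a plain metadata lookup and no binary search is needed while scheduling.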