velox
velox copied to clipboard
Add recursive spill for RowNumber
At present, RowNumber only supports single-level spill partitions. Once a spill takes place, it won't be spilled again. This means that even if a restored spill partition uses an excessive amount of memory, it still cannot be reclaimed. Therefore, it would be beneficial to implement recursive spill support, as outlined in https://facebookincubator.github.io/velox/develop/spilling.html.
1 Advance 'spillPartitionBits_' based on the partition bit of the currently restoring spill partition to prepare for potential subsequent spills (recursive).
2 During the recursive spill input process, read the spilled input data from the previous spill run via 'spillInputReader_', then spill it back into several sub-partitions. Then restore one of the newly spilled partitions and reset 'spillInputReader' accordingly.
3 It's crucial to separate the recursive input spill process from the spill process itself, as it can be time-consuming and should yield if it exceeds the CPU time limit.
In summary, the first spill operation writes all data to the disk. Subsequently, RowNumber reads the data from the disk one partition at a time and clears the data from the restored partition once it has been processed (#8413). In the case of a recursive spill, only the data from the currently restored partition in memory is spilled to the disk (next level), and only one partition from the next level is loaded back into memory.
The data layout may look like as follows.
| Memory | Disk | |
|---|---|---|
| No-Spill | [P0, P1] | [] |
| First Spill | [] | [P0, P1] |
| Restore | [P0] | [P1] |
| Second Spill | [] | [P0-0, P0-1, P1] |
| Restore | [P0-0] | [P0-1, P1] |
| Processed | [] | [P0-1, P1] |
| Restored | [P0-1] | [P1] |
| Processed | [] | [P1] |
| Restored | [P1] | [] |
Deploy Preview for meta-velox canceled.
| Name | Link |
|---|---|
| Latest commit | 4e09fffeefa9b327bbe1aacf22298fecc991debd |
| Latest deploy log | https://app.netlify.com/sites/meta-velox/deploys/6616be4955a08000085fb9d9 |
@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@xiaoxmeng Hi Xuan, I've made a small refactor based on your newly simplified RowNumber spill implementation. Could you please help to take another look? Thanks.
@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@bikramSingh91 Hi Bikram, any further comments? Thanks.
@mbasmanova Hi Masha, all the prerequisite PRs have been merged, could you please help take another look? Thanks.
@duanmeng Thank you for adding "The data layout may look like as follows." section to PR description. It is very helpful.
Spilling is rather complicated and we are seeing many bugs in it. I wonder how much testing have been done for this change and what's the confidence of it being bug free. I don't believe we have fuzzer coverage for RowNumber operator. Would that be necessary to ensure there are no (or very few) bugs?
Also, I wonder what's the motivation for this change. Is there a real-world scenario that benefits from this change and justifies added complexity and maintenance costs. Would you describe that scenario? In this scenario, what's the impact on latency and CPU usage of spilling? How much CPU and wall time a query would use without spilling and how does that compare to spilling?