tpie
Pipelining merge sorter does not recompute parameters after data structures
When we introduced data structures to pipelining, we started calling set_available_memory twice on each node: a) once before freezing data structures, and b) once after freezing data structures.
However, the pipelining sorter was not adapted to this, so it currently uses the memory assigned in a) to compute sort parameters and completely ignores b).
Fortunately, the memory assigned to nodes in b) is greater than or equal to that in a), so we don't risk memory overuse in the sorter, but we should recompute sort parameters when memory is assigned in b).
Probably we can compute all merge sort parameters in begin() of the first phase.
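To make the mismatch concrete, here is a minimal toy model of it. It is not TPIE code: `toy_sorter_node`, `sort_parameters`, and the run-length formula (memory divided by item size) are assumptions made up for illustration. Only the idea of `set_available_memory` being called twice comes from the description above.

```cpp
#include <cassert>
#include <cstddef>

// Toy stand-in for the sorter's parameters; not TPIE's actual type.
struct sort_parameters {
    std::size_t memory = 0;      // bytes the parameters were derived from
    std::size_t run_length = 0;  // items per sorted run (illustrative formula)
};

// Hypothetical node that derives sort parameters from its memory assignment.
class toy_sorter_node {
public:
    toy_sorter_node(std::size_t item_size, bool recompute)
        : m_item_size(item_size), m_recompute(recompute) {
        assert(item_size > 0);
    }

    // Called once before (a) and once after (b) data structures are frozen.
    void set_available_memory(std::size_t bytes) {
        m_available = bytes;
        // With recompute == false, only the first call (a) configures the
        // sorter, which mirrors the behaviour reported in this issue.
        if (!m_configured || m_recompute) {
            m_params.memory = bytes;
            m_params.run_length = bytes / m_item_size;
            m_configured = true;
        }
    }

    const sort_parameters & parameters() const { return m_params; }
    std::size_t available() const { return m_available; }

private:
    std::size_t m_item_size;
    bool m_recompute;
    bool m_configured = false;
    std::size_t m_available = 0;
    sort_parameters m_params;
};
```

With `recompute` set to `false`, assigning 1024 bytes in a) and 4096 bytes in b) leaves the parameters derived from 1024 bytes, so the extra memory goes unused. With `recompute` set to `true`, the parameters follow the assignment from b).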
There is no easy way to fix this.
The problem is that when begin() is called on the input node (the first of the 3 nodes in the pipeline), it has to call begin() on the merge sorter. Some time after this, push() is called on the input node, which forwards it to the merge sorter. However, the calc node (the 2nd of the 3 nodes) is only notified about how many resources it can use once the data structures have been frozen, and that happens after the push. At that point it can't change the parameters of the merge sorter, because the sorter has already started.
Two possible fixes:
- Change how pipelining works
- Allow the parameters for each of the 3 phases in the merge sorter to be calculated independently, just before that phase starts. (Note: phase 1 memory usage depends on the fanout, which depends on phase 2 memory and file usage, so it is probably not possible to adjust phase 1 memory usage this way.)
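
A rough sketch of the second option, assuming a hypothetical `lazy_sort_parameters` helper that is not part of TPIE. The formulas (run length as memory divided by item size, fanout as memory divided by a per-stream buffer size) are placeholders, and the sketch deliberately ignores the dependency of phase 1 on the fanout mentioned in the note above.

```cpp
#include <cassert>
#include <cstddef>

// The three merge sort phases, used as array indices below.
enum phase { run_formation = 0, merging = 1, final_merge = 2 };

// Hypothetical helper: each phase's parameters are frozen only when that
// phase begins, so later memory assignments still reach later phases.
class lazy_sort_parameters {
public:
    lazy_sort_parameters(std::size_t item_size, std::size_t stream_buffer_bytes)
        : m_item_size(item_size), m_stream_buffer(stream_buffer_bytes) {
        assert(item_size > 0 && stream_buffer_bytes > 0);
    }

    // Record the latest memory assignment for a phase.
    // Returns false (and changes nothing) if that phase has already begun.
    bool set_memory(phase p, std::size_t bytes) {
        if (m_started[p]) return false;
        m_memory[p] = bytes;
        return true;
    }

    // Freeze the parameters of phase p from its latest memory assignment.
    void begin(phase p) {
        m_started[p] = true;
        if (p == run_formation) {
            m_run_length = m_memory[p] / m_item_size;     // placeholder formula
        } else {
            m_fanout[p] = m_memory[p] / m_stream_buffer;  // placeholder formula
        }
    }

    std::size_t run_length() const { return m_run_length; }
    std::size_t fanout(phase p) const { return m_fanout[p]; }

private:
    std::size_t m_item_size;
    std::size_t m_stream_buffer;
    std::size_t m_memory[3] = {0, 0, 0};
    std::size_t m_fanout[3] = {0, 0, 0};
    std::size_t m_run_length = 0;
    bool m_started[3] = {false, false, false};
};
```

In this model, a memory assignment that arrives after phase 1 has begun is rejected for phase 1 but is still picked up by phases 2 and 3, which matches the limitation described in the note: the later phases can benefit from b), while phase 1 probably cannot.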