Memory accounting in serialization_pipesort
It looks like serialization_pipesort is not managing its memory properly (e.g. using 16 GB when given a 12 GB limit). Is this a known problem, or should I prepare some log output etc.?
I have tried changing ut-serialization_sort to generate std::vectors containing between 0 and 48 ints (with the length uniformly distributed on that range), and I don't get any reports of memory overuse.
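A minimal sketch of that kind of generator, for anyone who wants to reproduce the test. The helper name `make_test_items`, the seed parameter and the value range are assumptions for illustration; this is not the actual ut-serialization_sort code.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper (not part of TPIE): builds `count` vectors whose
// lengths are drawn uniformly from [0, max_len]. The element values are
// arbitrary; only the size distribution matters for memory accounting.
std::vector<std::vector<int>> make_test_items(std::size_t count,
                                              std::size_t max_len,
                                              unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> len_dist(0, max_len);
    std::uniform_int_distribution<int> value_dist(0, 1000000);

    std::vector<std::vector<int>> items;
    items.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        std::vector<int> item(len_dist(rng));  // length in [0, max_len]
        for (int& v : item) v = value_dist(rng);
        assert(item.size() <= max_len);
        items.push_back(std::move(item));
    }
    return items;
}
```

Calling `make_test_items(n, 48, seed)` gives items of 0 to 48 ints each, which can then be pushed into the sorter under test.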
What kind of items are you sorting and how are their sizes distributed?
@freekvw Please try out the serialization_sort branch. The serialization sorter is slower now unless you specify a "serialized item comparator" as in test/serialization_sort.cpp.
I have tried it, and it solved the issue in the sorting part (which was by far the biggest), at the expense of compute time (with the same memory limit, so of course I should give it a bit more). The "internal sorter" part, which I guess is independent of the other parts of the pipeline, went from 361 to 980 minutes, and the number of runs in the first phase went from 48 to 100. What shall we do about this?
I guess we could try writing a custom comparator to see if we can get the internal part up to speed. As for the number of runs, that is to be expected when we only use the memory allowed.
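As a rough idea of what such a comparator could look like, here is a sketch that compares two items directly in their serialized form, so nothing has to be deserialized into a std::vector during the internal sort. The byte layout (a 64-bit length followed by 32-bit ints) and the names `serialize` and `serialized_less` are assumptions made for illustration, not TPIE's actual serialization format or comparator interface; see test/serialization_sort.cpp for the real one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumed layout (hypothetical, not TPIE's real format):
//   [uint64_t n][n x int32_t]
std::vector<char> serialize(const std::vector<std::int32_t>& v) {
    const std::uint64_t n = v.size();
    std::vector<char> buf(sizeof n + n * sizeof(std::int32_t));
    std::memcpy(buf.data(), &n, sizeof n);
    if (n != 0) {
        std::memcpy(buf.data() + sizeof n, v.data(),
                    n * sizeof(std::int32_t));
    }
    return buf;
}

// Lexicographic "less than" on two serialized items, matching the order
// that operator< gives on the corresponding std::vector<int32_t> values.
bool serialized_less(const char* a, const char* b) {
    assert(a != nullptr && b != nullptr);
    std::uint64_t na, nb;
    std::memcpy(&na, a, sizeof na);
    std::memcpy(&nb, b, sizeof nb);
    a += sizeof na;
    b += sizeof nb;

    const std::uint64_t common = std::min(na, nb);
    for (std::uint64_t i = 0; i < common; ++i) {
        std::int32_t x, y;
        // memcpy avoids unaligned reads from the byte buffers.
        std::memcpy(&x, a + i * sizeof x, sizeof x);
        std::memcpy(&y, b + i * sizeof y, sizeof y);
        if (x != y) return x < y;
    }
    return na < nb;  // on a shared prefix, the shorter item sorts first
}
```

Whether this recovers the lost time depends on how much of the 980 minutes is spent deserializing during comparisons, which we would have to measure.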