nprintml
multithreaded optimization of label-aggregation
Label-aggregation performance was hugely improved by #54. However, it remains single-threaded. We should investigate whether multithreading or multiprocessing can speed this step up further and, if so, implement it.
This looks like a tricky balance between the cost of shuffling data around and the cost of the aggregation itself, and that balance could depend on the size of each individual nPrint. If we get it right, we should see an improvement, but it's hard to gauge how much.
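To make the trade-off concrete, here is a rough sketch of what a chunked, pooled aggregation could look like. It is not nprintml's actual code: `parse_chunk`, `batched`, `aggregate`, and the `(name, csv_text)` input shape are all hypothetical stand-ins, and it assumes each nPrint is a small CSV with a header row and integer cells.

```python
import csv
import io
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from itertools import islice


def parse_chunk(chunk):
    """Worker: flatten each (name, csv_text) pair in a batch.

    Hypothetical stand-in for the real per-nPrint work.
    """
    rows = []
    for name, text in chunk:
        reader = csv.reader(io.StringIO(text))
        next(reader, None)  # skip the header row
        flat = [int(value) for row in reader for value in row]
        rows.append((name, flat))
    return rows


def batched(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def aggregate(nprints, labels, executor_cls=ProcessPoolExecutor,
              workers=2, chunk_size=64):
    """Parse nPrints in parallel, then join each with its label.

    nprints: iterable of (name, csv_text); labels: dict of name -> label.
    Entries without a label are dropped (an assumption of this sketch).
    """
    chunks = list(batched(nprints, chunk_size))
    with executor_cls(max_workers=workers) as pool:
        # map() yields results in submission order, so output order is stable
        parts = pool.map(parse_chunk, chunks)
        return [
            (name, labels[name], features)
            for part in parts
            for name, features in part
            if name in labels
        ]
```

`chunk_size` is the knob for the balance described above: with `ProcessPoolExecutor`, every chunk is pickled to a worker and every result pickled back, so small nPrints probably want large chunks to amortize that cost, while large nPrints may not gain much at all. Passing `executor_cls=ThreadPoolExecutor` skips the copying but keeps the parsing under the GIL, so benchmarking both would be the way to find out which wins. When using processes, the call should sit under an `if __name__ == "__main__":` guard.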