Resolve performance bottleneck for sum image with large frames and many workers
On moellenstedt with 18 workers, the performance of a sum image from a K2IS data set with (1860,2048) detector resolution is limited by libertem-server, not the workers. This will be even more severe on a cluster. For EMPAD data with (128,128) frames, the issue doesn't occur on moellenstedt.
Discussions with @sk1p: Perhaps the pickling and unpickling of 15 MB intermediate sums from the workers is the culprit.
Items to work on:
- [x] Verify if pickling is responsible by profiling.
Edit: Pickling is not the issue, see below.
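The profiling check from the item above can be sketched with `cProfile`; this is a stand-alone illustration only (the frame shape and the round-trip loop are assumptions, not LiberTEM's actual server code):

```python
import cProfile
import io
import pickle
import pstats

import numpy as np

# Stand-in for a K2IS intermediate sum: (1860, 2048) float32 is ~15 MB.
frame = np.zeros((1860, 2048), dtype=np.float32)

def roundtrip(n=10):
    # Pickle and unpickle the frame repeatedly, as the server would when
    # receiving partial results from many workers.
    for _ in range(n):
        pickle.loads(pickle.dumps(frame, protocol=2))

profiler = cProfile.Profile()
profiler.enable()
roundtrip()
profiler.disable()

# Show where the time goes; pickling shows up as _pickle.dumps / _pickle.loads.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

If pickling were the bottleneck, `_pickle.dumps`/`_pickle.loads` would dominate the cumulative-time listing.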
So, actually....
Pickling is slow, but image encoding takes the crown; even applying the colormap is slower than pickling. This is a preliminary result that still needs to be verified by profiling.
For comparison, the pickle5 results are also included: pickle5 is roughly an order of magnitude faster than the old pickle protocol.
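A hedged sketch of that comparison: pickle protocol 5 with out-of-band buffers passes the array payload as a zero-copy `PickleBuffer` instead of serializing the ~15 MB inline. The benchmark helper and timings here are illustrative assumptions, not measured LiberTEM numbers:

```python
import pickle
import time

import numpy as np

frame = np.random.default_rng(42).random((1860, 2048)).astype(np.float32)

def bench(fn, repeats=10):
    # Average wall-clock time of a pickle/unpickle round trip.
    t0 = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - t0) / repeats

# Old protocol (2): copies the whole buffer through the pickle stream.
t_old = bench(lambda: pickle.loads(pickle.dumps(frame, protocol=2)))

# Protocol 5 with out-of-band buffers: payload travels as PickleBuffer objects.
def roundtrip_oob():
    buffers = []
    data = pickle.dumps(frame, protocol=5, buffer_callback=buffers.append)
    pickle.loads(data, buffers=buffers)

t_new = bench(roundtrip_oob)
print(f"protocol 2: {t_old * 1e3:.1f} ms, protocol 5 (out-of-band): {t_new * 1e3:.1f} ms")
```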
Note that frame picking performance with large frames should be affected by the same bottleneck, and would benefit from optimizations here.
We should think about possible strategies for large frames or large scan areas, like:
- tiling: makes the image encoding parallelizable
- "binning pyramids": reduce the data on the server rather than by resizing in the browser
- zooming interface: allows accessing the original, un-binned data for a region of interest by zooming in
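The "binning pyramid" idea could look like this minimal sketch (the function names and the 2x2 mean binning are assumptions for illustration, not an existing LiberTEM API):

```python
import numpy as np

def bin2x2(img):
    # Average non-overlapping 2x2 blocks; assumes even dimensions.
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def binning_pyramid(img, min_size=256):
    # Hypothetical helper: precompute successively binned copies on the
    # server so the client can request a level matching its viewport.
    levels = [img]
    while (min(levels[-1].shape) > min_size
           and all(s % 2 == 0 for s in levels[-1].shape)):
        levels.append(bin2x2(levels[-1]))
    return levels

frame = np.ones((1860, 2048), dtype=np.float32)
pyramid = binning_pyramid(frame)
print([lvl.shape for lvl in pyramid])
# For a zooming interface, coarse levels serve the overview and the full
# frame serves the zoomed-in region of interest.
```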
Just as a note, the recently released numpy 1.16 now supports the new pickle protocol; see https://github.com/numpy/numpy/releases/tag/v1.16.0
The issue is still current: for stddev and sum on a K2 dataset, the libertem-server process uses between 100% and 300% CPU when used with the GUI.
Discussion with @sk1p: Bumping to 0.11 since working on this in combination with #380 and aspects of #134 will significantly improve user experience.