LiberTEM

Resolve performance bottleneck for sum image with large frames and many workers

Status: Open · uellue opened this issue 6 years ago · 4 comments

On moellenstedt with 18 workers, computing a sum image from a K2IS data set with (1860, 2048) detector resolution is limited by libertem-server, not by the workers. This will be even more severe on a cluster. For EMPAD data with (128, 128) frames, the issue doesn't occur on moellenstedt.

Discussion with @sk1p: perhaps pickling and unpickling the 15 MB intermediate sums sent back from the workers is the culprit.
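For scale, the size of one intermediate sum can be checked directly. This sketch assumes the intermediate is a float32 array with the full detector shape. The dtype is an assumption; a float64 buffer would be twice as large.

```python
import pickle

import numpy as np

# Assumption: each intermediate sum is float32 with the full K2IS detector shape.
DETECTOR_SHAPE = (1860, 2048)

intermediate = np.zeros(DETECTOR_SHAPE, dtype=np.float32)

# 1860 * 2048 * 4 = 15,237,120 bytes, i.e. roughly 15 MB per intermediate.
raw_bytes = intermediate.nbytes

# With the older protocols, the whole array is copied into the pickle stream.
pickled = pickle.dumps(intermediate, protocol=4)

print(f"raw buffer: {raw_bytes / 1e6:.1f} MB")
print(f"pickled:    {len(pickled) / 1e6:.1f} MB")
```

With 18 workers, the server has to receive and unpickle one such buffer for every result that is sent back, so this cost adds up quickly.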

Items to work on:

  • [x] Verify by profiling whether pickling is responsible.

Edit: Pickling is not the bottleneck; see below.
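Profiling of this kind can be done with the standard-library `cProfile`. The sketch below is not LiberTEM code: `handle_result` is a hypothetical stand-in for the server-side path (unpickle one worker result, then merge it into the running sum), and the payloads are synthetic.

```python
import cProfile
import io
import pickle
import pstats

import numpy as np


def handle_result(payload, total):
    """Hypothetical stand-in for merging one worker result on the server."""
    partial = pickle.loads(payload)
    total += partial  # in-place merge into the running sum
    return total


def profile_merge(n_results=18, shape=(1860, 2048)):
    """Profile n_results unpickle+merge steps and return (sum, text report)."""
    payload = pickle.dumps(np.ones(shape, dtype=np.float32))
    total = np.zeros(shape, dtype=np.float32)

    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(n_results):
        total = handle_result(payload, total)
    profiler.disable()

    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(10)
    return total, out.getvalue()
```

Printing the report from `profile_merge()` shows how the cumulative time splits between `_pickle.loads` and the rest of the merge. A conclusive measurement would still have to profile the actual libertem-server process.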

uellue, Jan 08 '19 10:01

So, actually...

[Screenshot: selection_070, preliminary timings for pickling, pickle5, colormap application and image encoding]

Pickling is slow, but image encoding is by far the slowest step. Even applying the colormap is slower than pickling. These are preliminary results that still need to be verified by profiling.
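A rough per-stage timing harness for this kind of comparison could look like the following. Everything here is an illustrative assumption, not the code libertem-server runs: the colormap is a hypothetical grayscale 256-entry RGB lookup table, and `zlib.compress` stands in for PNG encoding (PNG uses DEFLATE internally, but a real encoder adds row filtering and framing).

```python
import pickle
import time
import zlib

import numpy as np


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call of fn."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def apply_colormap(img, lut):
    """Normalize to 0..255 and map through a (256, 3) uint8 lookup table."""
    lo, hi = float(img.min()), float(img.max())
    scale = 255.0 / (hi - lo) if hi > lo else 0.0
    idx = np.clip((img - lo) * scale, 0, 255).astype(np.uint8)
    return lut[idx]  # shape (H, W, 3), dtype uint8


def encode(rgb):
    """Stand-in for PNG encoding: DEFLATE-compress the raw RGB bytes."""
    return zlib.compress(rgb.tobytes(), 6)


def stage_timings(shape=(1860, 2048), seed=0):
    """Time pickle, unpickle, colormap and encoding for one synthetic frame."""
    rng = np.random.default_rng(seed)
    img = rng.random(shape, dtype=np.float32)
    # Hypothetical grayscale LUT; a real colormap would have different entries.
    lut = np.stack([np.arange(256, dtype=np.uint8)] * 3, axis=1)

    payload, t_pickle = timed(pickle.dumps, img)
    _, t_unpickle = timed(pickle.loads, payload)
    rgb, t_cmap = timed(apply_colormap, img, lut)
    encoded, t_encode = timed(encode, rgb)
    timings = {
        "pickle": t_pickle,
        "unpickle": t_unpickle,
        "colormap": t_cmap,
        "encode": t_encode,
    }
    return timings, rgb, encoded
```

Single calls are noisy, so repeating each stage (for example with `timeit`) would give more reliable numbers; the results also depend strongly on the machine.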

For comparison, I also included the pickle5 results: protocol 5 is roughly an order of magnitude faster than the older pickle protocol.

Note that frame picking performance with large frames should be affected by the same bottleneck and would benefit from the same optimizations.

We should think about possible strategies for large frames or large scan areas, such as:

  • tiling: makes the image encoding parallelizable
  • "binning pyramids": reduce the data on the server instead of resizing in the browser
  • zooming interface: gives access to the original, un-binned data for a region of interest by zooming in
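A minimal sketch of the first two strategies, assuming 2x2 mean binning per pyramid level and square tiles. The function names are hypothetical, and `zlib` again stands in for a real image encoder. Encoding tiles in a thread pool only pays off if the encoder releases the GIL while it works, as CPython's `zlib` module does.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def bin2x2(img):
    """Average non-overlapping 2x2 blocks; an odd trailing row/column is dropped."""
    h2, w2 = img.shape[0] // 2, img.shape[1] // 2
    cropped = img[: h2 * 2, : w2 * 2]
    return cropped.reshape(h2, 2, w2, 2).mean(axis=(1, 3))


def binning_pyramid(img, min_size=256):
    """Level 0 is full resolution; each further level is binned 2x2.

    Stops once another level would make the shorter side smaller than min_size.
    """
    levels = [img]
    while min(levels[-1].shape) >= 2 * min_size:
        levels.append(bin2x2(levels[-1]))
    return levels


def iter_tiles(img, tile=512):
    """Yield ((row, col), view) for square tiles; edge tiles may be smaller."""
    h, w = img.shape[:2]
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            yield (y, x), img[y : y + tile, x : x + tile]


def encode_tile(item):
    """Stand-in encoder: DEFLATE-compress one tile's bytes."""
    origin, view = item
    return origin, zlib.compress(view.tobytes(), 6)


def encode_tiles_parallel(img, tile=512, workers=4):
    """Encode all tiles concurrently; returns {(row, col): encoded bytes}."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(encode_tile, iter_tiles(img, tile)))
```

For a (1860, 2048) frame with the defaults, the pyramid has levels of shape (1860, 2048), (930, 1024) and (465, 512), and 512-pixel tiling yields 16 tiles. A zoomed-out view would only need a coarse level, while zooming in would request full-resolution tiles covering the region of interest.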

sk1p, Jan 08 '19 13:01

Note: the recently released numpy 1.16 supports the new pickle protocol 5; see https://github.com/numpy/numpy/releases/tag/v1.16.0
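Protocol 5 (PEP 574) lets large buffers travel out of band instead of being copied into the pickle stream. A minimal sketch, assuming Python 3.8+ (where protocol 5 is in the standard library; older Pythons need the `pickle5` backport) and a NumPy version with protocol-5 support:

```python
import pickle

import numpy as np

frame_sum = np.ones((1860, 2048), dtype=np.float32)

# In-band (protocol 4): the array data is copied into the pickle byte stream.
in_band = pickle.dumps(frame_sum, protocol=4)

# Out-of-band (protocol 5): the array data is handed to buffer_callback as a
# PickleBuffer; because list.append returns None, the buffer is not embedded
# and only a small header ends up in the stream.
buffers = []
header = pickle.dumps(frame_sum, protocol=5, buffer_callback=buffers.append)

# The receiver has to supply the same buffers, in the same order.
restored = pickle.loads(header, buffers=buffers)
```

This avoids copying the array data into and out of the pickle stream; the transport layer is then responsible for moving the raw buffers alongside the header.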

sk1p, Jan 14 '19 16:01

The issue is still current: for stddev and sum on a K2 dataset, the libertem-server process uses between 100 % and 300 % CPU when used with the GUI.

uellue, Jul 28 '22 13:07

Discussion with @sk1p: bumping to 0.11, since working on this together with #380 and aspects of #134 will significantly improve the user experience.

uellue, Jul 28 '22 13:07