lazyflow icon indicating copy to clipboard operation
lazyflow copied to clipboard

OperatorWrapper causes slowness and high memory consumption

Open burgerdev opened this issue 9 years ago • 3 comments

In principle, the OperatorWrapper class was a neat concept, but I think we are at a point where it hinders us more than it helps. It is (at least partially) responsible for the problems in ilastik/ilastik#1075 and ilastik/ilastik#718, i.e. terrible usage of workflows with many time slices or many lanes.

The problem:

  • creation of many operators, of which only few are used simultaneously
    • memory problems
  • big trees of setupOutputs() and propagateDirty() calls
    • slowness (e.g. after adding new labels in PC)

The problem was 'solved' in some places by putting a custom for-loop in execute which loops over time slices (see opLabelVolume.OpLabelingABC for real world example):

class Op(Operator):
    def execute(self, slot, subindex, roi, result):
        for t in range(roi.start[0], roi.stop[0]):
            self.operate(roi, result[t:t+1, ...])

Can we do it a bit more general than this? It would also be nice if we did not have to rewrite all occurences of OperatorWrapper... What comes to my mind is either lazy allocation of inner operators, or a special operator type StatelessOperator which can be reused inside wrappers.

burgerdev avatar Mar 05 '15 17:03 burgerdev

I completely agree. I don't think we can eliminate OperatorWrappers entirely, but in cases where we are simply using them to (for example) process a 5D volume as a sequence of 4D volumes, it should be possible to implement some other abstraction, as you mention.

(We can't replace OperatorWrappers if the inner operators' slots don't have identical metadata, like shape, dtype, and axistags. So the top-level OperatorWrappers that define the lanes of an ilastik applet must remain as they are, I think.)

It will take some trial-and-error to come up with a satisfactory solution. Some questions that come to mind:

  • should the inner operations be parallelized within the execute() function? (I think so)
  • should we constrain the use-cases we support to "stateless" operators, as you mentioned? (sounds like a good idea)
  • should the "inner" operators created in this new mechanism be totally isolated from the outer graph? (not obvious to me)

stuarteberg avatar Mar 05 '15 18:03 stuarteberg

I checked the lazyflow operators again and I might have been wrong regarding state: most operators apparently do not keep an internal state. However, some operators (like lazy OpFilterLabels, lazy OpLabelVolume and the caches) keep state by design. This means that an approach like the following cannot work with those operators or any operator wrapping one of the stateful operators.

class OpWrap(Operator):
    def __init__(self, OpToWrap):
        self._op = OpToWrap(parent=self)
        self._select = SubregionSelector(OpToWrap)

    def setupOutputs(self):
        # ... connect internal operator to outer slots 1-to-1

        for name, slot in self.outputs.iteritems():
            slot.meta.assignFrom(self._op.outputs[name].meta)

        # ... disconnect internal operator and connect it to the subregionselector

    def execute(self, slot, subindex, roi, result):
        for t in range(roi.start[0], roi.stop[0]):
            # select a subregion and feed it to the wrapped operator, execute 

We would have to check -- for each occurence of OperatorWrapper -- whether it is safe to use there. While we were at that, we could also just replace the operators in question by 5d ones.

burgerdev avatar Mar 26 '15 14:03 burgerdev

For reference:

Operators currently wrapped in lazyflow and ilastik codebase:

Wrapped Operator Where File Need rewrite?
OpVigraLabelVolume OpLabelImage lazyflow/operators/opLabelImage.py no, is deprecated anyways
OpFeatureMatrixCache OpTrainVectorwiseClassifierBlocked lazyflow/operators/classifierOperators.py yes, cache is stateful
several __main__ lazyflow/tools/schematic.py no
OpDataSelection ? ilastik/applets/dataSelection/opDataSelection.py ?
? ? ilastik/applets/splitBodySupervoxelExport/opSplitBodySupervoxelExport.py ?
OpSlicedBlockedArrayCache OpObjectExtraction ilastik/applets/objectClassification/opObjectClassification.py yes, cache is stateful
OpMultiArrayStacker OpObjectExtraction ilastik/applets/objectClassification/opObjectClassification.py no
OpRegionFeatures3d OpObjectExtraction ilastik/applets/objectClassification/opObjectClassification.py yes, operator is not lazy and thus must have state
several ? ilastik/applets/autocontextClassification/opAutocontextClassification.py, ilastik/applets/autocontextClassification/opAutocontextBatch.py ?
? ? ilastik/applets/splitBodyPostProcessing/opSplitBodyPostProcessing.py ?
? ? ilastik/applets/fillMissingSlices/fillMissingSlicesApplet.py no, is off by default and nobody uses this
? several batch processing applets ? no, we need OperatorWrapper there

burgerdev avatar Mar 26 '15 15:03 burgerdev