lazyflow
lazyflow copied to clipboard
OperatorWrapper causes slowness and high memory consumption
In principle, the OperatorWrapper
class was a neat concept, but I think we are at a point where it hinders us more than it helps. It is (at least partially) responsible for the problems in ilastik/ilastik#1075 and ilastik/ilastik#718, i.e. terrible usage of workflows with many time slices or many lanes.
The problem:
- creation of many operators, of which only few are used simultaneously
- memory problems
- big trees of
setupOutputs()
andpropagateDirty()
calls- slowness (e.g. after adding new labels in PC)
The problem was 'solved' in some places by putting a custom for-loop in execute which loops over time slices (see opLabelVolume.OpLabelingABC
for real world example):
class Op(Operator):
def execute(self, slot, subindex, roi, result):
for t in range(roi.start[0], roi.stop[0]):
self.operate(roi, result[t:t+1, ...])
Can we do it a bit more general than this? It would also be nice if we did not have to rewrite all occurences of OperatorWrapper
... What comes to my mind is either lazy allocation of inner operators, or a special operator type StatelessOperator
which can be reused inside wrappers.
I completely agree. I don't think we can eliminate OperatorWrappers entirely, but in cases where we are simply using them to (for example) process a 5D volume as a sequence of 4D volumes, it should be possible to implement some other abstraction, as you mention.
(We can't replace OperatorWrappers if the inner operators' slots don't have identical metadata, like shape, dtype, and axistags. So the top-level OperatorWrappers that define the lanes of an ilastik applet must remain as they are, I think.)
It will take some trial-and-error to come up with a satisfactory solution. Some questions that come to mind:
- should the inner operations be parallelized within the
execute()
function? (I think so) - should we constrain the use-cases we support to "stateless" operators, as you mentioned? (sounds like a good idea)
- should the "inner" operators created in this new mechanism be totally isolated from the outer graph? (not obvious to me)
I checked the lazyflow operators again and I might have been wrong regarding state: most operators apparently do not keep an internal state. However, some operators (like lazy OpFilterLabels
, lazy OpLabelVolume
and the caches) keep state by design. This means that an approach like the following cannot work with those operators or any operator wrapping one of the stateful operators.
class OpWrap(Operator):
def __init__(self, OpToWrap):
self._op = OpToWrap(parent=self)
self._select = SubregionSelector(OpToWrap)
def setupOutputs(self):
# ... connect internal operator to outer slots 1-to-1
for name, slot in self.outputs.iteritems():
slot.meta.assignFrom(self._op.outputs[name].meta)
# ... disconnect internal operator and connect it to the subregionselector
def execute(self, slot, subindex, roi, result):
for t in range(roi.start[0], roi.stop[0]):
# select a subregion and feed it to the wrapped operator, execute
We would have to check -- for each occurence of OperatorWrapper
-- whether it is safe to use there. While we were at that, we could also just replace the operators in question by 5d ones.
For reference:
Operators currently wrapped in lazyflow and ilastik codebase:
Wrapped Operator | Where | File | Need rewrite? |
---|---|---|---|
OpVigraLabelVolume |
OpLabelImage |
lazyflow/operators/opLabelImage.py |
no, is deprecated anyways |
OpFeatureMatrixCache |
OpTrainVectorwiseClassifierBlocked | lazyflow/operators/classifierOperators.py |
yes, cache is stateful |
several | __main__ |
lazyflow/tools/schematic.py |
no |
OpDataSelection |
? | ilastik/applets/dataSelection/opDataSelection.py |
? |
? | ? | ilastik/applets/splitBodySupervoxelExport/opSplitBodySupervoxelExport.py |
? |
OpSlicedBlockedArrayCache |
OpObjectExtraction |
ilastik/applets/objectClassification/opObjectClassification.py |
yes, cache is stateful |
OpMultiArrayStacker |
OpObjectExtraction |
ilastik/applets/objectClassification/opObjectClassification.py |
no |
OpRegionFeatures3d |
OpObjectExtraction |
ilastik/applets/objectClassification/opObjectClassification.py |
yes, operator is not lazy and thus must have state |
several | ? | ilastik/applets/autocontextClassification/opAutocontextClassification.py , ilastik/applets/autocontextClassification/opAutocontextBatch.py |
? |
? | ? | ilastik/applets/splitBodyPostProcessing/opSplitBodyPostProcessing.py |
? |
? | ? | ilastik/applets/fillMissingSlices/fillMissingSlicesApplet.py |
no, is off by default and nobody uses this |
? | several batch processing applets | ? | no, we need OperatorWrapper there |