LiberTEM
`ValidationUDF`: Validate that all data of a dataset is seen exactly once
Currently, the `ValidationUDF` only checks whether the data it receives matches the reference. In the future, it could also keep track of which parts of the data it has seen, to make sure none are skipped or processed twice.
Sadly, this is not as easy as I first thought: adding a check in `get_results` doesn't work in a straightforward way, because at that point we don't know whether we have the final result in our hands or only a partial result. We can't easily check a "should eventually be equal to" constraint that way, so something like this fails:
diff --git a/tests/utils.py b/tests/utils.py
index 9dd04dd2..848b141f 100644
--- a/tests/utils.py
+++ b/tests/utils.py
@@ -167,15 +167,23 @@ class ValidationUDF(UDF):
     def get_result_buffers(self):
         return {
-            # Just a buffer to "feel" the av shape
-            'nav_shape': self.buffer(kind="nav", dtype="float32"),
+            'seen': self.buffer(kind="nav", dtype=np.int64),
         }
     def process_tile(self, tile):
+        self.results.seen[:] += 1
         assert self.params.validation_function(
             self.meta.slice.get(self.params.reference), tile
         )
+    def get_results(self):
+        if self.meta.roi is None:
+            expected = self.meta.dataset_shape.size
+        else:
+            expected = np.sum(self.meta.roi)
+        assert np.sum(self.results.seen) == expected
+        return {}
+
Might need checks at the call site of the UDF, or maybe a small addition to the UDF API?
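
To illustrate the "checks at the call site" option, here is a rough plain-Python sketch. It does not use the LiberTEM API: `check_coverage` and the `(frames, sig_start, sig_stop)` tile description are hypothetical stand-ins for whatever the UDF would record. The point is only that the "exactly once" constraint is evaluated once, on the merged record of the whole run, rather than on each partial result.

```python
from collections import Counter


def check_coverage(tiles, nav_size, sig_size, roi=None):
    """Report elements that were missed, seen twice, or not expected.

    Hypothetical helper, not part of LiberTEM. Each tile is described as
    (frames, sig_start, sig_stop): the flat frame indices it contains and
    the half-open range of flat signal indices it covers.
    """
    seen = Counter()
    for frames, sig_start, sig_stop in tiles:
        for frame in frames:
            for sig in range(sig_start, sig_stop):
                seen[(frame, sig)] += 1

    # Frames selected by the ROI (all frames if no ROI is given).
    selected = [f for f in range(nav_size) if roi is None or roi[f]]
    expected = {(f, s) for f in selected for s in range(sig_size)}
    return {
        "missing": sorted(expected - set(seen)),
        "duplicates": sorted(k for k, n in seen.items() if n > 1),
        "unexpected": sorted(set(seen) - expected),
    }


# Three frames with four signal elements each, split into three tiles.
tiles = [([0, 1], 0, 2), ([0, 1], 2, 4), ([2], 0, 4)]

# Complete run: every element is seen exactly once.
full = check_coverage(tiles, nav_size=3, sig_size=4)

# Partial result: the same check fails although nothing is wrong yet.
partial = check_coverage(tiles[:2], nav_size=3, sig_size=4)

# A tile delivered twice shows up as duplicates.
twice = check_coverage(tiles + tiles[:1], nav_size=3, sig_size=4)
```

The `partial` case mirrors the problem described above: the same assertion that is valid for the final result reports missing data when it runs on an intermediate one, so it has to run where the caller knows the run is complete.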