DataCleaner
Change in record cardinality caused by the INNER JOIN semantics of Table Lookup affects components that aren't even wired to it
Scenario:
You create a job with a graph like this:
      /-> table lookup -> analyzer1
src --
      \-> analyzer2
and the table lookup is configured to use INNER JOIN semantics.
When the lookups are actually performed, they sometimes find 0 matches and sometimes more than 1. This obviously changes the cardinality of records for analyzer1.
But the trouble is that it also seems to change the cardinality for analyzer2.
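To make the cardinality change concrete, here is a minimal Java sketch of INNER JOIN lookup semantics. It is not DataCleaner's Table Lookup implementation: the class and method names (`InnerJoinLookupSketch`, `lookup`, `outputCardinality`) are hypothetical, and an in-memory map stands in for the real lookup table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of INNER JOIN lookup semantics; not DataCleaner's Table Lookup code.
final class InnerJoinLookupSketch {

    // Looks up one input record. Each matching row yields one output record;
    // no match yields none, so the record disappears from the flow.
    static List<String> lookup(String key, Map<String, List<String>> table) {
        List<String> out = new ArrayList<>();
        for (String value : table.getOrDefault(key, List.of())) {
            out.add(key + "=" + value);
        }
        return out;
    }

    // Total number of records leaving the lookup for a batch of input records.
    static int outputCardinality(List<String> keys, Map<String, List<String>> table) {
        int count = 0;
        for (String key : keys) {
            count += lookup(key, table).size();
        }
        return count;
    }

    public static void main(String[] args) {
        Map<String, List<String>> table = Map.of(
                "a", List.of("x"),              // one match
                "b", List.of("y", "z", "w"));   // three matches; "c" has none
        List<String> keys = List.of("a", "b", "c");
        // Prints "3 in, 4 out"
        System.out.println(keys.size() + " in, " + outputCardinality(keys, table) + " out");
    }
}
```

Three records go in and four come out: "c" is dropped and "b" is tripled. That is expected for analyzer1, which consumes the lookup's output.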
This most likely has its roots in our row-processing framework, which is optimized by passing the records along with the threads that process them (single-record flow). In this case we should make sure to apply special component requirements to analyzer1 or analyzer2, so that analyzer2 only sees the original source records while analyzer1 sees the records produced by the lookup.
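The suspected mechanism can be illustrated with a simplified, hypothetical model of single-record flow. This is an assumption about how the problem could arise, not a description of DataCleaner's actual row-processing code. In the model, `runLinear` carries one batch of records through all components in order, so the lookup's output replaces the batch for every later component. By contrast, `runWired` feeds each component from the component it is actually wired to. `Component`, `runLinear` and `runWired` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model of record routing; none of these names come from DataCleaner.
final class RecordFlowSketch {

    // A component is wired to one upstream ("src" or another component id) and
    // turns each incoming record into zero or more outgoing records.
    static final class Component {
        final String id;
        final String upstream;
        final Function<String, List<String>> fn;

        Component(String id, String upstream, Function<String, List<String>> fn) {
            this.id = id;
            this.upstream = upstream;
            this.fn = fn;
        }
    }

    // Applies a component to every record in its input.
    private static List<String> apply(Component c, List<String> in) {
        List<String> out = new ArrayList<>();
        for (String r : in) {
            out.addAll(c.fn.apply(r));
        }
        return out;
    }

    // Suspected bug: one batch travels through all components in order, so the
    // lookup's output replaces the batch for analyzer2 even though it is wired to src.
    static Map<String, Integer> runLinear(String sourceRecord, List<Component> ordered) {
        Map<String, Integer> seen = new LinkedHashMap<>();
        List<String> current = List.of(sourceRecord);
        for (Component c : ordered) {
            seen.put(c.id, current.size());
            current = apply(c, current);
        }
        return seen;
    }

    // Fix: each component reads the output of the component it is wired to.
    // Assumes `ordered` is topologically sorted.
    static Map<String, Integer> runWired(String sourceRecord, List<Component> ordered) {
        Map<String, Integer> seen = new LinkedHashMap<>();
        Map<String, List<String>> outputs = new LinkedHashMap<>();
        outputs.put("src", List.of(sourceRecord));
        for (Component c : ordered) {
            List<String> in = outputs.getOrDefault(c.upstream, List.of());
            seen.put(c.id, in.size());
            outputs.put(c.id, apply(c, in));
        }
        return seen;
    }

    public static void main(String[] args) {
        List<Component> graph = List.of(
                new Component("lookup", "src", r -> List.of(r + "#1", r + "#2")), // two matches
                new Component("analyzer1", "lookup", r -> List.of(r)),
                new Component("analyzer2", "src", r -> List.of(r)));
        System.out.println("linear: " + runLinear("row", graph)); // {lookup=1, analyzer1=2, analyzer2=2}
        System.out.println("wired:  " + runWired("row", graph));  // {lookup=1, analyzer1=2, analyzer2=1}
    }
}
```

The lookup in this model finds two matches. Here `runLinear` reproduces the reported symptom: analyzer2 sees 2 records for a single source record. With `runWired`, analyzer2 sees 1 record, which is what the job graph implies.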
Should we maybe just have the table lookup emit its joined records on a new output data stream instead? That would pretty much sidestep the issue.
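Here is a rough sketch of the output data stream idea. Again, it uses hypothetical names rather than DataCleaner's output data stream API. The lookup passes every record through the main flow unchanged, so analyzer2 keeps a 1:1 record count. The INNER JOIN results go to a separate stream (`joinedStream`) that only analyzer1 would consume.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the "new output data stream" idea; not a DataCleaner API.
final class OutputStreamLookupSketch {

    private final Map<String, List<String>> table;
    // Separate stream for the INNER JOIN results; only analyzer1 would consume it.
    final List<String> joinedStream = new ArrayList<>();

    OutputStreamLookupSketch(Map<String, List<String>> table) {
        this.table = table;
    }

    // Main flow stays 1:1: the incoming record is returned unchanged,
    // while every match (zero or more) is published on the side stream.
    String process(String key) {
        for (String value : table.getOrDefault(key, List.of())) {
            joinedStream.add(key + "=" + value);
        }
        return key;
    }

    public static void main(String[] args) {
        OutputStreamLookupSketch lookup = new OutputStreamLookupSketch(
                Map.of("a", List.of("x"), "b", List.of("y", "z", "w")));
        List<String> mainFlow = new ArrayList<>(); // what analyzer2 would consume
        for (String key : List.of("a", "b", "c")) {
            mainFlow.add(lookup.process(key));
        }
        // Prints "main flow: 3, output stream: 4"
        System.out.println("main flow: " + mainFlow.size() + ", output stream: " + lookup.joinedStream.size());
    }
}
```

The design choice is that cardinality changes only ever happen on the separate stream. Components wired to the main flow therefore cannot be affected by them, whatever the threading model does.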