DataCleaner
Introduce unique ID for all component instances
This is meant to be a major change that affects multiple issues, for example #1440. Currently, component instances are identified only by their names, which are not unique when there are multiple instances of the same component (unless the user renames them).
Let's provide a unique ID to each component instance and use this ID instead of the name when appropriate.
I'd go further and do the same for everything referenceable, so also datastores and columns (both from tables and outputs). That would both make sure we never need to play the "which one of them is this" game for coalesce issues and make them reusable by the writers.
I'd suggest using something like UUIDs for the identifiers so that we don't need to worry about scopes.
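To make the idea concrete, here is a minimal sketch of what UUID-based identity could look like. `ComponentInstance` and `ComponentRegistry` are hypothetical names made up for illustration, not existing DataCleaner classes; the only real API used is `java.util.UUID`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical types for illustration only; not part of DataCleaner.
final class ComponentInstance {
    // Assigned once at creation and never changed afterwards.
    private final UUID id = UUID.randomUUID();
    // User-visible label; may be duplicated across instances or renamed.
    private String name;

    ComponentInstance(String name) {
        this.name = name;
    }

    UUID getId() {
        return id;
    }

    String getName() {
        return name;
    }

    void setName(String name) {
        this.name = name;
    }
}

final class ComponentRegistry {
    // Keyed by ID, so two instances with the same name never collide.
    private final Map<UUID, ComponentInstance> byId = new LinkedHashMap<>();

    ComponentInstance add(String name) {
        ComponentInstance component = new ComponentInstance(name);
        byId.put(component.getId(), component);
        return component;
    }

    ComponentInstance get(UUID id) {
        return byId.get(id);
    }

    int size() {
        return byId.size();
    }
}
```

With something like this, two instances of the same component can share a default name and still be told apart, and renaming one does not invalidate references to it. Because the IDs are random UUIDs, there is no need to coordinate them per job or per datastore.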
Regarding issues, I don't think we have an actual issue open on it right now, but I'll bet that I can make a job fail by playing around a bit with components that use coalesce units. They are very fragile, since they currently depend on the names of columns and components. Making them even more fragile is the fact that for datastore columns, those names depend not on the datastore name but on the schema and table name. So if you, for example, move and/or rename a CSV, the job will fail even if the DC datastore is updated to reflect it.
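A rough sketch of that failure mode, and of how an ID-based reference would avoid it. `ColumnRef` and `ColumnCatalog` are hypothetical types, and the schema, table, and column names are made-up examples; this does not reproduce how coalesce units are actually stored.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical types for illustration only; not part of DataCleaner.
final class ColumnRef {
    final UUID id = UUID.randomUUID(); // stable identity
    String schema;
    String table;
    String name;

    ColumnRef(String schema, String table, String name) {
        this.schema = schema;
        this.table = table;
        this.name = name;
    }

    // Name-based reference: changes whenever schema, table, or column is renamed.
    String path() {
        return schema + "." + table + "." + name;
    }
}

final class ColumnCatalog {
    private final Map<UUID, ColumnRef> byId = new HashMap<>();

    ColumnRef register(String schema, String table, String name) {
        ColumnRef column = new ColumnRef(schema, table, name);
        byId.put(column.id, column);
        return column;
    }

    // Lookup by stored path; returns null once the underlying names change.
    ColumnRef findByPath(String path) {
        for (ColumnRef column : byId.values()) {
            if (column.path().equals(path)) {
                return column;
            }
        }
        return null;
    }

    // Lookup by ID; unaffected by renames.
    ColumnRef findById(UUID id) {
        return byId.get(id);
    }
}
```

If a job stores the path `resources.customers.csv.email` and the table is later renamed to `clients.csv`, `findByPath` on the stored path returns `null`, while `findById` with the stored ID still returns the same column.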