qbeast-spark Refactor Revision (flow) and Metadata Changes

Open osopardo1 opened this issue 2 years ago • 0 comments

Right now, Revision changes are handled in three different places:

SparkRevisionFactory -> creates a Revision with user configurations (columnsToIndex, columnStats...)
OTreeAnalyzer -> analyses the data and triggers a new Revision if it supersedes the existing one, or if the user input does not contain enough information (such as the columnStats that initialise the transformations).
MetadataWriter -> commits the desired Revision Changes into the Table Metadata.

Each of these components works independently, with no visibility into how the Revision was triggered.

That's why a lot of bugs appear under certain conditions (such as appending to an empty table) and affect only one of the processes. We need to:

[ ] Analyze the current status.
[ ] Design a better flow of information. (One option could be to pass some information around through options, i.e. a Map of Strings.)
[ ] Refactor the corresponding processes to use the new information flow.
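
To make the "options as a Map of Strings" idea a bit more concrete, here is a minimal sketch of how one stage could record why a Revision was triggered so that a later stage can read it back. Everything in it is hypothetical: `RevisionOptionsSketch`, `RevisionTrigger`, the `revision.trigger` key and both helper functions are made up for illustration and are not part of the current qbeast-spark API.

```scala
// Hypothetical sketch: none of these names exist in qbeast-spark today.
object RevisionOptionsSketch {

  // Why a stage decided that a new Revision is needed.
  sealed trait RevisionTrigger { def name: String }
  case object UserConfiguration extends RevisionTrigger { val name = "userConfiguration" }
  case object DataAnalysis extends RevisionTrigger { val name = "dataAnalysis" }

  private val allTriggers: Seq[RevisionTrigger] = Seq(UserConfiguration, DataAnalysis)

  // Assumed option key; the real key would be decided during the refactor.
  val TriggerKey = "revision.trigger"

  // A stage records its decision next to the existing user options.
  def withTrigger(options: Map[String, String], trigger: RevisionTrigger): Map[String, String] =
    options + (TriggerKey -> trigger.name)

  // A later stage (for example, the one committing metadata) reads it back.
  // Returns None when no trigger was recorded or the value is unknown.
  def readTrigger(options: Map[String, String]): Option[RevisionTrigger] =
    options.get(TriggerKey).flatMap(n => allTriggers.find(_.name == n))
}
```

With something like this, the analysis step could call `withTrigger(options, DataAnalysis)` and the commit step could branch on `readTrigger(options)` instead of re-deriving the reason on its own. A plain `Map[String, String]` keeps the change small, at the cost of stringly-typed keys, so a dedicated case class may be worth considering in the design step.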

Oct 26 '23 14:10 osopardo1