qbeast-spark
Refactor Revision (flow) and Metadata Changes
Right now, Revision changes are handled in three different places:
- `SparkRevisionFactory` -> creates a Revision from the user configuration (`columnsToIndex`, `columnStats`...).
- `OTreeAnalyzer` -> analyses the data and triggers a new Revision if the data supersedes the existing one, or if the user input does not contain enough information (such as the `columnStats` that initialise the `transformations`).
- `MetadataWriter` -> commits the desired Revision changes into the Table Metadata.
Each of these components works independently, with no visibility into how a Revision change is triggered.
As a result, many bugs appear only under certain conditions (such as appending to an empty table) and affect only one of the processes. We need to:
- [ ] Analyze the current status.
- [ ] Design a better flow of information. (One option could be to pass some information around through `options`, a Map of Strings.)
- [ ] Refactor the corresponding processes with the new information treatment.
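
To make the `options` idea more concrete, here is a minimal sketch in plain Scala of how each step could record why a Revision change was requested, so that the next step can see it. Everything in it is an assumption for discussion: the object, the method names, the `revisionChange.*` keys and their values are hypothetical and do not exist in qbeast-spark. Only `columnsToIndex` and `columnStats` come from the description above.

```scala
// Hypothetical sketch: none of these names are part of the qbeast-spark API.
object RevisionOptionsSketch {

  // Hypothetical keys that describe who requested a Revision change and why.
  val TriggerKey = "revisionChange.trigger"
  val SourceKey = "revisionChange.source"

  // Step 1 (the role played by SparkRevisionFactory): record whether the
  // user supplied columnStats, which would be enough to initialise the
  // transformations.
  def fromUserConfig(options: Map[String, String]): Map[String, String] = {
    val hasStats = options.get("columnStats").exists(_.nonEmpty)
    val trigger = if (hasStats) "userColumnStats" else "none"
    options + (SourceKey -> "SparkRevisionFactory") + (TriggerKey -> trigger)
  }

  // Step 2 (the role played by OTreeAnalyzer): overwrite the trigger only
  // when the analysed data supersedes the existing Revision.
  def afterAnalysis(
      options: Map[String, String],
      dataSupersedesRevision: Boolean): Map[String, String] =
    if (dataSupersedesRevision)
      options + (SourceKey -> "OTreeAnalyzer") + (TriggerKey -> "dataSupersedesRevision")
    else options

  // Step 3 (the role played by MetadataWriter): decide whether to commit a
  // Revision change by reading what the previous steps recorded.
  def shouldCommitRevision(options: Map[String, String]): Boolean =
    options.get(TriggerKey).exists(_ != "none")
}
```

With a flow like this, a case such as appending to an empty table could be traced through a single Map instead of being decided separately in each component. A Map of Strings is easy to pass around but is untyped; a small case class carrying the same fields would be an alternative worth weighing in the design step.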