
Refactor Revision (flow) and Metadata Changes

Open • osopardo1 opened this issue 2 years ago • 0 comments

Right now, Revision Changes are handled in three different places:

  1. SparkRevisionFactory -> creates a Revision from the user configuration (columnsToIndex, columnStats, ...).
  2. OTreeAnalyzer -> analyzes the data and triggers a new Revision if the data supersedes the existing one, or if the user input does not contain enough information (such as the columnStats that initialize the transformations).
  3. MetadataWriter -> commits the desired Revision Changes into the Table Metadata.
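To make the problem concrete, here is a deliberately simplified Scala model of the three stages. Every name in it (`CurrentFlowSketch`, `Stats`, `createFromUser`, `analyze`, `needsCommit`) is hypothetical, and column stats are reduced to a single min/max pair; it is not the real `SparkRevisionFactory`, `OTreeAnalyzer` or `MetadataWriter` code. It only shows the shape of the issue: the analyzer may replace the Revision, but nothing tells the writer why.

```scala
// Hypothetical, simplified model of the current three-stage flow.
// Names and signatures are illustrative, not the real qbeast-spark API.
object CurrentFlowSketch {
  final case class Stats(min: Double, max: Double)
  final case class Revision(id: Long, columnsToIndex: Seq[String], stats: Option[Stats])

  // 1. Factory: builds a Revision from user configuration only.
  def createFromUser(columns: Seq[String], userStats: Option[Stats]): Revision =
    Revision(id = 1L, columnsToIndex = columns, stats = userStats)

  // 2. Analyzer: may return a different Revision, but not the reason for it.
  def analyze(rev: Revision, data: Seq[Double]): Revision =
    if (data.isEmpty) rev
    else {
      val d = Stats(data.min, data.max)
      rev.stats match {
        // No stats from the user: initialize them from the data.
        case None => rev.copy(id = rev.id + 1, stats = Some(d))
        // Data falls outside the current space: widen it in a new Revision.
        case Some(s) if d.min < s.min || d.max > s.max =>
          val merged = Stats(math.min(s.min, d.min), math.max(s.max, d.max))
          rev.copy(id = rev.id + 1, stats = Some(merged))
        // Data fits: keep the Revision unchanged.
        case Some(_) => rev
      }
    }

  // 3. Writer: only sees the final Revision, so it has to infer what happened
  //    by comparing it against whatever is already in the table metadata.
  def needsCommit(committed: Option[Revision], proposed: Revision): Boolean =
    !committed.contains(proposed)
}
```

In this model `needsCommit(None, rev)` returns `true` for a brand-new table, for user-supplied stats and for out-of-range data alike; from the Revision alone, the writer cannot tell these cases apart.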

Each component works independently, without visibility into how the Revision was triggered.

As a result, many bugs appear under certain conditions (such as appending to an empty table) that affect only one of the processes. We need to:

  • [ ] Analyze the current status.
  • [ ] Design a better flow of information (one option could be to pass some information around through options, i.e. a Map of Strings).
  • [ ] Refactor the corresponding processes to use the new flow of information.
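One possible reading of the "options (a Map of Strings)" idea is sketched below, under the same simplifying assumptions as before: each stage returns its Revision together with a `Map[String, String]` recording what it decided, and the writer reads that map instead of re-deriving the decision. The keys `revision.isNewTable` and `revision.trigger`, and all object and function names, are made up for illustration; they are not existing qbeast-spark options.

```scala
// Hypothetical sketch of passing revision context between stages
// as a Map[String, String]. Keys and function names are illustrative only.
object OptionsFlowSketch {
  final case class Stats(min: Double, max: Double)
  final case class Revision(id: Long, stats: Option[Stats])

  // Illustrative option keys; not existing qbeast-spark options.
  val IsNewTable = "revision.isNewTable"
  val Trigger    = "revision.trigger" // "none" | "missingStats" | "dataOutOfRange"

  // Stage 1: reuse the committed Revision, or seed a new one from user input.
  // (In this sketch, user stats only apply to a table with no Revision yet.)
  def fromUser(userStats: Option[Stats], committed: Option[Revision]): (Revision, Map[String, String]) =
    committed match {
      case Some(rev) => (rev, Map(IsNewTable -> "false"))
      case None      => (Revision(0L, userStats), Map(IsNewTable -> "true"))
    }

  // Stage 2: decide whether the data forces a new Revision and record why.
  def analyze(rev: Revision, data: Seq[Double], opts: Map[String, String]): (Revision, Map[String, String]) =
    if (data.isEmpty) (rev, opts + (Trigger -> "none"))
    else {
      val d = Stats(data.min, data.max)
      rev.stats match {
        case None =>
          (Revision(rev.id + 1, Some(d)), opts + (Trigger -> "missingStats"))
        case Some(s) if d.min < s.min || d.max > s.max =>
          val merged = Stats(math.min(s.min, d.min), math.max(s.max, d.max))
          (Revision(rev.id + 1, Some(merged)), opts + (Trigger -> "dataOutOfRange"))
        case Some(_) =>
          (rev, opts + (Trigger -> "none"))
      }
    }

  // Stage 3: the writer reads the recorded decisions instead of re-deriving them.
  def shouldCommitRevision(opts: Map[String, String]): Boolean =
    opts.get(IsNewTable).contains("true") || opts.get(Trigger).exists(_ != "none")
}
```

For a new table seeded with user stats whose first batch falls inside those stats, `analyze` records `revision.trigger -> none`, yet `shouldCommitRevision` still returns `true` because the first stage recorded `revision.isNewTable -> true`. A typed case class would give stronger guarantees than string keys; the Map is used here only because it matches the option suggested in the checklist.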

osopardo1 • Oct 26 '23 14:10