qbeast-spark

Support schema evolution

Open osopardo1 opened this issue 4 years ago • 0 comments

Schema evolution is a feature of Delta Lake that allows users to easily change a table’s current schema to accommodate data that is changing over time. Most commonly, it’s used when performing an append or overwrite operation, to automatically adapt the schema to include one or more new columns.
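To make the append case concrete, here is a minimal sketch of additive schema merging. In Delta this is enabled per write with `.option("mergeSchema", "true")`. The sketch models a schema as an ordered list of `(column_name, type_name)` pairs, which is an assumption. The helper `merge_schema` is hypothetical and is not Delta's implementation: it keeps every existing column, appends new columns from the incoming batch, and rejects type conflicts. It ignores nested structs and type widening.

```python
from typing import List, Tuple

# Assumption: a schema is an ordered list of (column_name, type_name) pairs.
Schema = List[Tuple[str, str]]


def merge_schema(current: Schema, incoming: Schema) -> Schema:
    """Hypothetical helper: return current's columns followed by any new
    columns from incoming. Raises ValueError if a column exists in both
    schemas with different types."""
    current_types = dict(current)
    merged = list(current)
    for name, type_name in incoming:
        if name not in current_types:
            # New column: append it at the end of the schema.
            merged.append((name, type_name))
            current_types[name] = type_name
        elif current_types[name] != type_name:
            # Existing column with a conflicting type: reject the write.
            raise ValueError(
                f"type mismatch for column {name!r}: "
                f"{current_types[name]} vs {type_name}"
            )
    return merged


if __name__ == "__main__":
    table = [("id", "long"), ("price", "double")]
    batch = [("id", "long"), ("price", "double"), ("category", "string")]
    print(merge_schema(table, batch))
    # [('id', 'long'), ('price', 'double'), ('category', 'string')]
```

Columns present in the table but missing from the batch are kept. In Delta, the rows written without them would read as nulls.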

Currently we don't support this type of change in the Qbeast format; in particular, we don't let the user specify new columns to index with Qbeast. We should investigate this topic, especially its possible side effects. The proposal is to use SpaceRevision or Revision to store the new schema information and to treat each new revision as a new index to query.
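A rough sketch of the proposal follows. Every class and method name here (`Revision`, `RevisionHistory`, `register`, `revisions_to_query`) is hypothetical and does not mirror qbeast-spark's actual Revision or SpaceRevision classes. The model is simple. A revision records which columns it indexes. Changing the indexed columns opens a new revision, while writes with the same columns reuse the latest one. A reader consults every revision, because older data stays indexed under the revision it was written with.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Revision:
    """Hypothetical revision record: the columns this index version covers."""
    revision_id: int
    indexed_columns: Tuple[str, ...]


@dataclass
class RevisionHistory:
    """Hypothetical per-table list of revisions, oldest first."""
    revisions: List[Revision] = field(default_factory=list)

    def latest(self) -> Revision:
        return self.revisions[-1]

    def register(self, indexed_columns: Tuple[str, ...]) -> Revision:
        """Reuse the latest revision if the indexed columns are unchanged;
        otherwise open a new revision for the new column set."""
        if self.revisions and self.latest().indexed_columns == indexed_columns:
            return self.latest()
        next_id = self.latest().revision_id + 1 if self.revisions else 1
        revision = Revision(next_id, indexed_columns)
        self.revisions.append(revision)
        return revision

    def revisions_to_query(self) -> List[Revision]:
        """Data written under any revision may match a query, so a reader
        has to consult every revision's index, not only the latest one."""
        return list(self.revisions)


if __name__ == "__main__":
    history = RevisionHistory()
    history.register(("x", "y"))       # first write opens revision 1
    history.register(("x", "y"))       # same columns reuse revision 1
    history.register(("x", "y", "z"))  # new indexed column opens revision 2
    print([r.revision_id for r in history.revisions_to_query()])  # [1, 2]
```

The cost of this design is that queries fan out across revisions. The side effects mentioned above (for example, how many revisions a table can accumulate before reads degrade) would need to be measured.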

More information about the Delta feature can be found at: https://databricks.com/blog/2019/09/24/diving-into-delta-lake-schema-enforcement-evolution.html

In the meantime, the Qbeast commit log structure should be designed to support the future development of this feature. That is, information such as the number of dimensions and the indexed columns should be saved in the commit log.
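As an illustration only, the per-revision information could be serialized as a small JSON entry stored with each commit. The issue does not specify a format, so the field names (`revisionId`, `indexedColumns`, `dimensionCount`) and both helper functions are hypothetical. The dimension count is redundant with the column list. Storing it explicitly lets a reader validate the entry and notice a dimensionality change without interpreting the schema.

```python
import json
from typing import Any, Dict, List


def revision_metadata(revision_id: int, indexed_columns: List[str]) -> str:
    """Hypothetical helper: serialize per-revision index metadata as a JSON
    string that could be stored alongside a commit."""
    entry = {
        "revisionId": revision_id,
        "indexedColumns": indexed_columns,
        # Stored explicitly so readers can check it against the column list.
        "dimensionCount": len(indexed_columns),
    }
    return json.dumps(entry, sort_keys=True)


def parse_revision_metadata(raw: str) -> Dict[str, Any]:
    """Hypothetical helper: parse an entry and check that it is consistent."""
    entry = json.loads(raw)
    if entry["dimensionCount"] != len(entry["indexedColumns"]):
        raise ValueError("dimensionCount does not match indexedColumns")
    return entry


if __name__ == "__main__":
    raw = revision_metadata(2, ["x", "y", "z"])
    print(raw)
    # {"dimensionCount": 3, "indexedColumns": ["x", "y", "z"], "revisionId": 2}
    print(parse_revision_metadata(raw)["dimensionCount"])  # 3
```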

osopardo1 · Aug 24 '21 07:08