qbeast-spark
Add support for compacting files
What went wrong?
Recently, Delta contributors added the ability to optimize tables through SQL in the open-source version. :raised_hands:
The related commit has all the details (https://github.com/delta-io/delta/commit/e366ccd6179c70dd603c2093a912aacfe719ed00), but to summarize:
Having many small files can become a performance issue. To address this, the OPTIMIZE operation compacts them into fewer, larger files.
OPTIMIZE ('/path/to/dir' | delta.table) [WHERE part = 25];
This new feature will be part of the next Delta Lake release, 1.2.0. Right now, we are only compatible with 1.0.0, and Qbeast does not handle this type of compaction.
In order to be compatible with future Delta versions, we should:
- Upgrade to the newest version of Delta
- Solve any compatibility problems that arise (the upgrade also moves Spark to version 3.2.0)
- Think about how to address compaction of small files at cube level
- Implement new functionality
- Add tests
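As a starting point for the cube-level discussion, here is a minimal sketch of how small files could be grouped for compaction without mixing cubes. It is not how Delta or Qbeast implements anything: `FileEntry`, `plan_compaction`, `min_size`, and `max_size` are hypothetical names introduced for illustration, and the grouping uses a simple next-fit heuristic over files sorted by size. It assumes `min_size <= max_size`.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical file description: path, owning cube, size in bytes."""
    path: str
    cube: str
    size: int


def plan_compaction(
    files: List[FileEntry], min_size: int, max_size: int
) -> Dict[str, List[List[FileEntry]]]:
    """Group small files of the same cube into bins of at most max_size bytes.

    Files of at least min_size bytes are left untouched. Bins that end up
    with a single file are dropped, since rewriting them gains nothing.
    """
    # Only files below the threshold are compaction candidates.
    small_by_cube = defaultdict(list)
    for f in files:
        if f.size < min_size:
            small_by_cube[f.cube].append(f)

    plan: Dict[str, List[List[FileEntry]]] = {}
    for cube, small in small_by_cube.items():
        bins: List[List[FileEntry]] = []
        current: List[FileEntry] = []
        current_size = 0
        # Next-fit over files sorted from largest to smallest.
        for f in sorted(small, key=lambda e: e.size, reverse=True):
            if current and current_size + f.size > max_size:
                bins.append(current)
                current, current_size = [], 0
            current.append(f)
            current_size += f.size
        if current:
            bins.append(current)

        useful = [b for b in bins if len(b) > 1]
        if useful:
            plan[cube] = useful
    return plan
```

Each inner list in the result is one group of files that would be rewritten as a single file for that cube. Keeping groups within a cube is the design choice under discussion here: a plain file-size-based compaction could merge files from different cubes.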
Delta 1.2.0 has now been released! You can read the release notes here. :eyes: