qbeast-spark

Add support for compacting files

Open osopardo1 opened this issue 2 years ago • 1 comments

What went wrong?

Delta contributors recently added support for optimizing tables through SQL in the open-source version. :raised_hands:

You can read all the details in the related commit, https://github.com/delta-io/delta/commit/e366ccd6179c70dd603c2093a912aacfe719ed00, but in summary:

Having many small files can become a performance problem. To address this, the OPTIMIZE operation compacts them into fewer, larger files.

OPTIMIZE ('/path/to/dir' | delta.table) [WHERE part = 25];
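To make the two forms of the statement concrete, here is a minimal sketch that renders it for either a path or a table name, with the optional partition predicate. The helper `optimize_statement` and the table name `events` are hypothetical and only for illustration; they are not part of Delta or Qbeast.

```python
from typing import Optional


def optimize_statement(target: str, is_path: bool, where: Optional[str] = None) -> str:
    """Render an OPTIMIZE statement for a path or a table name (hypothetical helper)."""
    # A path is written as a quoted string literal; a table name is used as-is.
    ref = f"'{target}'" if is_path else target
    stmt = f"OPTIMIZE {ref}"
    if where:
        # The optional predicate restricts compaction to the matching partitions.
        stmt += f" WHERE {where}"
    return stmt


print(optimize_statement("/path/to/dir", is_path=True, where="part = 25"))
print(optimize_statement("events", is_path=False))  # "events" is a made-up table name
```

In a Spark session running a Delta version that ships the feature, the resulting string could presumably be passed to `spark.sql(...)`.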

This new feature will be part of the next Delta Lake release, 1.2.0. Right now we are only compatible with 1.0.0, and Qbeast does not handle this type of compaction.

In order to be compatible with future Delta versions, we should:

  1. Upgrade to the newest version of Delta.
  2. Resolve any compatibility problems that arise (the upgrade also moves Spark to 3.2.0).
  3. Decide how to handle compaction of small files at the cube level.
  4. Implement the new functionality.
  5. Add tests.
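As a starting point for the discussion on cube-level compaction, one possible approach is to group the small files of each cube into bins that stay under a target size, so that files from different cubes are never merged. The sketch below is only an assumption about how this could look, not the Qbeast or Delta implementation; `FileEntry`, `plan_compaction`, the cube identifiers, and the sizes are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class FileEntry(NamedTuple):
    path: str
    cube: str  # hypothetical cube identifier
    size: int  # file size in bytes


def plan_compaction(files: List[FileEntry], target_size: int) -> Dict[str, List[List[str]]]:
    """Group small files of the same cube into bins of at most target_size bytes."""
    by_cube: Dict[str, List[FileEntry]] = defaultdict(list)
    for entry in files:
        # Files already at or above the target size are left untouched.
        if entry.size < target_size:
            by_cube[entry.cube].append(entry)

    plan: Dict[str, List[List[str]]] = {}
    for cube, small in by_cube.items():
        bins: List[List[str]] = []
        current: List[str] = []
        current_size = 0
        # Greedy packing, smallest files first.
        for entry in sorted(small, key=lambda e: e.size):
            if current and current_size + entry.size > target_size:
                bins.append(current)
                current, current_size = [], 0
            current.append(entry.path)
            current_size += entry.size
        if current:
            bins.append(current)
        # A bin holding a single file would only rewrite it, so keep real merges.
        merges = [b for b in bins if len(b) > 1]
        if merges:
            plan[cube] = merges
    return plan


files = [
    FileEntry("a.parquet", "A", 10),
    FileEntry("b.parquet", "A", 20),
    FileEntry("c.parquet", "A", 80),
    FileEntry("d.parquet", "B", 30),
    FileEntry("e.parquet", "A", 200),
]
print(plan_compaction(files, target_size=100))
```

With these made-up inputs, only `a.parquet` and `b.parquet` of cube `A` are planned for merging: `c.parquet` does not fit in the same bin, `d.parquet` is alone in its cube, and `e.parquet` is already above the target.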

osopardo1, Apr 05 '22 08:04

Delta 1.2.0 has now been released! You can read the full release notes here. :eyes:

osopardo1, Apr 25 '22 08:04