
Document Delta Table optimization in a single entrypoint

Open edmondop opened this issue 3 months ago • 1 comments

With the recent improvements to Delta Table on top of the existing optimizations, it is getting harder to keep track of them all. It would help to document them in one place:

  • Data skipping via statistics
  • Data skipping improved via Z Index
  • Bloom Filters
  • Liquid Clustering
  • Merge on Read

Feel free to add other ideas here ... @MrPowers

edmondop, Apr 19 '24 13:04

Thanks for raising this @edmondop.

Here are a few other performance enhancements:

  • Relying on metadata only for certain queries
  • Deletion vectors (you kind of already mentioned this one with merge on read)
  • Avoiding expensive file listing operations
  • Eliminating small files via compaction
  • Calling out that file skipping happens first, followed by predicate pushdown filtering
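
To make the last point concrete, here is a rough sketch of the two stages in plain Python. It is only an illustration and does not use the Delta Lake API: `FileStats`, `files_to_scan`, `query`, the `id` column, and the file names are all made up for the example. Stage 1 prunes whole files using per-file min/max statistics; stage 2 filters rows in the files that survive.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical stand-in for per-file min/max statistics on an `id` column."""
    path: str
    min_id: int
    max_id: int
    rows: list  # the `id` values stored in the file


def files_to_scan(files, lo, hi):
    """Stage 1 (file skipping): keep files whose [min_id, max_id] overlaps [lo, hi]."""
    return [f for f in files if f.max_id >= lo and f.min_id <= hi]


def query(files, lo, hi):
    """Stage 2 (row filtering): apply the predicate only to the surviving files."""
    survivors = files_to_scan(files, lo, hi)
    rows = [r for f in survivors for r in f.rows if lo <= r <= hi]
    return [f.path for f in survivors], rows


# Made-up table with three files, each covering a disjoint id range.
files = [
    FileStats("part-0.parquet", 1, 100, [1, 50, 100]),
    FileStats("part-1.parquet", 101, 200, [101, 150, 200]),
    FileStats("part-2.parquet", 201, 300, [201, 250, 300]),
]

# Equivalent of: WHERE id BETWEEN 120 AND 180
scanned, rows = query(files, 120, 180)
print(scanned)  # only part-1.parquet is opened
print(rows)     # [150]
```

Two of the three files are never opened, and the row-level predicate only runs against the one file that might contain matches.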

We could possibly add all these to the Delta Lake Performance blog.

MrPowers, Apr 19 '24 17:04