iceberg icon indicating copy to clipboard operation
iceberg copied to clipboard

Spark 3.2: Support reading position deletes

Open szehon-ho opened this issue 2 years ago • 12 comments

This exposes position deletes as a metadata table "position_deletes", with schema: file, pos, row, partition

This will be useful when trying to implement "RewritePositionDeleteFiles", as we will read positional deletes from Spark and then write it. It will also be useful to implement "RemoveDanglingDeleteFiles", ie removing delete files that no longer reference live data files. It will also be generally useful to get more insights into positional delete files as a table via SQL.

Notes:

  1. Design choice: Why via metadata table? Initially I tried to implement it as a Spark Read config, but SparkCatalog.loadTable didnt support read configuration options to load SparkTable with an alternate schema. So hence chose metadata table path.
  2. Implementation: Most of the changes here are adding the concept of DeleteFileScanTask. The FileScanTask is today bound to DataFiles (FileScanTask.file() returns DataFile) and we can't really change that as it's used in hundreds of places, so added a contentFile() method that returns the DataFile for DataFileScanTask or DeleteFile in case of DeleteFileScanTask. To support position delete file scan, change code calls that need to scan delete files from FileScanTask.file() => FileScanTask.contentFile(), as all logics should work equally with both DeleteFile and DataFile.

szehon-ho avatar May 18 '22 21:05 szehon-ho

FYI @aokolnychyi @RussellSpitzer @rdblue, if you also have time to take a look

szehon-ho avatar May 19 '22 16:05 szehon-ho

Let me take a look.

aokolnychyi avatar Jun 01 '22 23:06 aokolnychyi

I feel it is reasonable to have a metadata table like this. I am not sure about the change in FileScanTask, though. I also have a few questions related to FileScanTask in #4870.

aokolnychyi avatar Jun 01 '22 23:06 aokolnychyi

+1 for using a metadata table for this. It can then be used in numerous places and for various reasons.

kbendick avatar Jun 02 '22 01:06 kbendick

While I think a metadata table like this would be helpful, I am not sure it is something we can leverage during compaction. We will need to have access to a list of DeleteFile we compacted at the commit time so that we can replace them with the newly added files. That's why the data compaction action uses the staging approach.

aokolnychyi avatar Jun 03 '22 18:06 aokolnychyi

Regarding the metadata table, why is it only limited to position deletes? do we anticipate a separate equality deletes metadata table too? This metadata table is at row level, like a source. For row-level deletes metadata, I am wondering if delete_files summary is good enough, which is similar to the files metadata table.

I assume this position_deletes metadata table is to not related to the CDC read like Ryan suggested in the CDC. Otherwise, the metadata table should cover inserts, position deletes, and equality deletes.

stevenzwu avatar Jun 05 '22 21:06 stevenzwu

@stevenzwu yea I think equality deletes will be different tables because the schema is not the same. Yea as delete_files is a summary, I thought this would be useful as we are lacking any way for user to see contents of delete files easily. I guess it will not be the exact same as CDC table.

@aokolnychyi yea the idea wrt RewritePositionDeleteFiles was to use the building blocks in conjunction with file-scan-task-set-id, for the compaction case. This pr should have most of the pieces though, and hopefully we just need to add some wiring.

szehon-ho avatar Jun 06 '22 18:06 szehon-ho

I thought this would be useful as we are lacking any way for user to see contents of delete files easily.

Thanks for confirming.

stevenzwu avatar Jun 06 '22 20:06 stevenzwu

Addressed the review comments. As mentioned in second comment , adding some filter pushdown may require a bit of discussion, so i may give a try here (at least for partition column) or in a second pr. But in any case, this should be ready as is for a first review.

szehon-ho avatar Jun 09 '22 16:06 szehon-ho

Added partition column and got filter push down working, with all the partition spec evolution scenarios. Added a lot of tests in dedicated file (TestPositionDeletesTable).

Will look to see how to incorporate #5077 when it is in.

szehon-ho avatar Jun 23 '22 22:06 szehon-ho

Rebase and support spark 3.3, resolved most comments.

szehon-ho avatar Jul 13 '22 06:07 szehon-ho

Some test was cancelled, re-triggering

szehon-ho avatar Aug 09 '22 02:08 szehon-ho

Closing as it's now broken into smaller prs

szehon-ho avatar Feb 01 '23 17:02 szehon-ho