delta icon indicating copy to clipboard operation
delta copied to clipboard

[BUG] Fix SQL API option/properties metadata configuration

Open scottsand-db opened this issue 3 years ago • 6 comments

Wait for https://github.com/delta-io/delta/issues/1182 first. This PR needs to use the same feature flag that is created for that issue.

Bug

Describe the problem

There are various inconsistencies in the SQL API when writing metadata using OPTIONS and TBLPROPERTIES. Specifically

Case 1

when using TBLPROPERTIES, we incorrectly write properties that do not start with .delta. This happens using any of CREATE TABLE, REPLACE TABLE, CREATE OR REPLACE

Case 2

when using CREATE TABLE (not using AS SELECT) using path and using OPTIONS

  • we incorrectly write options that do not start with .delta
  • we incorrectly duplicate options with option.$key

Steps to reproduce

Case 1
CREATE TABLE tbl (id INT) USING DELTA
TBLPROPERTIES('logRetentionDuration'='interval 60 days', 'delta.checkpointInterval'=20)
Case 2
CREATE TABLE tbl (id INT) USING DELTA
OPTIONS('dataSkippingNumIndexedCols'=33,'delta.deletedFileRetentionDuration'='interval 2 weeks')
LOCATION '/private/var/folders/mv/gj5n7hvn78n7td5pp1c_jgjh0000gp/T/spark-69b43bf6-b143-459f-a05b-751bdbf4308a'

Observed results

Case 1

The following is written out as table metadata

logRetentionDuration -> interval 60 days
delta.checkpointInterval -> 20  

however what we really want to be written out is

delta.checkpointInterval -> 20  
Case 2

The following is written out as table metadata

delta.deletedFileRetentionDuration -> interval 2 weeks
dataSkippingNumIndexedCols -> 33
option.delta.deletedFileRetentionDuration -> interval 2 weeks
option.dataSkippingNumIndexedCols -> 33

however what we really want to be written out is

delta.deletedFileRetentionDuration -> interval 2 weeks

Implementation Requirement

After you fix this issue, please update the test and test output table here: https://github.com/delta-io/delta/blob/master/core/src/test/scala/org/apache/spark/sql/delta/DeltaWriteConfigsSuite.scala#L316

Also, please add tests with the feature from #1182 flag enabled/disabled, as well as by reading an "older" table with these invalid delta properties (this should go into EvolvabilitySuite).

scottsand-db avatar Jun 06 '22 23:06 scottsand-db

Hi! I'm interested in implementing a fix for this bug. To be sure I understand the issue correctly, the result has to be a snapshot object that the metadata configuration only contains the parameters with delta prefix, the rest, nonprefixed and option.delta.{...} should be removed.

I've already started in a branch

alfonsorr avatar Aug 08 '22 22:08 alfonsorr

@alfonsorr - you can read the table metadata by doing deltaLog.snapshot.metadata.configuration.

So, we expect the result (i.e. the table metadata) after this fix to look as is described in the issue description above.

e.g. for case 1, currently deltaLog.snapshot.metadata.configuration is returning

logRetentionDuration -> interval 60 days
delta.checkpointInterval -> 20  

but instead we want it to only return

delta.checkpointInterval -> 20  

Let me know if you have any more questions!

BTW, a similar fix for a similar problem has already been done. See https://github.com/delta-io/delta/issues/1182 and https://github.com/delta-io/delta/pull/1254

scottsand-db avatar Aug 10 '22 03:08 scottsand-db

One thing, for parameters that are not delta related but are correct like in the test in the code: ALTER TABLE $tableName SET TBLPROPERTIES ('comment'='does it work?')

What should be the behavior, I understand that the comment has to be added to the metadata, but what about deltaLog.snapshot.metadata.configuration

alfonsorr avatar Aug 12 '22 09:08 alfonsorr

I just noticed that the description is not accurate. For example, today we will write

delta.deletedFileRetentionDuration -> interval 2 weeks
dataSkippingNumIndexedCols -> 33
option.delta.deletedFileRetentionDuration -> interval 2 weeks
option.dataSkippingNumIndexedCols -> 33

to deltaLog.snapshot.metadata.configuration.

And we want to write only the following paris

delta.deletedFileRetentionDuration -> interval 2 weeks
dataSkippingNumIndexedCols -> 33

zsxwing avatar Aug 26 '22 01:08 zsxwing

That makes sense! so it's only related to those that start with option.

alfonsorr avatar Aug 26 '22 06:08 alfonsorr

so it's only related to those that start with option.

Correct!

zsxwing avatar Aug 26 '22 06:08 zsxwing