delta icon indicating copy to clipboard operation
delta copied to clipboard

[BUG] Fix DeltaTableBuilder metadata configuration

Open scottsand-db opened this issue 2 years ago • 7 comments

Bug

Describe the problem

Currently, when DeltaTableBuilder is given a property that does not start with a delta. prefix, then that property is written to the table metadata in all lowercase.

Steps to reproduce

io.delta.tables.DeltaTable.create()
  .addColumn("bar", StringType)
  .property("dataSkippingNumIndexedCols", "33")
  .tableName("my_table")
  .execute()

Observed results

This will incorrectly write dataskippingnumindexedcols to the delta log metadata.

Expected results

We want to preserve the case of table properties. dataSkippingNumIndexedCols should be written to the metadata.

Implementation Requirement

In addition to fixing the above issue, we also need to be backwards compatible and be able to read older tables with these invalid properties.

As well, you need to include a flag that can be enabled to roll back to this previous behaviour.

After you fix this issue, please update the test and test output table here: https://github.com/delta-io/delta/blob/master/core/src/test/scala/org/apache/spark/sql/delta/DeltaWriteConfigsSuite.scala#L316

Also, please add tests with the feature flag enabled/disabled, as well as by reading an "older" table with these invalid delta properties (this should go into EvolvabilitySuite).

scottsand-db avatar Jun 06 '22 23:06 scottsand-db

We only want properties that are prefixed with delta. to be written to the delta log, and we obviously want to preserve their case.

As this API is for setting table properties rather than options, we should allow arbitrary table properties. What we should fix is making it preserve the case.

zsxwing avatar Jun 21 '22 18:06 zsxwing

Do you think that preserving the case should be enabled by default and there should be an option to switch it off?

How about a flag that is mandatory if any non-delta property is set? One of the risk with the new behavior is that there are a lot of people that use notebooks and are not careful about the runtime they attach their notebook to. Or at least, we should document somehow this as a breaking behavior somehow..

edmondop avatar Jul 01 '22 20:07 edmondop

Here's some additional thoughts:

  • We will always enforce all delta. properties to be case sensitive.
  • We will add a new feature flag allowCaseInsensitiveProperties.enabled, which would be to allow case insensitivity for all non delta arbitrary properties.
  • We use a default underlying Map implementation and force case sensitive lookups for non delta. properties if this is set to false.
  • We use the existing implementation and allow case insensitive lookups for non delta. properties if this is set to true.

zpappa avatar Jul 02 '22 18:07 zpappa

@edmondo1984 @zpappa Here is a unit test to show the bug:

  test("delta table property") {
    for (setProp <- Seq("CREATE TABLE", "ALTER TABLE", "DeltaTableBuilder")) {
      withTempDir { dir =>
        val path = dir.getCanonicalPath()
        setProp match {
          case "CREATE TABLE" =>
            sql(s"CREATE TABLE delta.`$path`(id INT) " +
              s"USING delta TBLPROPERTIES('delta.appendOnly'='true', 'Foo'='Bar') ")
          case "ALTER TABLE" =>
            spark.range(1, 10).write.format("delta").save(path)
            sql(s"ALTER TABLE delta.`$path` " +
              s"SET TBLPROPERTIES('delta.appendOnly'='true', 'Foo'='Bar')")
          case "DeltaTableBuilder" =>
            DeltaTable.create()
              .addColumn("id", "int")
              .location(path)
              .property("delta.appendOnly", "true")
              .property("Foo", "Bar")
              .execute()
          case _ => assert(false, "unexpected")
        }
        val config = DeltaLog.forTable(spark, path).snapshot.metadata.configuration
        assert(
          config == Map("delta.appendOnly" -> "true", "Foo" -> "Bar"),
          s"$setProp's result is not correct: $config")
      }
    }
  }

DeltaTableBuilder is the only API that writes Map("delta.appendOnly" -> "true", -->"foo"<-- -> "Bar"). The rest two cases CREATE TABLE and ALTER TABLE write Map("delta.appendOnly" -> "true", "Foo" -> "Bar").

I would say this is a bug. However, since this is also a breaking change, we can add a flag such as deltaTableBuilder.forceTablePropertyLowerCase.enabled and turn it off by default.

zsxwing avatar Jul 05 '22 16:07 zsxwing

@edmondo1984 Are you interested in fixing this?

zsxwing avatar Jul 05 '22 16:07 zsxwing

Yes, can you assign the issue to me @zsxwing ?

edmondop avatar Jul 05 '22 17:07 edmondop

@edmondo1984 done. Thanks for offering the help!

zsxwing avatar Jul 05 '22 17:07 zsxwing