delta
delta copied to clipboard
[BUG] Fix DeltaTableBuilder metadata configuration
Bug
Describe the problem
Currently, when DeltaTableBuilder
is given a property that does not start with a delta.
prefix, then that property is written to the table metadata in all lowercase.
Steps to reproduce
io.delta.tables.DeltaTable.create()
.addColumn("bar", StringType)
.property("dataSkippingNumIndexedCols", "33")
.tableName("my_table")
.execute()
Observed results
This will incorrectly write dataskippingnumindexedcols
to the delta log metadata.
Expected results
We want to preserve the case of table properties. dataSkippingNumIndexedCols
should be written to the metadata.
Implementation Requirement
In addition to fixing the above issue, we also need to be backwards compatible and be able to read older tables with these invalid properties.
As well, you need to include a flag that can be enabled to roll back to this previous behaviour.
After you fix this issue, please update the test and test output table here: https://github.com/delta-io/delta/blob/master/core/src/test/scala/org/apache/spark/sql/delta/DeltaWriteConfigsSuite.scala#L316
Also, please add tests with the feature flag enabled/disabled, as well as by reading an "older" table with these invalid delta properties (this should go into EvolvabilitySuite).
We only want properties that are prefixed with
delta.
to be written to the delta log, and we obviously want to preserve their case.
As this API is for setting table properties rather than options, we should allow arbitrary table properties. What we should fix is making it preserve the case.
Do you think that preserving the case should be enabled by default and there should be an option to switch it off?
How about a flag that is mandatory if any non-delta property is set? One of the risk with the new behavior is that there are a lot of people that use notebooks and are not careful about the runtime they attach their notebook to. Or at least, we should document somehow this as a breaking behavior somehow..
Here's some additional thoughts:
- We will always enforce all delta. properties to be case sensitive.
- We will add a new feature flag
allowCaseInsensitiveProperties.enabled
, which would be to allow case insensitivity for all non delta arbitrary properties. - We use a default underlying Map implementation and force case sensitive lookups for non delta. properties if this is set to false.
- We use the existing implementation and allow case insensitive lookups for non delta. properties if this is set to true.
@edmondo1984 @zpappa Here is a unit test to show the bug:
test("delta table property") {
for (setProp <- Seq("CREATE TABLE", "ALTER TABLE", "DeltaTableBuilder")) {
withTempDir { dir =>
val path = dir.getCanonicalPath()
setProp match {
case "CREATE TABLE" =>
sql(s"CREATE TABLE delta.`$path`(id INT) " +
s"USING delta TBLPROPERTIES('delta.appendOnly'='true', 'Foo'='Bar') ")
case "ALTER TABLE" =>
spark.range(1, 10).write.format("delta").save(path)
sql(s"ALTER TABLE delta.`$path` " +
s"SET TBLPROPERTIES('delta.appendOnly'='true', 'Foo'='Bar')")
case "DeltaTableBuilder" =>
DeltaTable.create()
.addColumn("id", "int")
.location(path)
.property("delta.appendOnly", "true")
.property("Foo", "Bar")
.execute()
case _ => assert(false, "unexpected")
}
val config = DeltaLog.forTable(spark, path).snapshot.metadata.configuration
assert(
config == Map("delta.appendOnly" -> "true", "Foo" -> "Bar"),
s"$setProp's result is not correct: $config")
}
}
}
DeltaTableBuilder
is the only API that writes Map("delta.appendOnly" -> "true", -->"foo"<-- -> "Bar")
. The rest two cases CREATE TABLE
and ALTER TABLE
write Map("delta.appendOnly" -> "true", "Foo" -> "Bar")
.
I would say this is a bug. However, since this is also a breaking change, we can add a flag such as deltaTableBuilder.forceTablePropertyLowerCase.enabled
and turn it off by default.
@edmondo1984 Are you interested in fixing this?
Yes, can you assign the issue to me @zsxwing ?
@edmondo1984 done. Thanks for offering the help!