[Spark] Automatically add legacy features to table when setting legacy protocol versions
Which Delta project/connector is this regarding?
- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)
Description
Currently, dropping the last feature from a table downgrades the table's protocol to the minimum required legacy versions. However, this can only happen if the table's legacy features match a legacy protocol version pair exactly. This rule exists to avoid accidentally upgrading a table by implicitly adding extra legacy features during the downgrade, but it can block version downgrades on tables created directly with table features.
For example, consider creating a table with Protocol(3, 7, RemovableReaderWriterFeature, ChangeDataFeed) and then dropping RemovableReaderWriterFeature. The resulting protocol will be Protocol(3, 7, ChangeDataFeed) instead of Protocol(1, 4), because (1, 4) includes legacy features not defined in the original protocol. A user who wishes to downgrade such a table to legacy versions needs to set the legacy versions before dropping the feature.
For example:
CREATE TABLE x TBLPROPERTIES (
delta.feature.RemovableReaderWriterFeature = 'supported',
delta.feature.ChangeDataFeedTableFeature = 'supported')
ALTER TABLE x SET TBLPROPERTIES (
'delta.minReaderVersion' = 1,
'delta.minWriterVersion' = 4)
ALTER TABLE x DROP FEATURE RemovableReaderWriterFeature
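The exact-match rule above can be sketched as follows. This is an illustrative Python model, not the actual Delta implementation; the version-to-feature mapping is a simplified subset for demonstration only.

```python
# Simplified, illustrative mapping from legacy (reader, writer) version
# pairs to the legacy features they imply. Not the full Delta mapping.
LEGACY_FEATURES = {
    (1, 2): {"appendOnly", "invariants"},
    (1, 3): {"appendOnly", "invariants", "checkConstraints"},
    (1, 4): {"appendOnly", "invariants", "checkConstraints",
             "changeDataFeed", "generatedColumns"},
}

def downgrade_target(remaining_features):
    """Return the legacy (reader, writer) pair whose feature set matches
    the remaining features exactly, or None if no pair matches."""
    for versions, features in sorted(LEGACY_FEATURES.items()):
        if features == remaining_features:
            return versions
    return None

# Dropping RemovableReaderWriterFeature leaves only changeDataFeed, which
# matches no legacy pair exactly, so the protocol stays at (3, 7).
print(downgrade_target({"changeDataFeed"}))  # None
```

Because the comparison is an exact set match, a table left with a strict subset of a legacy pair's features cannot be downgraded until the missing legacy features are added.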
We now also allow setting legacy protocol versions on a table-features table that supports only legacy features. This helps in scenarios where the user dropped the last table feature but did not set the legacy protocol versions in advance. For example:
CREATE TABLE x TBLPROPERTIES (
delta.feature.RemovableReaderWriterFeature = 'supported',
delta.feature.ChangeDataFeedTableFeature = 'supported')
// After this command the table will have Protocol(3, 7, ChangeDataFeedTableFeature).
ALTER TABLE x DROP FEATURE RemovableReaderWriterFeature
// After this command the table will have Protocol(1, 4).
ALTER TABLE x SET TBLPROPERTIES (
'delta.minReaderVersion' = 1,
'delta.minWriterVersion' = 4)
Furthermore, we relax the requirements so the user does not have to pick exactly the right versions: when setting any legacy protocol versions on a table-features protocol, we automatically add any missing legacy features, given the legacy features already present in the protocol.
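The relaxation can be sketched like this. Again, a hypothetical Python model of the behavior, with an illustrative (not exhaustive) version-to-feature mapping:

```python
# Illustrative subset of the legacy (reader, writer) -> features mapping.
LEGACY_FEATURES = {
    (1, 2): {"appendOnly", "invariants"},
    (1, 4): {"appendOnly", "invariants", "checkConstraints",
             "changeDataFeed", "generatedColumns"},
}

def set_legacy_versions(current_features, reader, writer):
    """When legacy versions are set on a table-features protocol, union the
    table's current features with every legacy feature implied by the
    requested (reader, writer) pair."""
    implied = LEGACY_FEATURES.get((reader, writer), set())
    return current_features | implied

# A table left with only changeDataFeed picks up the rest of the (1, 4)
# features, so a subsequent exact-match downgrade to (1, 4) can succeed.
print(sorted(set_legacy_versions({"changeDataFeed"}, 1, 4)))
```

The design choice here is that the union never removes support the table already declares; it only fills in the legacy features the requested versions imply.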
How was this patch tested?
Added new tests to DeltaProtocolVersionSuite.
Does this PR introduce any user-facing changes?
Yes. The user can now set legacy versions to a table with features to automatically add all relevant legacy features. Example provided in the description of the PR.
One question. Consider the following command sequence:
CREATE TABLE x TBLPROPERTIES (
delta.feature.RemovableReaderWriterFeature = 'supported',
delta.feature.ChangeDataFeedTableFeature = 'supported')
ALTER TABLE x DROP FEATURE RemovableReaderWriterFeature
ALTER TABLE x SET TBLPROPERTIES (
'delta.minReaderVersion' = 1,
'delta.minWriterVersion' = 4)
Should the table get (1, 4) in the end? I think it should, as we automatically add all legacy features in the 3rd command.
Very good point. Command 2 will result in Protocol(3, 7, ChangeDataFeedTableFeature). Then command 3 will result in Protocol(3, 7, ChangeDataFeedTableFeature + the rest of the features in (1, 4)). Dropping one of the existing legacy features cannot result in (1, 4), because the legacy features list won't match exactly. To get out of this and downgrade to (1, 4), the user would have to add a feature to the table and then drop it... I could address this in a separate PR in the future, since there is a way out and it is a rare case, i.e., downgrading the protocol versions of a table with NO table features but with table-features versions.
Asking the user to set delta.minReaderVersion and delta.minWriterVersion is not user-friendly. This means users need to understand how to map a feature to minReaderVersion/minWriterVersion.
True... but it's even worse if they have to do that and add each feature individually. At least this way they can go "ok, my connector supports (x, y), let me set this on the table and then downgrade."
For me it seems the existing behavior you described is better:
“For example, consider creating a table with Protocol(3, 7, RemovableReaderWriterFeature, ChangeDataFeed) and then dropping RemovableReaderWriterFeature. The resulting protocol will be Protocol(3, 7, ChangeDataFeed) instead of Protocol(1, 4)”
Protocol (3, 7), besides being newer, has fewer requirements. Meaning if you "downgrade" to (1, 4) you are making your table less compatible with clients that don't support all the features from writer v4 but do support v7 + CDC.
There are spark master test failures:
https://github.com/delta-io/delta/actions/runs/9891350186/job/27321586977?pr=2848
[info] CreateCheckpointSuite:
[info] - commits containing adds and removes, and no previous checkpoint
[info] - commits containing adds, and no previous checkpoint
[info] - commits containing adds and removes, and a previous checkpoint created using Spark (actions/perfile): 1000000
[info] - commits containing adds and removes, and a previous checkpoint created using Spark (actions/perfile): 3
[info] - commits containing adds, and a previous checkpoint created using Spark (actions/perfile): 1000000
[info] - commits containing adds, and a previous checkpoint created using Spark (actions/perfile): 3
[info] - commits with metadata updates
[info] - commits with protocol updates *** FAILED ***
[info] == Results ==
[info]
[info] == Expected Answer - 1 ==
[info] ([2,2])
[info]
[info] == Result - 1 ==
[info] ([1,2]) (TestUtils.scala:386)
The PR is not ready to be merged. The particular failure is fixed by https://github.com/delta-io/delta/pull/3356.