delta-rs
feat: add deltaOps set metadata operation
Description
Allow for the explicit changing of the metadata of a delta table. This allows for simple schema migrations like changing the metadata of a column or adding new nullable columns. The code doesn't currently do any checks that the table would still be readable after changing the metadata. The setMetadata operation is similar to mergeSchema but doesn't require a write at the same time so it can be run and tested as part of a deployment instead of on the next write of data.
Note: you used to be able to do this by calling DeltaOps::create again with overwrite on an existing table, but since that was recently fixed to delete old data, this operation allows for recreating the original behavior.
ACTION NEEDED
delta-rs follows the Conventional Commits specification for release automation.
The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.
Unfortunately it isn't that simple. If you do it like this, you could put the table in an invalid state, because the metadata contains schema, partitionColumns and configuration. For each of them you need to do many checks before you can change it.
For the configuration part I have 2 PRs open: #2264 #2075
For partitionColumns, you can't change that; at this point we don't allow evolving the partition columns of a table. Schema evolution, or any other change to the schema, needs to go into dedicated operations such as ALTER TABLE DROP COLUMN and ALTER TABLE ADD COLUMN.
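A minimal sketch of the kind of compatibility check described above, written against simplified stand-in types rather than the real delta-rs metadata structs. The names `Field`, `TableMetadata` and `check_metadata_change` are hypothetical, and the rules shown (partition columns frozen, existing columns kept with the same type, new columns nullable) are assumptions about what "still readable" would require, not the checks delta-rs actually performs.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for a column definition.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: String,
    nullable: bool,
}

/// Hypothetical, simplified stand-in for Delta table metadata.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct TableMetadata {
    schema: Vec<Field>,
    partition_columns: Vec<String>,
    /// Table properties; not validated in this sketch.
    configuration: HashMap<String, String>,
}

/// Returns Ok(()) if `new` is a purely additive change over `old`
/// under the assumed rules, otherwise a description of the first violation.
fn check_metadata_change(old: &TableMetadata, new: &TableMetadata) -> Result<(), String> {
    // Partition columns may not be evolved.
    if old.partition_columns != new.partition_columns {
        return Err("partition columns cannot be changed".to_string());
    }
    // Every existing column must survive with the same type,
    // and a nullable column must stay nullable.
    for old_field in &old.schema {
        match new.schema.iter().find(|f| f.name == old_field.name) {
            None => return Err(format!("column `{}` was dropped", old_field.name)),
            Some(f) if f.data_type != old_field.data_type => {
                return Err(format!("column `{}` changed type", old_field.name));
            }
            Some(f) if old_field.nullable && !f.nullable => {
                return Err(format!("column `{}` became non-nullable", old_field.name));
            }
            Some(_) => {}
        }
    }
    // Columns that did not exist before must be nullable,
    // because existing data files contain no values for them.
    for f in &new.schema {
        let is_new = !old.schema.iter().any(|o| o.name == f.name);
        if is_new && !f.nullable {
            return Err(format!("new column `{}` must be nullable", f.name));
        }
    }
    Ok(())
}
```

A real implementation would also need to validate configuration keys and compare nested types recursively; both are left out here to keep the sketch short.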
Thank you @ion-elgreco, I was not aware that you had added support for setting table properties with #2264. If this operation added more checks that the old and new metadata are compatible, would that be acceptable? An ADD COLUMN feature would be great, but it is missing the ability to modify existing columns (to add nested fields to structs), which I would like to use.
@HawaiianSpork I don't see why you wouldn't be able to add a nested field to a struct column with ADD COLUMN.
I think it's still safe, since you are only adding something. But it's probably good to verify what happens when you read two Parquet files with partially different struct schemas.
Good point, I had assumed ADD COLUMN only worked on top-level columns, but at least in the Spark world nested columns are supported. So I guess I have to add ADD COLUMN support to delta-rs...
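To make the nested case concrete, here is a rough sketch of adding a nullable field inside an existing struct column. It uses hypothetical stand-in types (`DataType`, `SchemaField`) and a hypothetical `add_column` helper, not the delta-rs schema API or its actual ADD COLUMN operation. The path argument names the chain of parent structs; an empty path means a top-level column.

```rust
/// Hypothetical, simplified data type: a named primitive or a struct of fields.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Primitive(String),
    Struct(Vec<SchemaField>),
}

/// Hypothetical, simplified schema field.
#[derive(Debug, Clone, PartialEq)]
struct SchemaField {
    name: String,
    data_type: DataType,
    nullable: bool,
}

/// Adds `new_field` to the struct addressed by `path`.
/// Only nullable fields are accepted, since existing files hold no data for them.
fn add_column(
    fields: &mut Vec<SchemaField>,
    path: &[&str],
    new_field: SchemaField,
) -> Result<(), String> {
    if !new_field.nullable {
        return Err(format!("new field `{}` must be nullable", new_field.name));
    }
    match path.split_first() {
        // End of the path: append here unless the name is already taken.
        None => {
            if fields.iter().any(|f| f.name == new_field.name) {
                return Err(format!("field `{}` already exists", new_field.name));
            }
            fields.push(new_field);
            Ok(())
        }
        // Descend into the named parent, which must be a struct.
        Some((head, rest)) => {
            let parent = fields
                .iter_mut()
                .find(|f| f.name == *head)
                .ok_or_else(|| format!("no field named `{}`", head))?;
            match &mut parent.data_type {
                DataType::Struct(children) => add_column(children, rest, new_field),
                DataType::Primitive(_) => Err(format!("`{}` is not a struct", head)),
            }
        }
    }
}
```

For example, calling `add_column(&mut schema, &["address"], zip_field)` would append `zip_field` to the children of a top-level struct column named `address`. Whether older Parquet files lacking that nested field read back cleanly still needs to be verified separately, as noted above.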
@HawaiianSpork FYI, I am adding an add column operation here: https://github.com/delta-io/delta-rs/pull/2562. It will support nested columns as well, since we leverage the schema evolution code.
So will close this one