delta icon indicating copy to clipboard operation
delta copied to clipboard

Issue with UPDATE statement for tables with GENERATED columns

Open stawo opened this issue 3 years ago • 4 comments

I have a Delta table with a timestamp column which is defined as GENERATED ALWAYS AS (now()).
In my understanding, this allows, when inserting new records to the table, to automatically assign the current timestamp to the column.

When I'm trying to use the UPDATE command to update, for a given row, another field (not the aforementioned generated column), I receive an InvariantViolationException error referring to the column.
I find it confusing, as the UPDATE command should only touch the other field.
Am I misinterpreting the documentation? (https://docs.databricks.com/delta/delta-update.html#update-a-table&language-sql)

Minimal code to reproduce the issue:

CREATE or replace TABLE `test_generated_column_update` (
    test_string STRING NOT NULL,
    test_date TIMESTAMP GENERATED ALWAYS AS (now())
    )
    USING DELTA
;

insert into `test_generated_column_update`(test_string) values ('Hello!');

select * from `test_generated_column_update`;

update `test_generated_column_update` set test_string = 'Good bye!' where test_string == 'Hello!';

select * from `test_generated_column_update`;

Error: Error in SQL statement: InvariantViolationException: CHECK constraint Generated Column (test_date <=> now()) violated by row with values:

Tested on Azure Databricks on a single node cluster using Databricks Runtime 9.1

stawo avatar Sep 29 '21 20:09 stawo

@stawo Thanks for reporting this. It's a mistake that we don't block now expression to use in a generation expression. We require A generation expression can use any SQL functions in Spark that always return the same result when given the same argument values. See https://docs.delta.io/latest/delta-batch.html#use-generated-columns

zsxwing avatar Sep 29 '21 20:09 zsxwing

@zsxwing , thanks for the clarification. I also was a little bit suspicious as I read as well that part regarding the supported functions. Hope you can add the block soon to avoid confusion and I'll remove the generated statement from my column 😄

stawo avatar Sep 30 '21 09:09 stawo

Hi @zsxwing When we will get a fix for this ?

akhilesh2186 avatar Jun 10 '22 16:06 akhilesh2186

@akhilesh2186 we don't plan to fix this. It's an oversight that we put now() to the allowlist. In general, we don't want to allow non deterministic functions like this. The fix would be blocking now() but considering it's a breaking change, it's not worth and we will just leave it as it is.

zsxwing avatar Jun 14 '22 23:06 zsxwing

Closing this old ticket. This has been answered.

zsxwing avatar Aug 26 '22 05:08 zsxwing

Hi @zsxwing , I know this thread is going back some time, but I'm curious what the rationale behind not allowing non-deterministic functions is? We are currently using now() to accomplish this but certainly don't want to have a lot depending on something that could break at some point (though I recognize you said there's currently no plan to block now())

themattmorris avatar Jun 30 '23 14:06 themattmorris