databricks-sql-python icon indicating copy to clipboard operation
databricks-sql-python copied to clipboard

[sqlalchemy] [Feature Request] Support default values

Open dhirschfeld opened this issue 1 year ago • 6 comments

It appears that default values aren't supported? https://github.com/databricks/databricks-sql-python/blob/3f6834c9797503132cb0d1b9b770acc36cd22d42/src/databricks/sqlalchemy/base.py#L64

Trying to create a schema with the below server_default gives a spark error:

    insert_timestamp: Mapped[datetime] = mapped_column(
        sa.DateTime(timezone=True),
        init=False,
        nullable=False,
        server_default=sa.func.current_timestamp(),
    )
DatabaseError: (databricks.sql.exc.ServerOperationError) [INTERNAL_ERROR] The Spark SQL phase planning failed with an internal error. You hit a bug in Spark or the Spark plugins you use. Please, report this bug to the corresponding communities or vendors, and provide the full stack trace.
[SQL: 
CREATE TABLE test.`ModelMetadata` (
	pkid BIGINT GENERATED ALWAYS AS IDENTITY, 
	name STRING NOT NULL, 
	version STRING NOT NULL, 
	insert_timestamp TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP NOT NULL, 
	PRIMARY KEY (pkid)
) USING DELTA

]
(Background on this error at: https://sqlalche.me/e/20/4xp6)

Edit:

It looks like the internal error is created by having a TIMESTAMP_NTZ type with the CURRENT_TIMESTAMP default. Using the correct TIMESTAMP type for the column then gives the table feature was not enabled error.

To work around the latter error requires setting the table properties, which AFAICS can't be done from sqlalchemy, at least not directly.

dhirschfeld avatar Jan 22 '24 11:01 dhirschfeld

try to use

sa.func.now()

wibbico avatar Jan 22 '24 11:01 wibbico

Thanks to this thread I ws able to find out the magic syntax:

CREATE TABLE test.`ModelMetadata` (
	pkid BIGINT GENERATED ALWAYS AS IDENTITY, 
	name STRING NOT NULL, 
	version STRING NOT NULL, 
	insert_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL, 
	PRIMARY KEY (pkid)
) USING DELTA
TBLPROPERTIES(
  'delta.minReaderVersion' = '1',
  'delta.minWriterVersion'='7',
  'delta.feature.allowColumnDefaults' = 'enabled'
);

...but I don't really know how to integrate that with sqlalchemy.

If that is required for defaults then I guess this projects needs to implement a custom compiler to add the required TBLPROPERTIES?

dhirschfeld avatar Jan 22 '24 11:01 dhirschfeld

With sa.func.now() sqlalchemy emits the below SQL:

CREATE TABLE test.`ModelMetadata` (
	pkid BIGINT GENERATED ALWAYS AS IDENTITY, 
	name STRING NOT NULL, 
	version STRING NOT NULL, 
	insert_timestamp TIMESTAMP DEFAULT now() NOT NULL, 
	PRIMARY KEY (pkid)
) USING DELTA;

...which gives the below error:

[WRONG_COLUMN_DEFAULTS_FOR_DELTA_FEATURE_NOT_ENABLED]
Failed to execute CREATE TABLE command because it assigned a column DEFAULT value, but the corresponding table feature was not enabled.
Please retry the command again after executing ALTER TABLE tableName SET TBLPROPERTIES('delta.feature.allowColumnDefaults' = 'supported').

dhirschfeld avatar Jan 22 '24 11:01 dhirschfeld

If that is required for defaults then I guess this projects needs to implement a custom compiler to add the required TBLPROPERTIES?

Or, even better, just enable allowColumnDefaults by default so it Just Works without having to touch the sqlalchemy compiler.

dhirschfeld avatar Jan 22 '24 11:01 dhirschfeld

In my specific case it appears I can hack around it with:

@event.listens_for(Base.metadata, "after_create")
def set_default_insert_timestamp(target, connection, **kw):
    for table in target.tables:
        connection.execute(DDL(f"""
        ALTER TABLE {table} SET TBLPROPERTIES(
            'delta.feature.allowColumnDefaults' = 'enabled'
        );
        """))
        connection.execute(DDL(f"""
        ALTER TABLE {table}
        ALTER COLUMN insert_timestamp
        SET DEFAULT CURRENT_TIMESTAMP;
        """))

...but that's a bit gross 🤢

It would be nice for the standard sqlalchemy server_default to Just Work with no hacks required (by the end-user!)

dhirschfeld avatar Jan 22 '24 12:01 dhirschfeld

Hey folks thanks for the write-up and debugging! I'm pleased to see that you worked out how to hack this together using event.listens_for. The sqlalchemy events system is a powerful tool, since nearly every behaviour of a given dialect can be patched in this way. I agree that this belongs in the dialect itself.

@dhirschfeld would you shoot an email to [email protected] referencing this GitHub issue so we can track it and add it to the pipeline for enhancements to the dialect?

susodapop avatar Jan 22 '24 18:01 susodapop