great_expectations icon indicating copy to clipboard operation
great_expectations copied to clipboard

[BUG] Expectation type expect_column_values_to_match_regex fails for databricks-sqlalchemy

Open satniks opened this issue 5 months ago • 6 comments

We are using latest GX (v 0.18.10) with databricks based on python packages sqlalchemy-databricks and databricks-sql-connector.

All our GX rules which worked fine with databricks/sqlalchemy except the rule for regular expression. We get following error in the validation result indicating there is no support for databricks for regex rules using sqlalchemy.

"Regex is not supported for dialect {_dialect.dialect.name!s} "\nAttributeError: 'DatabricksDialect' object has no attribute 'dialect'

satniks avatar Feb 28 '24 06:02 satniks

As GX team is not accepting pull requests, I am providing the fix here. The code block is exactly same as that of snowflake. The check for databricks is using dialect.name. May be there is better approach for this check which I am not aware.

File Name: great_expectations/expectations/metrics/util.py Function Name: get_dialect_regex_expression() New Code to be added: (preferably after snowflake block)

try:
    # Databricks
    if dialect.name == 'databricks':
        if positive:
            return sqlalchemy.BinaryExpression(
                column, sqlalchemy.literal(regex), sqlalchemy.custom_op("REGEXP")
            )
        else:
            return sqlalchemy.BinaryExpression(
                column,
                sqlalchemy.literal(regex),
                sqlalchemy.custom_op("NOT REGEXP"),
            )
except (
    AttributeError,
    TypeError,
):  # TypeError can occur if the driver was not installed and so is None
    pass

satniks avatar Feb 28 '24 09:02 satniks

One of my test developers also has this problem; we are using databricks extensively in our company, and hoped to use this to automate much of the testing. Although it's a small thing to do some polymorphism, overriding, or just good old fashioned works-on-my-machine changes, it would be best for us not to have to maintain additional code behind a specific version of the library.

@Kilo59 is there any ability to re-vist an MR/PR for this? This has merit and certainly impacts test driven organizations using databricks as a primary hub for storage, access, and datalake aggregation. @satniks Thanks for the write up!

immerautumn avatar Mar 19 '24 14:03 immerautumn

@immerautumn @satniks Where did you hear we aren't accepting Pull Requests? We are preparing for a major v1.0 release, so we aren't doing weekly releases and aren't as focused on bug fixes like this.

However, as far as I know, we are still accepting pull requests. I would expect a fix for this to be accepted.

Kilo59 avatar Mar 19 '24 15:03 Kilo59

@Kilo59 I was just parroting what I read above. Worked out for all the good though! @satniks If you feel inclined, send them a PR; I know we would appreciate it highly.

immerautumn avatar Mar 19 '24 16:03 immerautumn

@immerautumn , submitted the PR https://github.com/great-expectations/great_expectations/pull/9641

@Kilo59 , the contribute section of the README file says "We’re temporarily pausing the acceptance of new pull requests (PRs)." and therefore I did not create the PR. https://github.com/great-expectations/great_expectations?tab=readme-ov-file#contribute

satniks avatar Mar 20 '24 05:03 satniks

@Kilo59 , any plan to fix this issue (using this pull request or any other way) for v1.0?

satniks avatar Apr 25 '24 10:04 satniks