great_expectations icon indicating copy to clipboard operation
great_expectations copied to clipboard

Wrong casting for TIMESTAMP column type with BigQuery table

Open PaulineBeurier8 opened this issue 2 years ago • 5 comments

Describe the bug Generation of expectations using historic data does not for column type TIMESTAMP using BigQuery as source.

In BigQuery there is 2 types for date:

  • DATETIME
  • TIMESTAMP The generation of rules works properly with DATETIME type ✅ but with TIMESTAMP it fails ❌

To Reproduce Steps to reproduce the behavior:

  1. great_expectations --v3-api suite new
  2. How would you like to create your Expectation Suite -> 3. Automatically, using a profiler
  3. Select a datasource -> datasource_BigQuery_analytics
  datasource_BigQuery_analytics:
    connection_string: bigquery://my-project/my_dataset
    introspection:
      whole_table:
        data_asset_name_suffix: __whole_table
    module_name: great_expectations.datasource
    class_name: SimpleSqlalchemyDatasource 
  1. Provide: name of the table, temporary table, name of the file Do not not provide schema

  2. Error when running the JupyterNotebook cells: MetricResolutionError: (google.cloud.bigquery.dbapi.exceptions.DatabaseError) 400 No matching signature for operator >= for argument types: TIMESTAMP, DATETIME. Supported signature: ANY >= ANY at [3:43]

This is due to a wrong casting of TIMESTAMP to DATETIME the column in the query, see below :

[SQL: SELECT `event_ts_utc` AS `unexpected_values` 
FROM `karhoo-scratch.data_team_sandbox.pb_ge1` 
WHERE `event_ts_utc` IS NOT NULL AND NOT (`event_ts_utc` >= %(event_ts_utc_1:DATETIME)s AND `event_ts_utc` <= %(event_ts_utc_2:DATETIME)s)
 LIMIT %(param_1:INT64)s]

Expected behavior Generate expectations with no errors.

Environment (please complete the following information):

  • Operating System: MacOS
  • Great Expectations Version: 0.13.46

PaulineBeurier8 avatar Dec 16 '21 17:12 PaulineBeurier8

Hey @PaulineBeurier8 thanks for opening up this issue!

Would you mind providing the entire stack trace and letting us know which specific cell this failure is occurring at?

cdkini avatar Dec 17 '21 16:12 cdkini

The cell where the issue is appearing is:

profiler = UserConfigurableProfiler(
    profile_dataset=validator,
    excluded_expectations=None,
    ignored_columns=ignored_columns,
    not_null_only=False,
    primary_or_compound_key=False,
    semantic_types_dict=None,
    table_expectations_only=False,
    value_set_threshold="MANY",
)
suite = profiler.build_suite()

PaulineBeurier8 avatar Dec 20 '21 11:12 PaulineBeurier8

Hi @PaulineBeurier8 - thanks for this! For security purposes, would you please paste the full stack trace here or link it in a gist so we can take a look?

talagluck avatar Dec 29 '21 17:12 talagluck

Profiling: 60%
3/5 [00:52<00:33, 16.72s/it, Column=event_ts_utc]

---------------------------------------------------------------------------
BadRequest                                Traceback (most recent call last)
~/Documents/.venv/lib/python3.9/site-packages/google/cloud/bigquery/dbapi/cursor.py in _execute(self, formatted_operation, parameters, job_id, job_config, parameter_types)
    202         try:
--> 203             self._query_job.result()
    204         except google.cloud.exceptions.GoogleCloudError as exc:

~/Documents/.venv/lib/python3.9/site-packages/google/cloud/bigquery/job/query.py in result(self, page_size, max_results, retry, timeout, start_index, job_retry)
   1446
-> 1447             do_get_result()
   1448

~/Documents/.venv/lib/python3.9/site-packages/google/api_core/retry.py in retry_wrapped_func(*args, **kwargs)
    285             )
--> 286             return retry_target(
    287                 target,

~/Documents/.venv/lib/python3.9/site-packages/google/api_core/retry.py in retry_target(target, predicate, sleep_generator, deadline, on_error)
    188         try:
--> 189             return target()
    190

~/Documents/.venv/lib/python3.9/site-packages/google/cloud/bigquery/job/query.py in do_get_result()
   1436
-> 1437                 super(QueryJob, self).result(retry=retry, timeout=timeout)
   1438

~/Documents/.venv/lib/python3.9/site-packages/google/cloud/bigquery/job/base.py in result(self, retry, timeout)
    726         kwargs = {} if retry is DEFAULT_RETRY else {"retry": retry}
--> 727         return super(_AsyncJob, self).result(timeout=timeout, **kwargs)
    728

~/Documents/.venv/lib/python3.9/site-packages/google/api_core/future/polling.py in result(self, timeout, retry)
    134             # Pylint doesn't recognize that this is valid in this case.
--> 135             raise self._exception
    136

BadRequest: 400 No matching signature for operator >= for argument types: TIMESTAMP, DATETIME. Supported signature: ANY >= ANY at [3:43]

(job ID: 9650f44b-1437-4e02-be40-810eb424045e)

                                              -----Query Job SQL Follows-----

    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |
   1:SELECT `event_ts_utc` AS `unexpected_values`
   2:FROM `my-project.my_dataset.pb_ge1`
   3:WHERE `event_ts_utc` IS NOT NULL AND NOT (`event_ts_utc` >= @`event_ts_utc_1` AND `event_ts_utc` <= @`event_ts_utc_2`)
   4: LIMIT @`param_1`
    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |

During handling of the above exception, another exception occurred:

DatabaseError                             Traceback (most recent call last)
~/Documents/.venv/lib/python3.9/site-packages/sqlalchemy/engine/base.py in _execute_context(self, dialect, constructor, statement, parameters, *args)
   1275                 if not evt_handled:
-> 1276                     self.dialect.do_execute(
   1277                         cursor, statement, parameters, context

~/Documents/.venv/lib/python3.9/site-packages/sqlalchemy/engine/default.py in do_execute(self, cursor, statement, parameters, context)
    607     def do_execute(self, cursor, statement, parameters, context=None):
--> 608         cursor.execute(statement, parameters)
    609

~/Documents/.venv/lib/python3.9/site-packages/google/cloud/bigquery/dbapi/_helpers.py in with_closed_check(self, *args, **kwargs)
    494                 raise exc_class(exc_msg)
--> 495             return method(self, *args, **kwargs)
    496

~/Documents/.venv/lib/python3.9/site-packages/google/cloud/bigquery/dbapi/cursor.py in execute(self, operation, parameters, job_id, job_config)
    165         formatted_operation, parameter_types = _format_operation(operation, parameters)
--> 166         self._execute(
    167             formatted_operation, parameters, job_id, job_config, parameter_types

~/Documents/.venv/lib/python3.9/site-packages/google/cloud/bigquery/dbapi/cursor.py in _execute(self, formatted_operation, parameters, job_id, job_config, parameter_types)
    204         except google.cloud.exceptions.GoogleCloudError as exc:
--> 205             raise exceptions.DatabaseError(exc)
    206

DatabaseError: 400 No matching signature for operator >= for argument types: TIMESTAMP, DATETIME. Supported signature: ANY >= ANY at [3:43]

(job ID: 9650f44b-1437-4e02-be40-810eb424045e)

                                              -----Query Job SQL Follows-----

    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |
   1:SELECT `event_ts_utc` AS `unexpected_values`
   2:FROM `my-project.my_dataset.pb_ge1`
   3:WHERE `event_ts_utc` IS NOT NULL AND NOT (`event_ts_utc` >= @`event_ts_utc_1` AND `event_ts_utc` <= @`event_ts_utc_2`)
   4: LIMIT @`param_1`
    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |

The above exception was the direct cause of the following exception:

DatabaseError                             Traceback (most recent call last)
~/Documents/.venv/lib/python3.9/site-packages/great_expectations/execution_engine/execution_engine.py in resolve_metrics(self, metrics_to_resolve, metrics, runtime_configuration)
    375                 try:
--> 376                     resolved_metrics[metric_to_resolve.id] = metric_fn(
    377                         **metric_provider_kwargs

~/Documents/.venv/lib/python3.9/site-packages/great_expectations/expectations/metrics/map_metric_provider.py in _sqlalchemy_column_map_condition_values(cls, execution_engine, metric_domain_kwargs, metric_value_kwargs, metrics, **kwargs)
   2048         val.unexpected_values
-> 2049         for val in execution_engine.engine.execute(query).fetchall()
   2050     ]

~/Documents/.venv/lib/python3.9/site-packages/sqlalchemy/engine/base.py in execute(self, statement, *multiparams, **params)
   2234         connection = self._contextual_connect(close_with_result=True)
-> 2235         return connection.execute(statement, *multiparams, **params)
   2236

~/Documents/.venv/lib/python3.9/site-packages/sqlalchemy/engine/base.py in execute(self, object_, *multiparams, **params)
   1010         else:
-> 1011             return meth(self, multiparams, params)
   1012

~/Documents/.venv/lib/python3.9/site-packages/sqlalchemy/sql/elements.py in _execute_on_connection(self, connection, multiparams, params)
    297         if self.supports_execution:
--> 298             return connection._execute_clauseelement(self, multiparams, params)
    299         else:

~/Documents/.venv/lib/python3.9/site-packages/sqlalchemy/engine/base.py in _execute_clauseelement(self, elem, multiparams, params)
   1123
-> 1124         ret = self._execute_context(
   1125             dialect,

~/Documents/.venv/lib/python3.9/site-packages/sqlalchemy/engine/base.py in _execute_context(self, dialect, constructor, statement, parameters, *args)
   1315         except BaseException as e:
-> 1316             self._handle_dbapi_exception(
   1317                 e, statement, parameters, cursor, context

~/Documents/.venv/lib/python3.9/site-packages/sqlalchemy/engine/base.py in _handle_dbapi_exception(self, e, statement, parameters, cursor, context)
   1509             elif should_wrap:
-> 1510                 util.raise_(
   1511                     sqlalchemy_exception, with_traceback=exc_info[2], from_=e

~/Documents/.venv/lib/python3.9/site-packages/sqlalchemy/util/compat.py in raise_(***failed resolving arguments***)
    181         try:
--> 182             raise exception
    183         finally:

~/Documents/.venv/lib/python3.9/site-packages/sqlalchemy/engine/base.py in _execute_context(self, dialect, constructor, statement, parameters, *args)
   1275                 if not evt_handled:
-> 1276                     self.dialect.do_execute(
   1277                         cursor, statement, parameters, context

~/Documents/.venv/lib/python3.9/site-packages/sqlalchemy/engine/default.py in do_execute(self, cursor, statement, parameters, context)
    607     def do_execute(self, cursor, statement, parameters, context=None):
--> 608         cursor.execute(statement, parameters)
    609

~/Documents/.venv/lib/python3.9/site-packages/google/cloud/bigquery/dbapi/_helpers.py in with_closed_check(self, *args, **kwargs)
    494                 raise exc_class(exc_msg)
--> 495             return method(self, *args, **kwargs)
    496

~/Documents/.venv/lib/python3.9/site-packages/google/cloud/bigquery/dbapi/cursor.py in execute(self, operation, parameters, job_id, job_config)
    165         formatted_operation, parameter_types = _format_operation(operation, parameters)
--> 166         self._execute(
    167             formatted_operation, parameters, job_id, job_config, parameter_types

~/Documents/.venv/lib/python3.9/site-packages/google/cloud/bigquery/dbapi/cursor.py in _execute(self, formatted_operation, parameters, job_id, job_config, parameter_types)
    204         except google.cloud.exceptions.GoogleCloudError as exc:
--> 205             raise exceptions.DatabaseError(exc)
    206

DatabaseError: (google.cloud.bigquery.dbapi.exceptions.DatabaseError) 400 No matching signature for operator >= for argument types: TIMESTAMP, DATETIME. Supported signature: ANY >= ANY at [3:43]

(job ID: 9650f44b-1437-4e02-be40-810eb424045e)

                                              -----Query Job SQL Follows-----

    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |
   1:SELECT `event_ts_utc` AS `unexpected_values`
   2:FROM `my-project.my_dataset.pb_ge1`
   3:WHERE `event_ts_utc` IS NOT NULL AND NOT (`event_ts_utc` >= @`event_ts_utc_1` AND `event_ts_utc` <= @`event_ts_utc_2`)
   4: LIMIT @`param_1`
    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |
[SQL: SELECT `event_ts_utc` AS `unexpected_values`
FROM `my-project.my_dataset.pb_ge1`
WHERE `event_ts_utc` IS NOT NULL AND NOT (`event_ts_utc` >= %(event_ts_utc_1:DATETIME)s AND `event_ts_utc` <= %(event_ts_utc_2:DATETIME)s)
 LIMIT %(param_1:INT64)s]
[parameters: {'event_ts_utc_1': datetime.datetime(2017, 8, 21, 14, 0, 5, tzinfo=tzutc()), 'event_ts_utc_2': datetime.datetime(2021, 12, 16, 4, 55, 1, 878934, tzinfo=tzutc()), 'param_1': 20}]
(Background on this error at: http://sqlalche.me/e/13/4xp6)

During handling of the above exception, another exception occurred:

MetricResolutionError                     Traceback (most recent call last)
/var/folders/pb/9yqny_053gn3730w7y713ycw0000gn/T/ipykernel_23758/3894799140.py in <module>
      9     value_set_threshold="MANY",
     10 )
---> 11 suite = profiler.build_suite()

~/Documents/.venv/lib/python3.9/site-packages/great_expectations/profile/user_configurable_profiler.py in build_suite(self)
    216             expectation_suite = self._build_expectation_suite_from_semantic_types_dict()
    217         else:
--> 218             expectation_suite = self._profile_and_build_expectation_suite()
    219
    220         self._send_usage_stats_message()

~/Documents/.venv/lib/python3.9/site-packages/great_expectations/profile/user_configurable_profiler.py in _profile_and_build_expectation_suite(self)
    352
    353                 if data_type == "DATETIME":
--> 354                     self._build_expectations_datetime(
    355                         profile_dataset=self.profile_dataset,
    356                         column=column_name,

~/Documents/.venv/lib/python3.9/site-packages/great_expectations/profile/user_configurable_profiler.py in _build_expectations_datetime(self, profile_dataset, column)
   1079             )
   1080             if min_value is not None or max_value is not None:
-> 1081                 profile_dataset.expect_column_values_to_be_between(
   1082                     column,
   1083                     min_value=min_value,

~/Documents/.venv/lib/python3.9/site-packages/great_expectations/validator/validator.py in inst_expectation(*args, **kwargs)
    344                     )
    345                 else:
--> 346                     raise err
    347             return validation_result
    348

~/Documents/.venv/lib/python3.9/site-packages/great_expectations/validator/validator.py in inst_expectation(*args, **kwargs)
    302                     )
    303                 else:
--> 304                     validation_result = expectation.validate(
    305                         validator=self,
    306                         evaluation_parameters=self._expectation_suite.evaluation_parameters,

~/Documents/.venv/lib/python3.9/site-packages/great_expectations/expectations/expectation.py in validate(self, validator, configuration, evaluation_parameters, interactive_evaluation, data_context, runtime_configuration)
    825             evaluation_parameters, interactive_evaluation, data_context
    826         )
--> 827         evr = validator.graph_validate(
    828             configurations=[configuration],
    829             runtime_configuration=runtime_configuration,

~/Documents/.venv/lib/python3.9/site-packages/great_expectations/validator/validator.py in graph_validate(self, configurations, metrics, runtime_configuration)
    569                 return evrs
    570             else:
--> 571                 raise err
    572
    573         for configuration in processed_configurations:

~/Documents/.venv/lib/python3.9/site-packages/great_expectations/validator/validator.py in graph_validate(self, configurations, metrics, runtime_configuration)
    517                 Tuple[str, str, str],
    518                 Dict[str, Union[MetricConfiguration, Set[ExceptionInfo], int]],
--> 519             ] = self.resolve_validation_graph(
    520                 graph=graph,
    521                 metrics=metrics,

~/Documents/.venv/lib/python3.9/site-packages/great_expectations/validator/validator.py in resolve_validation_graph(self, graph, metrics, runtime_configuration)
    747                             }
    748                 else:
--> 749                     raise err
    750             except Exception as e:
    751                 if catch_exceptions:

~/Documents/.venv/lib/python3.9/site-packages/great_expectations/validator/validator.py in resolve_validation_graph(self, graph, metrics, runtime_configuration)
    713             try:
    714                 metrics.update(
--> 715                     self._resolve_metrics(
    716                         execution_engine=self._execution_engine,
    717                         metrics_to_resolve=computable_metrics,

~/Documents/.venv/lib/python3.9/site-packages/great_expectations/validator/validator.py in _resolve_metrics(execution_engine, metrics_to_resolve, metrics, runtime_configuration)
   1628         """A means of accessing the Execution Engine's resolve_metrics method, where missing metric configurations are
   1629         resolved"""
-> 1630         return execution_engine.resolve_metrics(
   1631             metrics_to_resolve=metrics_to_resolve,
   1632             metrics=metrics,

~/Documents/.venv/lib/python3.9/site-packages/great_expectations/execution_engine/execution_engine.py in resolve_metrics(self, metrics_to_resolve, metrics, runtime_configuration)
    378                     )
    379                 except Exception as e:
--> 380                     raise ge_exceptions.MetricResolutionError(
    381                         message=str(e), failed_metrics=(metric_to_resolve,)
    382                     )

MetricResolutionError: (google.cloud.bigquery.dbapi.exceptions.DatabaseError) 400 No matching signature for operator >= for argument types: TIMESTAMP, DATETIME. Supported signature: ANY >= ANY at [3:43]

(job ID: 9650f44b-1437-4e02-be40-810eb424045e)

                                              -----Query Job SQL Follows-----

    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |
   1:SELECT `event_ts_utc` AS `unexpected_values`
   2:FROM `my-project.my_dataset.pb_ge1`
   3:WHERE `event_ts_utc` IS NOT NULL AND NOT (`event_ts_utc` >= @`event_ts_utc_1` AND `event_ts_utc` <= @`event_ts_utc_2`)
   4: LIMIT @`param_1`
    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |    .    |
[SQL: SELECT `event_ts_utc` AS `unexpected_values`
FROM `my-project.my_dataset.pb_ge1`
WHERE `event_ts_utc` IS NOT NULL AND NOT (`event_ts_utc` >= %(event_ts_utc_1:DATETIME)s AND `event_ts_utc` <= %(event_ts_utc_2:DATETIME)s)
 LIMIT %(param_1:INT64)s]
[parameters: {'event_ts_utc_1': datetime.datetime(2017, 8, 21, 14, 0, 5, tzinfo=tzutc()), 'event_ts_utc_2': datetime.datetime(2021, 12, 16, 4, 55, 1, 878934, tzinfo=tzutc()), 'param_1': 20}]
(Background on this error at: http://sqlalche.me/e/13/4xp6)

PaulineBeurier8 avatar Jan 11 '22 16:01 PaulineBeurier8

Is this issue still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity.

It will be closed if no further activity occurs. Thank you for your contributions 🙇

github-actions[bot] avatar Aug 05 '22 02:08 github-actions[bot]