great_expectations
great_expectations copied to clipboard
Wrong casting for TIMESTAMP column type with BigQuery table
Describe the bug Generation of expectations using historic data does not for column type TIMESTAMP using BigQuery as source.
In BigQuery there is 2 types for date:
- DATETIME
- TIMESTAMP The generation of rules works properly with DATETIME type ✅ but with TIMESTAMP it fails ❌
To Reproduce Steps to reproduce the behavior:
-
great_expectations --v3-api suite new
-
How would you like to create your Expectation Suite
->3. Automatically, using a profiler
-
Select a datasource
->datasource_BigQuery_analytics
datasource_BigQuery_analytics:
connection_string: bigquery://my-project/my_dataset
introspection:
whole_table:
data_asset_name_suffix: __whole_table
module_name: great_expectations.datasource
class_name: SimpleSqlalchemyDatasource
-
Provide: name of the table, temporary table, name of the file Do not not provide schema
-
Error when running the JupyterNotebook cells:
MetricResolutionError: (google.cloud.bigquery.dbapi.exceptions.DatabaseError) 400 No matching signature for operator >= for argument types: TIMESTAMP, DATETIME. Supported signature: ANY >= ANY at [3:43]
This is due to a wrong casting of TIMESTAMP to DATETIME the column in the query, see below :
[SQL: SELECT `event_ts_utc` AS `unexpected_values`
FROM `karhoo-scratch.data_team_sandbox.pb_ge1`
WHERE `event_ts_utc` IS NOT NULL AND NOT (`event_ts_utc` >= %(event_ts_utc_1:DATETIME)s AND `event_ts_utc` <= %(event_ts_utc_2:DATETIME)s)
LIMIT %(param_1:INT64)s]
Expected behavior Generate expectations with no errors.
Environment (please complete the following information):
- Operating System: MacOS
- Great Expectations Version: 0.13.46
Hey @PaulineBeurier8 thanks for opening up this issue!
Would you mind providing the entire stack trace and letting us know which specific cell this failure is occurring at?
The cell where the issue is appearing is:
profiler = UserConfigurableProfiler(
profile_dataset=validator,
excluded_expectations=None,
ignored_columns=ignored_columns,
not_null_only=False,
primary_or_compound_key=False,
semantic_types_dict=None,
table_expectations_only=False,
value_set_threshold="MANY",
)
suite = profiler.build_suite()
Hi @PaulineBeurier8 - thanks for this! For security purposes, would you please paste the full stack trace here or link it in a gist so we can take a look?
Profiling: 60%
3/5 [00:52<00:33, 16.72s/it, Column=event_ts_utc]
---------------------------------------------------------------------------
BadRequest Traceback (most recent call last)
~/Documents/.venv/lib/python3.9/site-packages/google/cloud/bigquery/dbapi/cursor.py in _execute(self, formatted_operation, parameters, job_id, job_config, parameter_types)
202 try:
--> 203 self._query_job.result()
204 except google.cloud.exceptions.GoogleCloudError as exc:
~/Documents/.venv/lib/python3.9/site-packages/google/cloud/bigquery/job/query.py in result(self, page_size, max_results, retry, timeout, start_index, job_retry)
1446
-> 1447 do_get_result()
1448
~/Documents/.venv/lib/python3.9/site-packages/google/api_core/retry.py in retry_wrapped_func(*args, **kwargs)
285 )
--> 286 return retry_target(
287 target,
~/Documents/.venv/lib/python3.9/site-packages/google/api_core/retry.py in retry_target(target, predicate, sleep_generator, deadline, on_error)
188 try:
--> 189 return target()
190
~/Documents/.venv/lib/python3.9/site-packages/google/cloud/bigquery/job/query.py in do_get_result()
1436
-> 1437 super(QueryJob, self).result(retry=retry, timeout=timeout)
1438
~/Documents/.venv/lib/python3.9/site-packages/google/cloud/bigquery/job/base.py in result(self, retry, timeout)
726 kwargs = {} if retry is DEFAULT_RETRY else {"retry": retry}
--> 727 return super(_AsyncJob, self).result(timeout=timeout, **kwargs)
728
~/Documents/.venv/lib/python3.9/site-packages/google/api_core/future/polling.py in result(self, timeout, retry)
134 # Pylint doesn't recognize that this is valid in this case.
--> 135 raise self._exception
136
BadRequest: 400 No matching signature for operator >= for argument types: TIMESTAMP, DATETIME. Supported signature: ANY >= ANY at [3:43]
(job ID: 9650f44b-1437-4e02-be40-810eb424045e)
-----Query Job SQL Follows-----
| . | . | . | . | . | . | . | . | . | . | . |
1:SELECT `event_ts_utc` AS `unexpected_values`
2:FROM `my-project.my_dataset.pb_ge1`
3:WHERE `event_ts_utc` IS NOT NULL AND NOT (`event_ts_utc` >= @`event_ts_utc_1` AND `event_ts_utc` <= @`event_ts_utc_2`)
4: LIMIT @`param_1`
| . | . | . | . | . | . | . | . | . | . | . |
During handling of the above exception, another exception occurred:
DatabaseError Traceback (most recent call last)
~/Documents/.venv/lib/python3.9/site-packages/sqlalchemy/engine/base.py in _execute_context(self, dialect, constructor, statement, parameters, *args)
1275 if not evt_handled:
-> 1276 self.dialect.do_execute(
1277 cursor, statement, parameters, context
~/Documents/.venv/lib/python3.9/site-packages/sqlalchemy/engine/default.py in do_execute(self, cursor, statement, parameters, context)
607 def do_execute(self, cursor, statement, parameters, context=None):
--> 608 cursor.execute(statement, parameters)
609
~/Documents/.venv/lib/python3.9/site-packages/google/cloud/bigquery/dbapi/_helpers.py in with_closed_check(self, *args, **kwargs)
494 raise exc_class(exc_msg)
--> 495 return method(self, *args, **kwargs)
496
~/Documents/.venv/lib/python3.9/site-packages/google/cloud/bigquery/dbapi/cursor.py in execute(self, operation, parameters, job_id, job_config)
165 formatted_operation, parameter_types = _format_operation(operation, parameters)
--> 166 self._execute(
167 formatted_operation, parameters, job_id, job_config, parameter_types
~/Documents/.venv/lib/python3.9/site-packages/google/cloud/bigquery/dbapi/cursor.py in _execute(self, formatted_operation, parameters, job_id, job_config, parameter_types)
204 except google.cloud.exceptions.GoogleCloudError as exc:
--> 205 raise exceptions.DatabaseError(exc)
206
DatabaseError: 400 No matching signature for operator >= for argument types: TIMESTAMP, DATETIME. Supported signature: ANY >= ANY at [3:43]
(job ID: 9650f44b-1437-4e02-be40-810eb424045e)
-----Query Job SQL Follows-----
| . | . | . | . | . | . | . | . | . | . | . |
1:SELECT `event_ts_utc` AS `unexpected_values`
2:FROM `my-project.my_dataset.pb_ge1`
3:WHERE `event_ts_utc` IS NOT NULL AND NOT (`event_ts_utc` >= @`event_ts_utc_1` AND `event_ts_utc` <= @`event_ts_utc_2`)
4: LIMIT @`param_1`
| . | . | . | . | . | . | . | . | . | . | . |
The above exception was the direct cause of the following exception:
DatabaseError Traceback (most recent call last)
~/Documents/.venv/lib/python3.9/site-packages/great_expectations/execution_engine/execution_engine.py in resolve_metrics(self, metrics_to_resolve, metrics, runtime_configuration)
375 try:
--> 376 resolved_metrics[metric_to_resolve.id] = metric_fn(
377 **metric_provider_kwargs
~/Documents/.venv/lib/python3.9/site-packages/great_expectations/expectations/metrics/map_metric_provider.py in _sqlalchemy_column_map_condition_values(cls, execution_engine, metric_domain_kwargs, metric_value_kwargs, metrics, **kwargs)
2048 val.unexpected_values
-> 2049 for val in execution_engine.engine.execute(query).fetchall()
2050 ]
~/Documents/.venv/lib/python3.9/site-packages/sqlalchemy/engine/base.py in execute(self, statement, *multiparams, **params)
2234 connection = self._contextual_connect(close_with_result=True)
-> 2235 return connection.execute(statement, *multiparams, **params)
2236
~/Documents/.venv/lib/python3.9/site-packages/sqlalchemy/engine/base.py in execute(self, object_, *multiparams, **params)
1010 else:
-> 1011 return meth(self, multiparams, params)
1012
~/Documents/.venv/lib/python3.9/site-packages/sqlalchemy/sql/elements.py in _execute_on_connection(self, connection, multiparams, params)
297 if self.supports_execution:
--> 298 return connection._execute_clauseelement(self, multiparams, params)
299 else:
~/Documents/.venv/lib/python3.9/site-packages/sqlalchemy/engine/base.py in _execute_clauseelement(self, elem, multiparams, params)
1123
-> 1124 ret = self._execute_context(
1125 dialect,
~/Documents/.venv/lib/python3.9/site-packages/sqlalchemy/engine/base.py in _execute_context(self, dialect, constructor, statement, parameters, *args)
1315 except BaseException as e:
-> 1316 self._handle_dbapi_exception(
1317 e, statement, parameters, cursor, context
~/Documents/.venv/lib/python3.9/site-packages/sqlalchemy/engine/base.py in _handle_dbapi_exception(self, e, statement, parameters, cursor, context)
1509 elif should_wrap:
-> 1510 util.raise_(
1511 sqlalchemy_exception, with_traceback=exc_info[2], from_=e
~/Documents/.venv/lib/python3.9/site-packages/sqlalchemy/util/compat.py in raise_(***failed resolving arguments***)
181 try:
--> 182 raise exception
183 finally:
~/Documents/.venv/lib/python3.9/site-packages/sqlalchemy/engine/base.py in _execute_context(self, dialect, constructor, statement, parameters, *args)
1275 if not evt_handled:
-> 1276 self.dialect.do_execute(
1277 cursor, statement, parameters, context
~/Documents/.venv/lib/python3.9/site-packages/sqlalchemy/engine/default.py in do_execute(self, cursor, statement, parameters, context)
607 def do_execute(self, cursor, statement, parameters, context=None):
--> 608 cursor.execute(statement, parameters)
609
~/Documents/.venv/lib/python3.9/site-packages/google/cloud/bigquery/dbapi/_helpers.py in with_closed_check(self, *args, **kwargs)
494 raise exc_class(exc_msg)
--> 495 return method(self, *args, **kwargs)
496
~/Documents/.venv/lib/python3.9/site-packages/google/cloud/bigquery/dbapi/cursor.py in execute(self, operation, parameters, job_id, job_config)
165 formatted_operation, parameter_types = _format_operation(operation, parameters)
--> 166 self._execute(
167 formatted_operation, parameters, job_id, job_config, parameter_types
~/Documents/.venv/lib/python3.9/site-packages/google/cloud/bigquery/dbapi/cursor.py in _execute(self, formatted_operation, parameters, job_id, job_config, parameter_types)
204 except google.cloud.exceptions.GoogleCloudError as exc:
--> 205 raise exceptions.DatabaseError(exc)
206
DatabaseError: (google.cloud.bigquery.dbapi.exceptions.DatabaseError) 400 No matching signature for operator >= for argument types: TIMESTAMP, DATETIME. Supported signature: ANY >= ANY at [3:43]
(job ID: 9650f44b-1437-4e02-be40-810eb424045e)
-----Query Job SQL Follows-----
| . | . | . | . | . | . | . | . | . | . | . |
1:SELECT `event_ts_utc` AS `unexpected_values`
2:FROM `my-project.my_dataset.pb_ge1`
3:WHERE `event_ts_utc` IS NOT NULL AND NOT (`event_ts_utc` >= @`event_ts_utc_1` AND `event_ts_utc` <= @`event_ts_utc_2`)
4: LIMIT @`param_1`
| . | . | . | . | . | . | . | . | . | . | . |
[SQL: SELECT `event_ts_utc` AS `unexpected_values`
FROM `my-project.my_dataset.pb_ge1`
WHERE `event_ts_utc` IS NOT NULL AND NOT (`event_ts_utc` >= %(event_ts_utc_1:DATETIME)s AND `event_ts_utc` <= %(event_ts_utc_2:DATETIME)s)
LIMIT %(param_1:INT64)s]
[parameters: {'event_ts_utc_1': datetime.datetime(2017, 8, 21, 14, 0, 5, tzinfo=tzutc()), 'event_ts_utc_2': datetime.datetime(2021, 12, 16, 4, 55, 1, 878934, tzinfo=tzutc()), 'param_1': 20}]
(Background on this error at: http://sqlalche.me/e/13/4xp6)
During handling of the above exception, another exception occurred:
MetricResolutionError Traceback (most recent call last)
/var/folders/pb/9yqny_053gn3730w7y713ycw0000gn/T/ipykernel_23758/3894799140.py in <module>
9 value_set_threshold="MANY",
10 )
---> 11 suite = profiler.build_suite()
~/Documents/.venv/lib/python3.9/site-packages/great_expectations/profile/user_configurable_profiler.py in build_suite(self)
216 expectation_suite = self._build_expectation_suite_from_semantic_types_dict()
217 else:
--> 218 expectation_suite = self._profile_and_build_expectation_suite()
219
220 self._send_usage_stats_message()
~/Documents/.venv/lib/python3.9/site-packages/great_expectations/profile/user_configurable_profiler.py in _profile_and_build_expectation_suite(self)
352
353 if data_type == "DATETIME":
--> 354 self._build_expectations_datetime(
355 profile_dataset=self.profile_dataset,
356 column=column_name,
~/Documents/.venv/lib/python3.9/site-packages/great_expectations/profile/user_configurable_profiler.py in _build_expectations_datetime(self, profile_dataset, column)
1079 )
1080 if min_value is not None or max_value is not None:
-> 1081 profile_dataset.expect_column_values_to_be_between(
1082 column,
1083 min_value=min_value,
~/Documents/.venv/lib/python3.9/site-packages/great_expectations/validator/validator.py in inst_expectation(*args, **kwargs)
344 )
345 else:
--> 346 raise err
347 return validation_result
348
~/Documents/.venv/lib/python3.9/site-packages/great_expectations/validator/validator.py in inst_expectation(*args, **kwargs)
302 )
303 else:
--> 304 validation_result = expectation.validate(
305 validator=self,
306 evaluation_parameters=self._expectation_suite.evaluation_parameters,
~/Documents/.venv/lib/python3.9/site-packages/great_expectations/expectations/expectation.py in validate(self, validator, configuration, evaluation_parameters, interactive_evaluation, data_context, runtime_configuration)
825 evaluation_parameters, interactive_evaluation, data_context
826 )
--> 827 evr = validator.graph_validate(
828 configurations=[configuration],
829 runtime_configuration=runtime_configuration,
~/Documents/.venv/lib/python3.9/site-packages/great_expectations/validator/validator.py in graph_validate(self, configurations, metrics, runtime_configuration)
569 return evrs
570 else:
--> 571 raise err
572
573 for configuration in processed_configurations:
~/Documents/.venv/lib/python3.9/site-packages/great_expectations/validator/validator.py in graph_validate(self, configurations, metrics, runtime_configuration)
517 Tuple[str, str, str],
518 Dict[str, Union[MetricConfiguration, Set[ExceptionInfo], int]],
--> 519 ] = self.resolve_validation_graph(
520 graph=graph,
521 metrics=metrics,
~/Documents/.venv/lib/python3.9/site-packages/great_expectations/validator/validator.py in resolve_validation_graph(self, graph, metrics, runtime_configuration)
747 }
748 else:
--> 749 raise err
750 except Exception as e:
751 if catch_exceptions:
~/Documents/.venv/lib/python3.9/site-packages/great_expectations/validator/validator.py in resolve_validation_graph(self, graph, metrics, runtime_configuration)
713 try:
714 metrics.update(
--> 715 self._resolve_metrics(
716 execution_engine=self._execution_engine,
717 metrics_to_resolve=computable_metrics,
~/Documents/.venv/lib/python3.9/site-packages/great_expectations/validator/validator.py in _resolve_metrics(execution_engine, metrics_to_resolve, metrics, runtime_configuration)
1628 """A means of accessing the Execution Engine's resolve_metrics method, where missing metric configurations are
1629 resolved"""
-> 1630 return execution_engine.resolve_metrics(
1631 metrics_to_resolve=metrics_to_resolve,
1632 metrics=metrics,
~/Documents/.venv/lib/python3.9/site-packages/great_expectations/execution_engine/execution_engine.py in resolve_metrics(self, metrics_to_resolve, metrics, runtime_configuration)
378 )
379 except Exception as e:
--> 380 raise ge_exceptions.MetricResolutionError(
381 message=str(e), failed_metrics=(metric_to_resolve,)
382 )
MetricResolutionError: (google.cloud.bigquery.dbapi.exceptions.DatabaseError) 400 No matching signature for operator >= for argument types: TIMESTAMP, DATETIME. Supported signature: ANY >= ANY at [3:43]
(job ID: 9650f44b-1437-4e02-be40-810eb424045e)
-----Query Job SQL Follows-----
| . | . | . | . | . | . | . | . | . | . | . |
1:SELECT `event_ts_utc` AS `unexpected_values`
2:FROM `my-project.my_dataset.pb_ge1`
3:WHERE `event_ts_utc` IS NOT NULL AND NOT (`event_ts_utc` >= @`event_ts_utc_1` AND `event_ts_utc` <= @`event_ts_utc_2`)
4: LIMIT @`param_1`
| . | . | . | . | . | . | . | . | . | . | . |
[SQL: SELECT `event_ts_utc` AS `unexpected_values`
FROM `my-project.my_dataset.pb_ge1`
WHERE `event_ts_utc` IS NOT NULL AND NOT (`event_ts_utc` >= %(event_ts_utc_1:DATETIME)s AND `event_ts_utc` <= %(event_ts_utc_2:DATETIME)s)
LIMIT %(param_1:INT64)s]
[parameters: {'event_ts_utc_1': datetime.datetime(2017, 8, 21, 14, 0, 5, tzinfo=tzutc()), 'event_ts_utc_2': datetime.datetime(2021, 12, 16, 4, 55, 1, 878934, tzinfo=tzutc()), 'param_1': 20}]
(Background on this error at: http://sqlalche.me/e/13/4xp6)
Is this issue still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?
This issue has been automatically marked as stale because it has not had recent activity.
It will be closed if no further activity occurs. Thank you for your contributions 🙇