great_expectations
great_expectations copied to clipboard
GX fails with SAP HANA
Describe the bug
Currently I am testing out how GX could integrate in our existing infrastructure. We store most of oure Bussines Data within the SAP HANA. Since GX uses SQLAlchemy and its Dialects support SAP HANA via sqlalchemy-hana
I hoped that GX could work with SAP HANA too, but it throws an error when I try out the "Get started with GX and SQL" tutorial with the HANA DB. See the details below.
To Reproduce
Here is the first part of the code till the line that produces the error:
context = gx.get_context(project_root_dir=to_absolute_path("./gx_hana"))
datasource = context.sources.add_or_update_sql(
name="PHT",
connection_string="hana+hdbcli://${hana_username}:${hana_password}@${hana_host}:${hana_port}",
)
# datasource.add_table_asset(name="POS", table_name="POS_DATA.RESULT_BY_ID")
datasource.add_query_asset(name="POS", query="SELECT * FROM POS_DATA.RESULT_BY_ID;")
batch_request = datasource.get_asset("POS").build_batch_request()
expectation_suite_name = "insert_your_expectation_suite_name_here"
context.add_or_update_expectation_suite(expectation_suite_name=expectation_suite_name)
validator = context.get_validator(
batch_request=batch_request,
expectation_suite_name=expectation_suite_name,
)
validator.expect_column_values_to_not_be_null(column="COUNTRY")
Where the last line is the problematic one and creates the following error:
validator.expect_column_values_to_not_be_null(column="COUNTRY")
2
Calculating Metrics: 0%| | 0/8 [00:00<?, ?it/s]
Traceback (most recent call last):
File "<PATH>\great_expectation_test\.venv\Lib\site-packages\sqlalchemy\engine\base.py", line 1967, in _exec_single_context
self.dialect.do_execute(
File "<PATH>\great_expectation_test\.venv\Lib\site-packages\sqlalchemy\engine\default.py", line 924, in do_execute
cursor.execute(statement, parameters)
hdbcli.dbapi.ProgrammingError: (257, 'sql syntax error: incorrect syntax near "FROM": line 2 col 43 (at pos 53)')
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\execution_engine\execution_engine.py", line 548, in _process_direct_and_bundled_metric_computation_configurations
] = metric_computation_configuration.metric_fn( # type: ignore[misc] # F not callable
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\expectations\metrics\metric_provider.py", line 60, in inner_func
return metric_fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\expectations\metrics\table_metrics\table_column_types.py", line 73, in _sqlalchemy
return _get_sqlalchemy_column_metadata(execution_engine, batch_data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\expectations\metrics\table_metrics\table_column_types.py", line 112, in _get_sqlalchemy_column_metadata
return get_sqlalchemy_column_metadata(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\expectations\metrics\util.py", line 427, in get_sqlalchemy_column_metadata
columns = column_reflection_fallback(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\expectations\metrics\util.py", line 679, in column_reflection_fallback
result_object = connection.execute(query)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "<PATH>\great_expectation_test\.venv\Lib\site-packages\sqlalchemy\engine\base.py", line 1418, in execute
return meth(
^^^^^
File "<PATH>\great_expectation_test\.venv\Lib\site-packages\sqlalchemy\sql\elements.py", line 515, in _execute_on_connection
return connection._execute_clauseelement(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<PATH>\great_expectation_test\.venv\Lib\site-packages\sqlalchemy\engine\base.py", line 1640, in _execute_clauseelement
ret = self._execute_context(
^^^^^^^^^^^^^^^^^^^^^^
File "<PATH>\great_expectation_test\.venv\Lib\site-packages\sqlalchemy\engine\base.py", line 1846, in _execute_context
return self._exec_single_context(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<PATH>\great_expectation_test\.venv\Lib\site-packages\sqlalchemy\engine\base.py", line 1986, in _exec_single_context
self._handle_dbapi_exception(
File "<PATH>\great_expectation_test\.venv\Lib\site-packages\sqlalchemy\engine\base.py", line 2353, in _handle_dbapi_exception
raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
File "<PATH>\great_expectation_test\.venv\Lib\site-packages\sqlalchemy\engine\base.py", line 1967, in _exec_single_context
self.dialect.do_execute(
File "<PATH>\great_expectation_test\.venv\Lib\site-packages\sqlalchemy\engine\default.py", line 924, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.ProgrammingError: (hdbcli.dbapi.ProgrammingError) (257, 'sql syntax error: incorrect syntax near "FROM": line 2 col 43 (at pos 53)')
[SQL: SELECT *
FROM (SELECT * FROM POS_DATA.RESULT_BY_ID FROM DUMMY) AS anon_1
WHERE true = TRUE]
(Background on this error at: https://sqlalche.me/e/20/f405)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\validator\validator.py", line 590, in inst_expectation
raise err
File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\validator\validator.py", line 553, in inst_expectation
validation_result = expectation.validate(
^^^^^^^^^^^^^^^^^^^^^
File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\expectations\expectation.py", line 1314, in validate
] = validator.graph_validate(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\validator\validator.py", line 1065, in graph_validate
raise err
File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\validator\validator.py", line 1044, in graph_validate
) = self._resolve_suite_level_graph_and_process_metric_evaluation_errors(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\validator\validator.py", line 1200, in _resolve_suite_level_graph_and_process_metric_evaluation_errors
) = self._metrics_calculator.resolve_validation_graph(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\validator\metrics_calculator.py", line 274, in resolve_validation_graph
resolved_metrics, aborted_metrics_info = graph.resolve(
^^^^^^^^^^^^^^
File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\validator\validation_graph.py", line 202, in resolve
aborted_metrics_info: _AbortedMetricsInfoDict = self._resolve(
^^^^^^^^^^^^^^
File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\validator\validation_graph.py", line 302, in _resolve
raise err
File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\validator\validation_graph.py", line 269, in _resolve
self._execution_engine.resolve_metrics(
File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\execution_engine\execution_engine.py", line 283, in resolve_metrics
return self._process_direct_and_bundled_metric_computation_configurations(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\execution_engine\execution_engine.py", line 552, in _process_direct_and_bundled_metric_computation_configurations
raise gx_exceptions.MetricResolutionError(
great_expectations.exceptions.exceptions.MetricResolutionError: (hdbcli.dbapi.ProgrammingError) (257, 'sql syntax error: incorrect syntax near "FROM": line 2 col 43 (at pos 53)')
[SQL: SELECT *
FROM (SELECT * FROM POS_DATA.RESULT_BY_ID FROM DUMMY) AS anon_1
WHERE true = TRUE]
(Background on this error at: https://sqlalche.me/e/20/f405)
Here is my great_expectations.yml
file
config_version: 3.0
datasources: {}
config_variables_file_path: uncommitted/config_variables.yml
plugins_directory: plugins/
stores:
expectations_store:
class_name: ExpectationsStore
store_backend:
class_name: TupleFilesystemStoreBackend
base_directory: expectations/
validations_store:
class_name: ValidationsStore
store_backend:
class_name: TupleFilesystemStoreBackend
base_directory: uncommitted/validations/
evaluation_parameter_store:
class_name: EvaluationParameterStore
checkpoint_store:
class_name: CheckpointStore
store_backend:
class_name: TupleFilesystemStoreBackend
suppress_store_backend_id: true
base_directory: checkpoints/
profiler_store:
class_name: ProfilerStore
store_backend:
class_name: TupleFilesystemStoreBackend
suppress_store_backend_id: true
base_directory: profilers/
expectations_store_name: expectations_store
validations_store_name: validations_store
evaluation_parameter_store_name: evaluation_parameter_store
checkpoint_store_name: checkpoint_store
data_docs_sites:
local_site:
class_name: SiteBuilder
show_how_to_buttons: false
store_backend:
class_name: TupleFilesystemStoreBackend
base_directory: uncommitted/data_docs/local_site/
site_index_builder:
class_name: DefaultSiteIndexBuilder
anonymous_usage_statistics:
data_context_id: e41e7ae4-9958-4d80-a3b8-8668e6f1a11a
enabled: false
fluent_datasources:
PHT:
type: sql
assets:
POS:
type: query
order_by: []
batch_metadata: {}
query: SELECT * FROM POS_DATA.RESULT_BY_ID;
connection_string: hana+hdbcli://${hana_username}:${hana_password}@${hana_host}:${hana_port}
notebooks:
include_rendered_content:
globally: false
expectation_suite: false
expectation_validation_result: false
Expected behavior
I expected GX to be able to check the expectation without crashing.
Environment (please complete the following information):
- Operating System: Windows 11 (but its only for testing and development, in production i would use docker containers on linux)
- Great Expectations Version: 0.18.15
- sqlalchemy: 2.0.30
- sqlalchemy-hana: 2.1.0
- Data Source: SAP HANA
- Cloud environment: None
Additional context
I also went ahead and tested the SAP HANA connection (and connection string) via SQLAlchemy directly, which worked great:
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine("hana+hdbcli://${hana_username}:${hana_password}@${hana_host}:${hana_port}", echo=True)
pd.read_sql('SELECT * FROM "POS_DATA"."RESULT_BY_ID";', con=engine)
So im pretty sure that the SQL code GX is generating clashes with what the SAP HANA is expecting since when I run the GX generated code in the SAP HANA natively it also throws the same error:
SELECT * FROM (SELECT * FROM "POS_DATA"."RESULT_BY_ID" FROM DUMMY) AS anon_1 WHERE true = TRUE;
Could not execute 'SELECT * FROM (SELECT * FROM "POS_DATA"."RESULT_BY_ID" FROM DUMMY) AS anon_1 WHERE true = TRUE'
SAP DBTech JDBC: [257]: sql syntax error: incorrect syntax near "FROM": line 1 col 56 (at pos 56)
Then I changed the SQL statement to this:
SELECT * FROM (SELECT * FROM "POS_DATA"."RESULT_BY_ID") AS anon_1 WHERE true = TRUE;
and it worked out fine in the SAP HANA Studio.