great_expectations icon indicating copy to clipboard operation
great_expectations copied to clipboard

GX fails with SAP HANA

Open Someone894 opened this issue 2 weeks ago • 0 comments

Describe the bug

Currently I am testing out how GX could integrate in our existing infrastructure. We store most of oure Bussines Data within the SAP HANA. Since GX uses SQLAlchemy and its Dialects support SAP HANA via sqlalchemy-hana I hoped that GX could work with SAP HANA too, but it throws an error when I try out the "Get started with GX and SQL" tutorial with the HANA DB. See the details below.

To Reproduce

Here is the first part of the code till the line that produces the error:

context = gx.get_context(project_root_dir=to_absolute_path("./gx_hana"))

datasource = context.sources.add_or_update_sql(
    name="PHT",
    connection_string="hana+hdbcli://${hana_username}:${hana_password}@${hana_host}:${hana_port}",
)

# datasource.add_table_asset(name="POS", table_name="POS_DATA.RESULT_BY_ID")
datasource.add_query_asset(name="POS", query="SELECT * FROM POS_DATA.RESULT_BY_ID;")

batch_request = datasource.get_asset("POS").build_batch_request()

expectation_suite_name = "insert_your_expectation_suite_name_here"
context.add_or_update_expectation_suite(expectation_suite_name=expectation_suite_name)
validator = context.get_validator(
    batch_request=batch_request,
    expectation_suite_name=expectation_suite_name,
)

validator.expect_column_values_to_not_be_null(column="COUNTRY")

Where the last line is the problematic one and creates the following error:

validator.expect_column_values_to_not_be_null(column="COUNTRY")
2

Calculating Metrics:   0%|                                                                                                                                                              | 0/8 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\sqlalchemy\engine\base.py", line 1967, in _exec_single_context
    self.dialect.do_execute(
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\sqlalchemy\engine\default.py", line 924, in do_execute
    cursor.execute(statement, parameters)
hdbcli.dbapi.ProgrammingError: (257, 'sql syntax error: incorrect syntax near "FROM": line 2 col 43 (at pos 53)')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\execution_engine\execution_engine.py", line 548, in _process_direct_and_bundled_metric_computation_configurations
    ] = metric_computation_configuration.metric_fn(  # type: ignore[misc] # F not callable
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\expectations\metrics\metric_provider.py", line 60, in inner_func
    return metric_fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\expectations\metrics\table_metrics\table_column_types.py", line 73, in _sqlalchemy
    return _get_sqlalchemy_column_metadata(execution_engine, batch_data)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\expectations\metrics\table_metrics\table_column_types.py", line 112, in _get_sqlalchemy_column_metadata
    return get_sqlalchemy_column_metadata(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\expectations\metrics\util.py", line 427, in get_sqlalchemy_column_metadata
    columns = column_reflection_fallback(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\expectations\metrics\util.py", line 679, in column_reflection_fallback
    result_object = connection.execute(query)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\sqlalchemy\engine\base.py", line 1418, in execute
    return meth(
           ^^^^^
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\sqlalchemy\sql\elements.py", line 515, in _execute_on_connection
    return connection._execute_clauseelement(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\sqlalchemy\engine\base.py", line 1640, in _execute_clauseelement
    ret = self._execute_context(
          ^^^^^^^^^^^^^^^^^^^^^^
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\sqlalchemy\engine\base.py", line 1846, in _execute_context
    return self._exec_single_context(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\sqlalchemy\engine\base.py", line 1986, in _exec_single_context
    self._handle_dbapi_exception(
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\sqlalchemy\engine\base.py", line 2353, in _handle_dbapi_exception
    raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\sqlalchemy\engine\base.py", line 1967, in _exec_single_context
    self.dialect.do_execute(
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\sqlalchemy\engine\default.py", line 924, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.ProgrammingError: (hdbcli.dbapi.ProgrammingError) (257, 'sql syntax error: incorrect syntax near "FROM": line 2 col 43 (at pos 53)')
[SQL: SELECT *
FROM (SELECT * FROM POS_DATA.RESULT_BY_ID FROM DUMMY) AS anon_1
WHERE true = TRUE]
(Background on this error at: https://sqlalche.me/e/20/f405)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\validator\validator.py", line 590, in inst_expectation
    raise err
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\validator\validator.py", line 553, in inst_expectation
    validation_result = expectation.validate(
                        ^^^^^^^^^^^^^^^^^^^^^
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\expectations\expectation.py", line 1314, in validate
    ] = validator.graph_validate(
        ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\validator\validator.py", line 1065, in graph_validate
    raise err
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\validator\validator.py", line 1044, in graph_validate
    ) = self._resolve_suite_level_graph_and_process_metric_evaluation_errors(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\validator\validator.py", line 1200, in _resolve_suite_level_graph_and_process_metric_evaluation_errors
    ) = self._metrics_calculator.resolve_validation_graph(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\validator\metrics_calculator.py", line 274, in resolve_validation_graph
    resolved_metrics, aborted_metrics_info = graph.resolve(
                                             ^^^^^^^^^^^^^^
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\validator\validation_graph.py", line 202, in resolve
    aborted_metrics_info: _AbortedMetricsInfoDict = self._resolve(
                                                    ^^^^^^^^^^^^^^
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\validator\validation_graph.py", line 302, in _resolve
    raise err
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\validator\validation_graph.py", line 269, in _resolve
    self._execution_engine.resolve_metrics(
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\execution_engine\execution_engine.py", line 283, in resolve_metrics
    return self._process_direct_and_bundled_metric_computation_configurations(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<PATH>\great_expectation_test\.venv\Lib\site-packages\great_expectations\execution_engine\execution_engine.py", line 552, in _process_direct_and_bundled_metric_computation_configurations
    raise gx_exceptions.MetricResolutionError(
great_expectations.exceptions.exceptions.MetricResolutionError: (hdbcli.dbapi.ProgrammingError) (257, 'sql syntax error: incorrect syntax near "FROM": line 2 col 43 (at pos 53)')
[SQL: SELECT *
FROM (SELECT * FROM POS_DATA.RESULT_BY_ID FROM DUMMY) AS anon_1
WHERE true = TRUE]
(Background on this error at: https://sqlalche.me/e/20/f405)

Here is my great_expectations.yml file

config_version: 3.0

datasources: {}

config_variables_file_path: uncommitted/config_variables.yml

plugins_directory: plugins/

stores:
  expectations_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: expectations/

  validations_store:
    class_name: ValidationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/validations/

  evaluation_parameter_store:
    class_name: EvaluationParameterStore
  checkpoint_store:
    class_name: CheckpointStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      suppress_store_backend_id: true
      base_directory: checkpoints/

  profiler_store:
    class_name: ProfilerStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      suppress_store_backend_id: true
      base_directory: profilers/

expectations_store_name: expectations_store
validations_store_name: validations_store
evaluation_parameter_store_name: evaluation_parameter_store
checkpoint_store_name: checkpoint_store

data_docs_sites:
  local_site:
    class_name: SiteBuilder
    show_how_to_buttons: false
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/data_docs/local_site/
    site_index_builder:
      class_name: DefaultSiteIndexBuilder

anonymous_usage_statistics:
  data_context_id: e41e7ae4-9958-4d80-a3b8-8668e6f1a11a
  enabled: false
fluent_datasources:
  PHT:
    type: sql
    assets:
      POS:
        type: query
        order_by: []
        batch_metadata: {}
        query: SELECT * FROM POS_DATA.RESULT_BY_ID;
    connection_string: hana+hdbcli://${hana_username}:${hana_password}@${hana_host}:${hana_port}
notebooks:
include_rendered_content:
  globally: false
  expectation_suite: false
  expectation_validation_result: false

Expected behavior

I expected GX to be able to check the expectation without crashing.

Environment (please complete the following information):

  • Operating System: Windows 11 (but its only for testing and development, in production i would use docker containers on linux)
  • Great Expectations Version: 0.18.15
    • sqlalchemy: 2.0.30
    • sqlalchemy-hana: 2.1.0
  • Data Source: SAP HANA
  • Cloud environment: None

Additional context

I also went ahead and tested the SAP HANA connection (and connection string) via SQLAlchemy directly, which worked great:

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("hana+hdbcli://${hana_username}:${hana_password}@${hana_host}:${hana_port}", echo=True)
pd.read_sql('SELECT * FROM "POS_DATA"."RESULT_BY_ID";', con=engine)

So im pretty sure that the SQL code GX is generating clashes with what the SAP HANA is expecting since when I run the GX generated code in the SAP HANA natively it also throws the same error:

SELECT * FROM (SELECT * FROM "POS_DATA"."RESULT_BY_ID" FROM DUMMY) AS anon_1 WHERE true = TRUE;

Could not execute 'SELECT * FROM (SELECT * FROM "POS_DATA"."RESULT_BY_ID" FROM DUMMY) AS anon_1 WHERE true = TRUE'
SAP DBTech JDBC: [257]: sql syntax error: incorrect syntax near "FROM": line 1 col 56 (at pos 56)

Then I changed the SQL statement to this:

SELECT * FROM (SELECT * FROM "POS_DATA"."RESULT_BY_ID") AS anon_1 WHERE true = TRUE;

and it worked out fine in the SAP HANA Studio.

Someone894 avatar Jun 17 '24 11:06 Someone894