
"bad lexical cast: source type value could not be interpreted as target" in Redshift query during schema extraction

govardhankarne1 opened this issue on Oct 14, 2024 · 0 comments

Affected module: Ingestion Framework

Describe the bug: I encountered the error 'bad lexical cast: source type value could not be interpreted as target' while the ingestion framework was extracting a table definition.

To Reproduce

Create a table with the following definition and then extract it using the ingestion framework.

-- Drop table
-- DROP TABLE kw.acct;
--DROP TABLE kw.acct;

CREATE TABLE IF NOT EXISTS kw.acct (
    acct_id VARCHAR(36) NOT NULL ENCODE RAW
    ,dw_start_dt DATE NOT NULL ENCODE az64
    ,dw_end_dt DATE NOT NULL ENCODE az64
    ,acct_name VARCHAR(1530) NOT NULL ENCODE zstd
    ,parnt_acct_id VARCHAR(36) NOT NULL ENCODE zstd
    ,top_acct_id VARCHAR(36) NOT NULL ENCODE zstd
    ,intrnl_ind VARCHAR(3) NOT NULL ENCODE lzo
    ,create_dts TIMESTAMP WITHOUT TIME ZONE NOT NULL ENCODE az64
    ,rec_status_cd VARCHAR(3) NOT NULL ENCODE bytedict
    ,curr_ind VARCHAR(3) NOT NULL ENCODE lzo
    ,inactv_dts TIMESTAMP WITHOUT TIME ZONE NOT NULL ENCODE az64
    ,del_dts TIMESTAMP WITHOUT TIME ZONE NOT NULL ENCODE az64
    ,dw_gen_ind VARCHAR(3) NOT NULL ENCODE zstd
    ,dw_mod_by VARCHAR(60) NOT NULL ENCODE zstd
    ,dw_mod_dts TIMESTAMP WITHOUT TIME ZONE NOT NULL ENCODE az64
    ,UNIQUE (acct_id, dw_start_dt)
)
DISTSTYLE KEY
DISTKEY (acct_id)
SORTKEY (acct_id, curr_ind);

ALTER TABLE kw.acct owner to ungerpl;

-- Permissions
GRANT SELECT ON TABLE kw.acct TO GROUP bc_team;
GRANT ALL ON TABLE kw.acct TO unpl;
GRANT SELECT, DELETE, INSERT, UPDATE ON TABLE kw.acct TO GROUP dw_ce_rw;
GRANT SELECT ON TABLE kw.acct TO "986(dw)";
GRANT SELECT ON TABLE kw.acct TO "871(dw)";
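For reference, here is a minimal, hedged repro sketch outside OpenMetadata (the connection URL, credentials, and database name are placeholders): reflecting the table's columns with plain SQLAlchemy and the sqlalchemy-redshift dialect should issue essentially the same catalog query that fails in the traceback below.

```python
# Hedged repro sketch, not part of OpenMetadata. Assumes SQLAlchemy 1.4,
# sqlalchemy-redshift and psycopg2 are installed; URL/credentials are placeholders.
from sqlalchemy import create_engine, inspect

engine = create_engine("redshift+psycopg2://user:password@redshift-host:5439/gdw")

inspector = inspect(engine)
try:
    # Column reflection goes through the Redshift dialect's
    # _get_redshift_columns / _get_schema_column_info (see the traceback below),
    # which runs the pg_catalog / late-binding-view / Spectrum union query.
    for col in inspector.get_columns("acct", schema="kw"):  # table from the DDL above
        print(col["name"], col["type"])
except Exception as exc:
    # Expect sqlalchemy.exc.InternalError wrapping psycopg2.errors.InternalError_
    print(f"Column reflection failed: {exc}")
```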

I believe it is failing due to special characters such as ", (, and ) in the table definition (for example, the grantee "986(dw)" in the GRANT statements). Redshift produces definitions like this when the table is shared across different databases. Can this be handled?
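To help confirm where the failure originates, below is a hedged diagnostic sketch (host, port, database, and credentials are placeholders) that runs just the pg_catalog branch of the reflection query from the debug log directly through psycopg2. If Redshift raises the same "bad lexical cast" error here, the problem is reproduced at the database level, independent of the ingestion framework.

```python
# Hedged diagnostic sketch: execute only the pg_catalog branch of the
# reflection query from the debug log, copied verbatim, via psycopg2.
# Connection details are placeholders.
import psycopg2

QUERY = """
SELECT n.nspname as "schema", c.relname as "table_name", att.attname as "name",
       format_encoding(att.attencodingtype::integer) as "encode",
       format_type(att.atttypid, att.atttypmod) as "type",
       att.attisdistkey as "distkey", att.attsortkeyord as "sortkey",
       att.attnotnull as "notnull",
       pg_catalog.col_description(att.attrelid, att.attnum) as "comment",
       adsrc, attnum,
       pg_catalog.format_type(att.atttypid, att.atttypmod),
       pg_catalog.pg_get_expr(ad.adbin, ad.adrelid) AS DEFAULT,
       n.oid as "schema_oid", c.oid as "table_oid"
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
JOIN pg_catalog.pg_attribute att ON att.attrelid = c.oid
LEFT JOIN pg_catalog.pg_attrdef ad ON (att.attrelid, att.attnum) = (ad.adrelid, ad.adnum)
WHERE n.nspname !~ '^pg_' AND att.attnum > 0 AND NOT att.attisdropped AND schema = 'edw'
"""

conn = psycopg2.connect(
    host="redshift-host", port=5439, dbname="gdw",  # placeholders
    user="user", password="password",
)
try:
    with conn.cursor() as cur:
        cur.execute(QUERY)
        print(cur.fetchmany(5))
except psycopg2.errors.InternalError_ as exc:
    # Same error class as in the traceback below
    print(f"Redshift raised: {exc}")
finally:
    conn.close()
```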

Expected behavior: I expected the Redshift metadata ingestion to run successfully and pull the metadata from Redshift.

Version:

  • OS: [e.g. Unix]
  • Python version: 3.10
  • OpenMetadata version: [e.g. 1.5.6]
  • OpenMetadata Ingestion package version: [e.g. openmetadata-ingestion[docker]==1.5.6]

Additional context: Below is the debug log for the issue.

[2024-10-02, 18:40:31 UTC] {topology_runner.py:166} DEBUG - Processing node producer='get_tables_name_and_type' stages=[
    NodeStage(type_=<class 'metadata.ingestion.models.ometa_classification.OMetaTagAndClassification'>, processor='yield_table_tag_details', nullable=True, must_return=False, overwrite=True, consumer=None, context='tags', store_all_in_context=True, clear_context=False, store_fqn=False, cache_entities=False, use_cache=False),
    NodeStage(type_=<class 'metadata.generated.schema.entity.data.table.Table'>, processor='yield_table', nullable=False, must_return=False, overwrite=True, consumer=['database_service', 'database', 'database_schema'], context='table', store_all_in_context=False, clear_context=False, store_fqn=False, cache_entities=False, use_cache=True),
    NodeStage(type_=<class 'metadata.ingestion.models.life_cycle.OMetaLifeCycleData'>, processor='yield_life_cycle_data', nullable=True, must_return=False, overwrite=True, consumer=None, context=None, store_all_in_context=False, clear_context=False, store_fqn=False, cache_entities=False, use_cache=False)
] children=None post_process=None threads=False

[2024-10-02, 18:40:31 UTC] {topology_runner.py:231} DEBUG - Processing stage: type_=<class 'metadata.ingestion.models.ometa_classification.OMetaTagAndClassification'> processor='yield_table_tag_details' nullable=True must_return=False overwrite=True consumer=None context='tags' store_all_in_context=True clear_context=False store_fqn=False cache_entities=False use_cache=False

[2024-10-02, 18:40:31 UTC] {topology_runner.py:231} DEBUG - Processing stage: type_=<class 'metadata.generated.schema.entity.data.table.Table'> processor='yield_table' nullable=False must_return=False overwrite=True consumer=['database_service', 'database', 'database_schema'] context='table' store_all_in_context=False clear_context=False store_fqn=False cache_entities=False use_cache=True

[2024-10-02, 18:40:50 UTC] {status.py:91} WARNING - Unexpected exception to yield table (database=[gdw], schema=[edw], table=[acct]): (psycopg2.errors.InternalError_) bad lexical cast: source type value could not be interpreted as target
[SQL: /* {"app": "OpenMetadata", "version": "1.5.5.0"} */
SELECT n.nspname as "schema", c.relname as "table_name", att.attname as "name",
       format_encoding(att.attencodingtype::integer) as "encode",
       format_type(att.atttypid, att.atttypmod) as "type",
       att.attisdistkey as "distkey", att.attsortkeyord as "sortkey", att.attnotnull as "notnull",
       pg_catalog.col_description(att.attrelid, att.attnum) as "comment",
       adsrc, attnum,
       pg_catalog.format_type(att.atttypid, att.atttypmod),
       pg_catalog.pg_get_expr(ad.adbin, ad.adrelid) AS DEFAULT,
       n.oid as "schema_oid", c.oid as "table_oid"
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
JOIN pg_catalog.pg_attribute att ON att.attrelid = c.oid
LEFT JOIN pg_catalog.pg_attrdef ad ON (att.attrelid, att.attnum) = (ad.adrelid, ad.adnum)
WHERE n.nspname !~ '^pg_' AND att.attnum > 0 AND NOT att.attisdropped AND schema = 'edw'
UNION
SELECT view_schema as "schema", view_name as "table_name", col_name as "name",
       null as "encode", col_type as "type", null as "distkey", 0 as "sortkey",
       null as "notnull", null as "comment", null as "adsrc", null as "attnum",
       col_type as "format_type", null as "default", null as "schema_oid", null as "table_oid"
FROM pg_get_late_binding_view_cols() cols(view_schema name, view_name name, col_name name, col_type varchar, col_num int)
WHERE 1 AND schema = 'edw'
UNION
SELECT schemaname AS "schema", tablename AS "table_name", columnname AS "name", null AS "encode",
       -- Spectrum represents data types differently.
       -- Standardize, so we can infer types.
       CASE WHEN external_type = 'int' THEN 'integer'
            ELSE replace(replace(external_type, 'decimal', 'numeric'), 'varchar', 'character varying')
       END AS "type",
       null AS "distkey", 0 AS "sortkey", null AS "notnull", null AS "comment", null AS "adsrc", null AS "attnum",
       CASE WHEN external_type = 'int' THEN 'integer'
            ELSE replace(replace(external_type, 'decimal', 'numeric'), 'varchar', 'character varying')
       END AS "format_type",
       null AS "default", null AS "schema_oid", null AS "table_oid"
FROM svv_external_columns
ORDER BY "schema", "table_name", "attnum"; ]
(Background on this error at: https://sqlalche.me/e/14/2j85)

[2024-10-02, 18:40:50 UTC] {status.py:92} DEBUG - Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1910, in _execute_context
    self.dialect.do_execute(
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
psycopg2.errors.InternalError_: bad lexical cast: source type value could not be interpreted as target

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/source/database/common_db_source.py", line 496, in yield_table
    ) = self.get_columns_and_constraints(
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/utils/execution_time_tracker.py", line 195, in inner
    result = func(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/source/database/sql_column_handler.py", line 239, in get_columns_and_constraints
    columns = inspector.get_columns(table_name, schema_name, db_name=db_name)
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/reflection.py", line 497, in get_columns
    col_defs = self.dialect.get_columns(
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/utils/execution_time_tracker.py", line 195, in inner
    result = func(*args, **kwargs)
  File "<string>", line 2, in get_columns
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/reflection.py", line 55, in cache
    ret = fn(self, con, *args, **kw)
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/source/database/redshift/utils.py", line 61, in get_columns
    cols = self._get_redshift_columns(connection, table_name, schema, **kw)
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy_redshift/dialect.py", line 840, in _get_redshift_columns
    all_schema_columns = self._get_schema_column_info(
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/utils/execution_time_tracker.py", line 195, in inner
    result = func(*args, **kwargs)
  File "<string>", line 2, in _get_schema_column_info
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/reflection.py", line 55, in cache
    ret = fn(self, con, *args, **kw)
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/source/database/redshift/utils.py", line 137, in _get_schema_column_info
    result = connection.execute(
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1370, in execute
    return self._exec_driver_sql(
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1674, in _exec_driver_sql
    ret = self._execute_context(
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1953, in _execute_context
    self._handle_dbapi_exception(
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2134, in _handle_dbapi_exception
    util.raise_(
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1910, in _execute_context
    self.dialect.do_execute(
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.InternalError: (psycopg2.errors.InternalError_) bad lexical cast: source type value could not be interpreted as target
