OpenMetadata
bad lexical cast: source type value could not be interpreted as target in Redshift Query for schema extract
Affected module Ingestion Framework
Describe the bug I encountered the error 'bad lexical cast: source type value could not be interpreted as target' while the ingestion framework was extracting a table definition.
To Reproduce
Try to create a table with the following definition, then extract the table using the ingestion framework.

-- Drop table
-- DROP TABLE kw.acct;
--DROP TABLE kw.acct;

CREATE TABLE IF NOT EXISTS kw.acct (
    acct_id VARCHAR(36) NOT NULL ENCODE RAW
    ,dw_start_dt DATE NOT NULL ENCODE az64
    ,dw_end_dt DATE NOT NULL ENCODE az64
    ,acct_name VARCHAR(1530) NOT NULL ENCODE zstd
    ,parnt_acct_id VARCHAR(36) NOT NULL ENCODE zstd
    ,top_acct_id VARCHAR(36) NOT NULL ENCODE zstd
    ,intrnl_ind VARCHAR(3) NOT NULL ENCODE lzo
    ,create_dts TIMESTAMP WITHOUT TIME ZONE NOT NULL ENCODE az64
    ,rec_status_cd VARCHAR(3) NOT NULL ENCODE bytedict
    ,curr_ind VARCHAR(3) NOT NULL ENCODE lzo
    ,inactv_dts TIMESTAMP WITHOUT TIME ZONE NOT NULL ENCODE az64
    ,del_dts TIMESTAMP WITHOUT TIME ZONE NOT NULL ENCODE az64
    ,dw_gen_ind VARCHAR(3) NOT NULL ENCODE zstd
    ,dw_mod_by VARCHAR(60) NOT NULL ENCODE zstd
    ,dw_mod_dts TIMESTAMP WITHOUT TIME ZONE NOT NULL ENCODE az64
    ,UNIQUE (acct_id, dw_start_dt)
)
DISTSTYLE KEY
DISTKEY (acct_id)
SORTKEY (acct_id, curr_ind);

ALTER TABLE kw.acct owner to ungerpl;

-- Permissions
GRANT SELECT ON TABLE kw.acct TO GROUP bc_team;
GRANT ALL ON TABLE kw.acct TO unpl;
GRANT SELECT, DELETE, INSERT, UPDATE ON TABLE kw.acct TO GROUP dw_ce_rw;
GRANT SELECT ON TABLE kw.acct TO "986(dw)";
GRANT SELECT ON TABLE kw.acct TO "871(dw)";
I believe it is failing due to special characters such as '"', '(' and ')' in the table definition (for example, the grantee "986(dw)"). Redshift includes such names in the definition when the table is shared across different databases. Can this be handled?
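If the special-character hypothesis is right, the culprit would be grantee names like 986(dw) that fall outside Redshift's unquoted-identifier grammar. As a rough, hypothetical illustration (not OpenMetadata's actual code), a tool handling such names has to double-quote them before putting them into SQL; names that already satisfy the unquoted grammar can pass through unchanged:

```python
import re

def needs_quoting(ident: str) -> bool:
    # Unquoted Redshift identifiers must match roughly
    # [a-z_][a-z0-9_$]*; names such as "986(dw)" do not.
    return re.fullmatch(r"[a-z_][a-z0-9_$]*", ident) is None

def quote_ident(ident: str) -> str:
    # Double any embedded double quotes, then wrap the whole
    # identifier in double quotes if quoting is required.
    if needs_quoting(ident):
        return '"' + ident.replace('"', '""') + '"'
    return ident

print(quote_ident("986(dw)"))  # -> "986(dw)"
print(quote_ident("bc_team"))  # -> bc_team
```

This is only a sketch of the quoting rule; the actual failure here is raised by the column-reflection query shown in the debug log below, so the fix may belong in the dialect rather than in identifier handling.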
Expected behavior I expected the Redshift metadata ingestion to run successfully and pull metadata from Redshift.
Version:
- OS: [e.g. Unix]
- Python version: 3.10
- OpenMetadata version: [e.g. 1.5.6]
- OpenMetadata Ingestion package version: [e.g. openmetadata-ingestion[docker]==1.5.6]
Additional context Below is the debug log for the issue:
[2024-10-02, 18:40:31 UTC] {topology_runner.py:166} DEBUG - Processing node producer='get_tables_name_and_type' stages=[NodeStage(type_=<class 'metadata.ingestion.models.ometa_classification.OMetaTagAndClassification'>, processor='yield_table_tag_details', nullable=True, must_return=False, overwrite=True, consumer=None, context='tags', store_all_in_context=True, clear_context=False, store_fqn=False, cache_entities=False, use_cache=False), NodeStage(type_=<class 'metadata.generated.schema.entity.data.table.Table'>, processor='yield_table', nullable=False, must_return=False, overwrite=True, consumer=['database_service', 'database', 'database_schema'], context='table', store_all_in_context=False, clear_context=False, store_fqn=False, cache_entities=False, use_cache=True), NodeStage(type_=<class 'metadata.ingestion.models.life_cycle.OMetaLifeCycleData'>, processor='yield_life_cycle_data', nullable=True, must_return=False, overwrite=True, consumer=None, context=None, store_all_in_context=False, clear_context=False, store_fqn=False, cache_entities=False, use_cache=False)] children=None post_process=None threads=False
[2024-10-02, 18:40:31 UTC] {topology_runner.py:231} DEBUG - Processing stage: type_=<class 'metadata.ingestion.models.ometa_classification.OMetaTagAndClassification'> processor='yield_table_tag_details' nullable=True must_return=False overwrite=True consumer=None context='tags' store_all_in_context=True clear_context=False store_fqn=False cache_entities=False use_cache=False
[2024-10-02, 18:40:31 UTC] {topology_runner.py:231} DEBUG - Processing stage: type_=<class 'metadata.generated.schema.entity.data.table.Table'> processor='yield_table' nullable=False must_return=False overwrite=True consumer=['database_service', 'database', 'database_schema'] context='table' store_all_in_context=False clear_context=False store_fqn=False cache_entities=False use_cache=True
[2024-10-02, 18:40:50 UTC] {status.py:91} WARNING - Unexpected exception to yield table (database=[gdw], schema=[edw], table=[acct]): (psycopg2.errors.InternalError_) bad lexical cast: source type value could not be interpreted as target
[SQL: /* {"app": "OpenMetadata", "version": "1.5.5.0"} */
SELECT
n.nspname as "schema",
c.relname as "table_name",
att.attname as "name",
format_encoding(att.attencodingtype::integer) as "encode",
format_type(att.atttypid, att.atttypmod) as "type",
att.attisdistkey as "distkey",
att.attsortkeyord as "sortkey",
att.attnotnull as "notnull",
pg_catalog.col_description(att.attrelid, att.attnum)
as "comment",
adsrc,
attnum,
pg_catalog.format_type(att.atttypid, att.atttypmod),
pg_catalog.pg_get_expr(ad.adbin, ad.adrelid) AS DEFAULT,
n.oid as "schema_oid",
c.oid as "table_oid"
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_namespace n
ON n.oid = c.relnamespace
JOIN pg_catalog.pg_attribute att
ON att.attrelid = c.oid
LEFT JOIN pg_catalog.pg_attrdef ad
ON (att.attrelid, att.attnum) = (ad.adrelid, ad.adnum)
WHERE n.nspname !~ '^pg_'
AND att.attnum > 0
AND NOT att.attisdropped
AND schema = 'edw'
UNION
SELECT
view_schema as "schema",
view_name as "table_name",
col_name as "name",
null as "encode",
col_type as "type",
null as "distkey",
0 as "sortkey",
null as "notnull",
null as "comment",
null as "adsrc",
null as "attnum",
col_type as "format_type",
null as "default",
null as "schema_oid",
null as "table_oid"
FROM pg_get_late_binding_view_cols() cols(
view_schema name,
view_name name,
col_name name,
col_type varchar,
col_num int)
WHERE 1 AND schema = 'edw'
UNION
SELECT schemaname AS "schema",
tablename AS "table_name",
columnname AS "name",
null AS "encode",
-- Spectrum represents data types differently.
-- Standardize, so we can infer types.
CASE
WHEN external_type = 'int' THEN 'integer'
ELSE
replace(
replace(external_type, 'decimal', 'numeric'),
'varchar', 'character varying')
END
AS "type",
null AS "distkey",
0 AS "sortkey",
null AS "notnull",
null AS "comment",
null AS "adsrc",
null AS "attnum",
CASE
WHEN external_type = 'int' THEN 'integer'
ELSE
replace(
replace(external_type, 'decimal', 'numeric'),
'varchar', 'character varying')
END
AS "format_type",
null AS "default",
null AS "schema_oid",
null AS "table_oid"
FROM svv_external_columns
ORDER BY "schema", "table_name", "attnum";
]
(Background on this error at: https://sqlalche.me/e/14/2j85)
[2024-10-02, 18:40:50 UTC] {status.py:92} DEBUG - Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1910, in execute_context
self.dialect.do_execute(
File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
cursor.execute(statement, parameters)
psycopg2.errors.InternalError: bad lexical cast: source type value could not be interpreted as target
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/source/database/common_db_source.py", line 496, in yield_table
) = self.get_columns_and_constraints(
File "/home/airflow/.local/lib/python3.10/site-packages/metadata/utils/execution_time_tracker.py", line 195, in inner
result = func(*args, **kwargs)
File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/source/database/sql_column_handler.py", line 239, in get_columns_and_constraints
columns = inspector.get_columns(table_name, schema_name, db_name=db_name)
File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/reflection.py", line 497, in get_columns
col_defs = self.dialect.get_columns(
File "/home/airflow/.local/lib/python3.10/site-packages/metadata/utils/execution_time_tracker.py", line 195, in inner
result = func(*args, **kwargs)
File "