Errors when using the 'sqlalchemy' source to ingest 'doris' metadata

Open junqiangge opened this issue 2 years ago • 11 comments

When I use the 'sqlalchemy' source to ingest Doris metadata, many unknown exceptions appear. With the MySQL connection method, parsing of decimal-type fields, comment information, and table attributes fails.

examples:

/usr/local/lib/python3.9/site-packages/sqlalchemy/dialects/mysql/reflection.py:192: SAWarning: Did not recognize type 'decimal128' of column 'total_pay_amt'

/usr/local/lib/python3.9/site-packages/sqlalchemy/dialects/mysql/reflection.py:62: SAWarning: Unknown schema content: 'COMMENT "XXX" '

junqiangge avatar Apr 08 '22 09:04 junqiangge

@kevinhu please, can you check this out?

treff7es avatar Apr 08 '22 16:04 treff7es

It looks like we should support the decimal128 type.

treff7es avatar Apr 08 '22 16:04 treff7es

Not sure what's going on with the unknown schema content warning, but this should fix the decimal128 mapping: https://github.com/datahub-project/datahub/pull/4624

kevinhu avatar Apr 08 '22 22:04 kevinhu

I think this exception is thrown by SQLAlchemy itself. Even if the decimal128 type mapping is changed, the same error will still be thrown.

junqiangge avatar Apr 09 '22 14:04 junqiangge

It is thrown by sqlalchemy, but we are able to patch their type mapping this way. The warning is the same type as was raised in https://github.com/datahub-project/datahub/issues/3704
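
As a rough illustration of the kind of patch meant here (a minimal sketch, not the actual code from the linked PR): SQLAlchemy's MySQL dialect looks up reflected column type names in a module-level dictionary, `ischema_names`, and emits the "Did not recognize type" warning when the name is missing. Adding an entry for `decimal128` lets reflection resolve the column. The helper name `register_doris_types` is hypothetical, and mapping `decimal128` to the generic `DECIMAL` type is an assumption made for this example.

```python
from sqlalchemy.dialects.mysql import base
from sqlalchemy.types import DECIMAL


def register_doris_types() -> None:
    """Teach the MySQL dialect about a type name it does not know.

    Hypothetical helper for illustration. Assumption: representing
    Doris's decimal128 as a generic DECIMAL is good enough for
    metadata ingestion.
    """
    # ischema_names maps lower-case type names found during reflection
    # to SQLAlchemy type classes.
    base.ischema_names["decimal128"] = DECIMAL


register_doris_types()
```

This only addresses the unrecognized column type. It does not change how the dialect parses the rest of the table definition, so the "Unknown schema content" warning would be unaffected by it.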

kevinhu avatar Apr 09 '22 14:04 kevinhu

The field type now resolves normally, but parsing of the field comments still fails. Some table-structure clauses, such as 'distributed by hash()', 'aggregate key()', 'primary key()', 'unique key()', and 'duplicate key()', and some table properties with extra values, such as 'replication_num' and 'storage_medium', are not parsed either.

junqiangge avatar Apr 10 '22 08:04 junqiangge

What do you mean by this?

Some table-structure clauses, such as 'distributed by hash()', 'aggregate key()', 'primary key()', 'unique key()', and 'duplicate key()', and some table properties with extra values, such as 'replication_num' and 'storage_medium'

kevinhu avatar Apr 10 '22 13:04 kevinhu

What do you mean by this?

Some table-structure clauses, such as 'distributed by hash()', 'aggregate key()', 'primary key()', 'unique key()', and 'duplicate key()', and some table properties with extra values, such as 'replication_num' and 'storage_medium'

The Doris table creation statement cannot be parsed normally! Like this: [image]

junqiangge avatar Apr 10 '22 13:04 junqiangge

I see—why are you trying to ingest Doris with the MySQL connector though?

kevinhu avatar Apr 10 '22 16:04 kevinhu

I see—why are you trying to ingest Doris with the MySQL connector though?

OK, I see. Some Doris syntax is not supported. Thank you for your support

junqiangge avatar Apr 15 '22 16:04 junqiangge

This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io

github-actions[bot] avatar Sep 15 '22 06:09 github-actions[bot]

This issue was closed because it has been inactive for 30 days since being marked as stale.

github-actions[bot] avatar Oct 16 '22 02:10 github-actions[bot]