Clickhouse ingestion column type Map schema failure

Open spudstr opened this issue 2 years ago • 4 comments

Describe the bug

ClickHouse tables with a nested Map column fail schema extraction, while tables with a flat Map column work. Or is the nested case not supported?

A column of type Map(String, Map(String, Nullable(String))) fails to produce a schema. A column of type Map(String, Nullable(String)) produces a schema as expected.

To Reproduce: in ClickHouse, create a table with a nested Map column

CREATE TABLE IF NOT EXISTS bugtable on cluster 'local' (
	id Int,
	metadata Map(String, Map(String, Nullable(String)))
) ENGINE = MergeTree()
order by id

Then ingest with a simple recipe:

source:
    type: clickhouse
    config:
        env: PROD
        platform_instance: clickhouse
        host_port: '${chi_cluster01_host}:8123'
        username: '${chi_cluster01_user}'
        password: '${chi_cluster01_pass}'
        profiling:
            enabled: false
        stateful_ingestion:
            enabled: true
            remove_stale_metadata: true

Expected behavior: a schema is produced for the nested Map column, as it is for other column types.

some logs

/tmp/datahub/ingest/venv-clickhouse-v0.11.0/lib/python3.10/site-packages/clickhouse_sqlalchemy/drivers/base.py:273: SAWarning: Did not recognize type 'name String' of column 'changes'
  warn("Did not recognize type '%s' of column '%s'" %
/tmp/datahub/ingest/venv-clickhouse-v0.11.0/lib/python3.10/site-packages/clickhouse_sqlalchemy/drivers/base.py:273: SAWarning: Did not recognize type 'previous_value String' of column 'changes'
  warn("Did not recognize type '%s' of column '%s'" %
/tmp/datahub/ingest/venv-clickhouse-v0.11.0/lib/python3.10/site-packages/clickhouse_sqlalchemy/drivers/base.py:273: SAWarning: Did not recognize type 'new_value String' of column 'changes'
  warn("Did not recognize type '%s' of column '%s'" %
/tmp/datahub/ingest/venv-clickhouse-v0.11.0/lib/python3.10/site-packages/clickhouse_sqlalchemy/drivers/base.py:273: SAWarning: Did not recognize type 'reason String' of column 'changes'
  warn("Did not recognize type '%s' of column '%s'" %
...
/tmp/datahub/ingest/venv-clickhouse-v0.11.0/lib/python3.10/site-packages/clickhouse_sqlalchemy/drivers/base.py:273: SAWarning: Did not recognize type 'Strin' of column 'metadata'

The ingestion report then warns that the schema could not be produced:

 'at.audit_log_local': ["unable to get column information due to an error -> Map.__init__() missing 1 required positional argument: 'value_type'"]},


Desktop (please complete the following information):

 'cli_version': '0.11.0',
 'cli_entry_location': '/tmp/datahub/ingest/venv-clickhouse-v0.11.0/lib/python3.10/site-packages/datahub/__init__.py',
 'py_version': '3.10.10 (main, Mar 14 2023, 02:37:11) [GCC 10.2.1 20210110]',
 'py_exec_path': '/tmp/datahub/ingest/venv-clickhouse-v0.11.0/bin/python3',

Additional context: this seems more like a sqlalchemy issue than a datahub issue :/
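The truncated type name 'Strin' in the warnings and the "missing 1 required positional argument: 'value_type'" error would both be consistent with the Map arguments being split at the wrong comma. That is a guess, not a confirmed diagnosis of clickhouse-sqlalchemy. The sketch below only illustrates the general problem: splitting a nested type string on every comma breaks the value type apart, while a parenthesis-aware split keeps it whole. The helpers `split_top_level` and `map_args` are hypothetical names made up for this example and are not part of clickhouse-sqlalchemy or DataHub.

```python
def split_top_level(args: str) -> list[str]:
    """Split an argument list on commas that are not nested inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(args):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            # Only a comma at depth 0 separates the Map key type from the value type.
            parts.append(args[start:i].strip())
            start = i + 1
    parts.append(args[start:].strip())
    return parts


def map_args(type_str: str) -> list[str]:
    """Return the key and value type strings of a 'Map(...)' type (hypothetical helper)."""
    prefix = "Map("
    if not (type_str.startswith(prefix) and type_str.endswith(")")):
        raise ValueError(f"not a Map type: {type_str!r}")
    return split_top_level(type_str[len(prefix):-1])


nested = "Map(String, Map(String, Nullable(String)))"
flat = "Map(String, Nullable(String))"

# A naive split on every comma cuts the nested value type into fragments.
naive = nested[len("Map("):-1].split(",")

print(naive)
print(map_args(nested))
print(map_args(flat))
```

With the depth-aware split, both the flat and the nested column types yield exactly two arguments, a key type and a value type, which is what a two-argument Map constructor needs.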

spudstr avatar Oct 24 '23 02:10 spudstr

Depends on https://github.com/xzkostyan/clickhouse-sqlalchemy/issues/269

hsheth2 avatar Nov 01 '23 04:11 hsheth2

This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io

github-actions[bot] avatar Dec 02 '23 01:12 github-actions[bot]

This issue was closed because it has been inactive for 30 days since being marked as stale.

github-actions[bot] avatar Jan 01 '24 01:01 github-actions[bot]

Leaving this open for tracking, but we can't really do much until https://github.com/xzkostyan/clickhouse-sqlalchemy/issues/269 is fixed.

hsheth2 avatar Jan 31 '24 21:01 hsheth2