Normalization: handle nested objects correctly

Open edgao opened this issue 2 years ago • 10 comments

We're currently not prioritizing objects/arrays correctly in some cases. E.g. v0 schema:

{"type": ["string", "object"], "properties": {}}

which becomes v1 schema:

{
  "oneOf": [
    {"$ref": "....String"},
    {"type": "object", "properties": {}}
  ]
}

is being interpreted as a string rather than as an object.
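
To make the intended behavior concrete, here is a minimal sketch of branch selection that prefers objects and arrays over primitives. It is not the actual base-normalization code: `branch_kind`, `pick_branch`, and the `_PRIORITY` ordering are hypothetical names and assumptions chosen for illustration, and the `$ref` value is kept elided exactly as in the schema above.

```python
import re
from typing import Any, Dict

# Assumed ordering (illustration only): structured types outrank primitives.
_PRIORITY = {"object": 0, "array": 1}


def branch_kind(branch: Dict[str, Any]) -> str:
    """Classify one schema branch as 'object', 'array', a primitive name, or 'unknown'."""
    if "$ref" in branch:
        # Take the trailing word of the reference, e.g. "....String" -> "string".
        match = re.search(r"[A-Za-z]+$", branch["$ref"])
        return match.group(0).lower() if match else "unknown"
    kind = branch.get("type", "unknown")
    return kind if isinstance(kind, str) else "unknown"


def pick_branch(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Return the oneOf branch with the highest priority (object, then array, then the rest)."""
    branches = schema.get("oneOf") or [schema]
    # min() returns the first minimal item, so ties keep the first-listed branch.
    return min(branches, key=lambda b: _PRIORITY.get(branch_kind(b), len(_PRIORITY)))


v1_schema = {
    "oneOf": [
        {"$ref": "....String"},
        {"type": "object", "properties": {}},
    ]
}
chosen = pick_branch(v1_schema)
print(branch_kind(chosen))
```

With this ordering the example schema resolves to the object branch even though the string branch is listed first, which is the outcome described in this issue.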

edgao avatar Feb 01 '23 01:02 edgao

/test connector=bases/base-normalization

:clock2: bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/4059883773 :x: bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/4059883773 :bug: https://gradle.com/s/vultcb7cr2i6g

Build Failed

Test summary info:

Could not find result summary

edgao avatar Feb 01 '23 01:02 edgao

/test connector=connectors/destination-snowflake

:clock2: connectors/destination-snowflake https://github.com/airbytehq/airbyte/actions/runs/4059884430

edgao avatar Feb 01 '23 01:02 edgao

Affected Connector Report

NOTE ⚠️ Changes in this PR affect the following connectors. Make sure to do the following as needed:

  • Run integration tests
  • Bump connector or module version
  • Add changelog
  • Publish the new version

✅ Sources (0)

Connector Version Changelog Publish
  • See "Actionable Items" below for how to resolve warnings and errors.

❌ Destinations (48)

Connector                                 Version                    Changelog            Publish
----------------------------------------------------------------------------------------------------------
destination-aws-datalake                  0.1.1
destination-azure-blob-storage            0.1.6
destination-bigquery                      1.2.13
destination-bigquery-denormalized         1.2.12                                          (diff seed version)
destination-cassandra                     0.1.4
destination-clickhouse                    0.2.2                      (changelog missing)
destination-clickhouse-strict-encrypt     0.2.2                      🔵 (ignored)         🔵 (ignored)
destination-csv                           1.0.0                      (changelog missing)
destination-databricks                    0.3.1
destination-dev-null                      0.2.7                      🔵 (ignored)         🔵 (ignored)
destination-doris                         0.1.0
destination-dynamodb                      0.1.7
destination-e2e-test                      0.2.4
destination-elasticsearch                 0.1.6
destination-elasticsearch-strict-encrypt  0.1.6                      🔵 (ignored)         🔵 (ignored)
destination-gcs                           0.2.14
destination-iceberg                       0.1.0
destination-jdbc                          0.3.14                     🔵 (ignored)         🔵 (ignored)
destination-kafka                         0.1.10
destination-keen                          0.2.4
destination-kinesis                       0.1.5
destination-local-json                    0.2.11
destination-mariadb-columnstore           0.1.7
destination-mongodb                       0.1.9
destination-mongodb-strict-encrypt        0.1.9                      🔵 (ignored)         🔵 (ignored)
destination-mqtt                          0.1.3
destination-mssql                         0.1.22
destination-mssql-strict-encrypt          0.1.22                     🔵 (ignored)         🔵 (ignored)
destination-mysql                         0.1.20
destination-mysql-strict-encrypt          0.1.21 (mismatch: 0.1.20)  🔵 (ignored)         🔵 (ignored)
destination-oracle                        0.1.19
destination-oracle-strict-encrypt         0.1.19                     🔵 (ignored)         🔵 (ignored)
destination-postgres                      0.3.26
destination-postgres-strict-encrypt       0.3.26                     🔵 (ignored)         🔵 (ignored)
destination-pubsub                        0.2.0
destination-pulsar                        0.1.3
destination-r2                            0.1.0
destination-redis                         0.1.4
destination-redpanda                      0.1.0
destination-redshift                      0.3.56
destination-rockset                       0.1.4
destination-s3                            0.3.19
destination-s3-glue                       0.1.1
destination-scylla                        0.1.3
destination-snowflake                     0.4.47
destination-teradata                      0.1.0
destination-tidb                          0.1.0
destination-yugabytedb                    0.1.0
  • See "Actionable Items" below for how to resolve warnings and errors.

👀 Other Modules (1)

  • base-normalization

Actionable Items

Category Status Actionable Item
Version
mismatch
The version of the connector is different from its normal variant. Please bump the version of the connector.

doc not found
The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug.
Changelog
doc not found
The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug.

changelog missing
There is no changelog for the current version of the connector. If you are the author of the current version, please add a changelog.
Publish
not in seed
The connector is not in the seed file (e.g. source_definitions.yaml), so its publication status cannot be checked. This can be normal (e.g. some connectors are cloud-specific, and only listed in the cloud seed file). Please double-check to make sure that it is not a bug.

diff seed version
The connector exists in the seed file, but the latest version is not listed there. This usually means that the latest version is not published. Please use the /publish command to publish the latest version.

github-actions[bot] avatar Feb 01 '23 01:02 github-actions[bot]

/test connector=connectors/destination-snowflake

:clock2: connectors/destination-snowflake https://github.com/airbytehq/airbyte/actions/runs/4060985917 :x: connectors/destination-snowflake https://github.com/airbytehq/airbyte/actions/runs/4060985917 :bug: https://gradle.com/s/2z5e47lzjszoc

Build Failed

Test summary info:

Could not find result summary

edgao avatar Feb 01 '23 04:02 edgao

/test connector=bases/base-normalization

:clock2: bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/4061054215 :x: bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/4061054215 :bug: https://gradle.com/s/vjpolbvhtnyjq

Build Failed

Test summary info:

	 =========================== short test summary info ============================
	 SKIPPED [1] integration_tests/test_drop_scd_overwrite.py:56: DestinationType.ORACLE does not support incremental sync with schema change yet
	 SKIPPED [1] integration_tests/test_drop_scd_overwrite.py:56: DestinationType.TIDB does not support incremental sync with schema change yet
	 SKIPPED [3] integration_tests/test_ephemeral.py:102: ephemeral materialization isn't supported in ClickHouse yet
	 SKIPPED [1] integration_tests/test_ephemeral.py:59: Skipping test for column limit, because in MySQL, the max number of columns is limited by row size (8KB)
	 SKIPPED [1] integration_tests/test_normalization.py:82: Destinations DestinationType.CLICKHOUSE does not support nested streams
	 SKIPPED [1] integration_tests/test_normalization.py:148: DestinationType.MSSQL is disabled as it doesnt fully support schema change in incremental yet
	 SKIPPED [2] integration_tests/test_normalization.py:136: DestinationType.MYSQL does not support incremental yet
	 SKIPPED [1] integration_tests/test_normalization.py:136: DestinationType.ORACLE does not support incremental yet
	 SKIPPED [1] integration_tests/test_normalization.py:82: Destinations DestinationType.ORACLE does not support nested streams
	 SKIPPED [1] integration_tests/test_normalization.py:145: DestinationType.SNOWFLAKE is disabled as it doesnt support schema change in incremental yet (column type changes)
	 SKIPPED [1] integration_tests/test_normalization.py:145: DestinationType.TIDB is disabled as it doesnt support schema change in incremental yet (column type changes)
	 FAILED integration_tests/test_drop_scd_overwrite.py::test_reset_scd_on_overwrite[DestinationType.REDSHIFT]
	 FAILED integration_tests/test_ephemeral.py::test_destination_supported_limits[DestinationType.REDSHIFT-1000]
	 FAILED integration_tests/test_ephemeral.py::test_destination_failure_over_limits[Redshift-1665-target lists can have at most 1664 entries]
	 FAILED integration_tests/test_ephemeral.py::test_empty_streams[DestinationType.REDSHIFT]
	 FAILED integration_tests/test_ephemeral.py::test_stream_with_1_airbyte_column[DestinationType.REDSHIFT]
	 FAILED integration_tests/test_normalization.py::test_normalization[DestinationType.CLICKHOUSE-test_simple_streams]
	 FAILED integration_tests/test_normalization.py::test_normalization[DestinationType.REDSHIFT-test_simple_streams]
	 FAILED integration_tests/test_normalization.py::test_normalization[DestinationType.REDSHIFT-test_nested_streams]
	 FAILED integration_tests/test_normalization.py::test_redshift_normalization_migration
	 ============ 9 failed, 39 passed, 14 skipped in 2826.38s (0:47:06) =============

edgao avatar Feb 01 '23 04:02 edgao

/test connector=bases/base-normalization

:clock2: bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/4066371566 :x: bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/4066371566 :bug: https://gradle.com/s/qnnk6qqyfnbhg

Build Failed

Test summary info:

	 =========================== short test summary info ============================
	 SKIPPED [1] integration_tests/test_drop_scd_overwrite.py:56: DestinationType.ORACLE does not support incremental sync with schema change yet
	 SKIPPED [1] integration_tests/test_drop_scd_overwrite.py:56: DestinationType.TIDB does not support incremental sync with schema change yet
	 SKIPPED [3] integration_tests/test_ephemeral.py:102: ephemeral materialization isn't supported in ClickHouse yet
	 SKIPPED [1] integration_tests/test_ephemeral.py:59: Skipping test for column limit, because in MySQL, the max number of columns is limited by row size (8KB)
	 SKIPPED [1] integration_tests/test_normalization.py:82: Destinations DestinationType.CLICKHOUSE does not support nested streams
	 SKIPPED [1] integration_tests/test_normalization.py:148: DestinationType.MSSQL is disabled as it doesnt fully support schema change in incremental yet
	 SKIPPED [2] integration_tests/test_normalization.py:136: DestinationType.MYSQL does not support incremental yet
	 SKIPPED [1] integration_tests/test_normalization.py:82: Destinations DestinationType.ORACLE does not support nested streams
	 SKIPPED [1] integration_tests/test_normalization.py:136: DestinationType.ORACLE does not support incremental yet
	 SKIPPED [1] integration_tests/test_normalization.py:145: DestinationType.SNOWFLAKE is disabled as it doesnt support schema change in incremental yet (column type changes)
	 SKIPPED [1] integration_tests/test_normalization.py:145: DestinationType.TIDB is disabled as it doesnt support schema change in incremental yet (column type changes)
	 FAILED integration_tests/test_drop_scd_overwrite.py::test_reset_scd_on_overwrite[DestinationType.REDSHIFT]
	 FAILED integration_tests/test_ephemeral.py::test_destination_supported_limits[DestinationType.REDSHIFT-1000]
	 FAILED integration_tests/test_ephemeral.py::test_destination_failure_over_limits[Redshift-1665-target lists can have at most 1664 entries]
	 FAILED integration_tests/test_ephemeral.py::test_empty_streams[DestinationType.REDSHIFT]
	 FAILED integration_tests/test_ephemeral.py::test_stream_with_1_airbyte_column[DestinationType.REDSHIFT]
	 FAILED integration_tests/test_normalization.py::test_normalization[DestinationType.CLICKHOUSE-test_simple_streams]
	 FAILED integration_tests/test_normalization.py::test_normalization[DestinationType.REDSHIFT-test_nested_streams]
	 FAILED integration_tests/test_normalization.py::test_normalization[DestinationType.REDSHIFT-test_simple_streams]
	 FAILED integration_tests/test_normalization.py::test_redshift_normalization_migration
	 ============ 9 failed, 39 passed, 14 skipped in 2871.11s (0:47:51) =============

edgao avatar Feb 01 '23 16:02 edgao

/test connector=connectors/destination-snowflake

:clock2: connectors/destination-snowflake https://github.com/airbytehq/airbyte/actions/runs/4066372962

edgao avatar Feb 01 '23 16:02 edgao

/test connector=connectors/destination-bigquery

:clock2: connectors/destination-bigquery https://github.com/airbytehq/airbyte/actions/runs/4066372256

edgao avatar Feb 01 '23 16:02 edgao

/test connector=connectors/destination-snowflake

edgao avatar Feb 01 '23 17:02 edgao

/test connector=connectors/destination-bigquery

:clock2: connectors/destination-bigquery https://github.com/airbytehq/airbyte/actions/runs/4066822068 :white_check_mark: connectors/destination-bigquery https://github.com/airbytehq/airbyte/actions/runs/4066822068 Python tests coverage:

Name                                                              Stmts   Miss  Cover
-------------------------------------------------------------------------------------
normalization/transform_config/__init__.py                            2      0   100%
normalization/transform_catalog/reserved_keywords.py                 14      0   100%
normalization/transform_catalog/__init__.py                           2      0   100%
normalization/destination_type.py                                    14      0   100%
normalization/data_type.py                                           14      0   100%
normalization/__init__.py                                             4      0   100%
normalization/transform_catalog/destination_name_transformer.py     166      8    95%
normalization/transform_catalog/table_name_registry.py              174     34    80%
normalization/transform_catalog/utils.py                             74     17    77%
normalization/transform_config/transform.py                         189     48    75%
normalization/transform_catalog/dbt_macro.py                         22      7    68%
normalization/transform_catalog/catalog_processor.py                155     86    45%
normalization/transform_catalog/transform.py                         61     38    38%
normalization/transform_catalog/stream_processor.py                 626    432    31%
-------------------------------------------------------------------------------------
TOTAL                                                              1517    670    56%

Build Passed

Test summary info:

All Passed

edgao avatar Feb 01 '23 17:02 edgao

Tried to run the tests locally, but got lots of failures :(

I don't see how to attach the full logs

Selection_265

etsybaev avatar Feb 01 '23 20:02 etsybaev

:airbyte-integrations:bases:base-normalization:mypyCheck task seems to fail

etsybaev avatar Feb 01 '23 20:02 etsybaev

@edgao I'm going through stale PRs assigned to the team - what's the status of this one?

evantahler avatar Apr 01 '23 01:04 evantahler

this was for protocol v1; closing

edgao avatar Apr 03 '23 14:04 edgao