
Kafka AVRO Schema Ingestion: Named types are not parsed completely

Status: Open · vladimirivkovic opened this issue 1 year ago · 6 comments

Describe the bug

A named type is used as the type of multiple fields in a record.

Sample AVRO schema:

{
  "namespace": "com.sample.avro.schema",
  "type": "record",
  "name": "Pyramid",
  "fields": [
    {
      "name": "a",
      "type": {
        "type": "record",
        "name": "Point3D",
        "fields": [
          { "name": "x", "type": "double" },
          { "name": "y", "type": "double" },
          { "name": "z", "type": "double" }
        ]
      }
    },
    { "name": "b", "type": "Point3D" },
    {
      "name": "c",
      "type": ["null", "Point3D"]
    },
    { "name": "d", "type": "Point3D" }
  ]
}
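For readers less familiar with Avro named types: a record declared once (here `Point3D`, inside field `a`) can be referenced by name from later fields. Because `Point3D` declares no namespace of its own, it inherits the namespace of the enclosing record, so `b`, `c` and `d` all refer to `com.sample.avro.schema.Point3D`. The stdlib-only sketch below resolves those references to show that each one carries the same three fields. The helpers `collect_named_types` and `resolve` are hypothetical illustrations and are not part of DataHub or the `avro` library.

```python
# Avro primitive type names; any other string is a reference to a named type.
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}

# The Pyramid schema from the issue, as a Python dict.
PYRAMID = {
    "namespace": "com.sample.avro.schema",
    "type": "record",
    "name": "Pyramid",
    "fields": [
        {"name": "a", "type": {
            "type": "record", "name": "Point3D",
            "fields": [{"name": n, "type": "double"} for n in ("x", "y", "z")],
        }},
        {"name": "b", "type": "Point3D"},
        {"name": "c", "type": ["null", "Point3D"]},
        {"name": "d", "type": "Point3D"},
    ],
}


def full_name(name, namespace):
    """Qualify a short name with the enclosing namespace (dotted names are already full)."""
    if "." in name or not namespace:
        return name
    return f"{namespace}.{name}"


def collect_named_types(schema, namespace=None, registry=None):
    """Map every named record/enum/fixed definition to its full name."""
    if registry is None:
        registry = {}
    if isinstance(schema, list):  # union: inspect every branch
        for branch in schema:
            collect_named_types(branch, namespace, registry)
    elif isinstance(schema, dict) and schema.get("type") in ("record", "enum", "fixed"):
        name = full_name(schema["name"], schema.get("namespace", namespace))
        registry[name] = schema
        # Definitions nested inside this type inherit its namespace.
        inner_ns = name.rpartition(".")[0] or None
        for field in schema.get("fields", []):
            collect_named_types(field["type"], inner_ns, registry)
    elif isinstance(schema, dict) and schema.get("type") == "array":
        collect_named_types(schema["items"], namespace, registry)
    elif isinstance(schema, dict) and schema.get("type") == "map":
        collect_named_types(schema["values"], namespace, registry)
    return registry


def resolve(type_ref, namespace, registry):
    """Return the definition a type reference points to (non-references unchanged)."""
    if isinstance(type_ref, str) and type_ref not in PRIMITIVES:
        return registry[full_name(type_ref, namespace)]
    return type_ref


registry = collect_named_types(PYRAMID)
b_type = resolve(PYRAMID["fields"][1]["type"], "com.sample.avro.schema", registry)
print(b_type is registry["com.sample.avro.schema.Point3D"])  # True
print([f["name"] for f in b_type["fields"]])                  # ['x', 'y', 'z']
```

The field `b` resolves to the very same definition as `a`, which is why a viewer of this schema would expect identical sub-fields for both.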

After running Kafka ingestion with Schema Registry enabled, DataHub shows the following schema, in which fields b and d lack the sub-fields that field a has:

[Screenshot from 2024-02-08 08-55-36]

To Reproduce

Steps to reproduce the behavior:

  1. Create a Kafka topic named pyramid with the AVRO schema specified above
  2. Run a Kafka ingestion from the UI with Schema Registry enabled
  3. Search for the pyramid topic and open it
  4. Open the Schema tab and see the error
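The schema part of step 1 can be scripted against the Confluent Schema Registry REST API. The sketch below only builds the `POST /subjects/<subject>/versions` request and does not send it. It assumes a registry at `http://localhost:8081` and the default `TopicNameStrategy` (value subject `pyramid-value`); both are assumptions about the reporter's setup, and `build_register_request` is a hypothetical helper. The Kafka topic itself still has to be created separately.

```python
import json
import urllib.request

# Assumed local Schema Registry endpoint; adjust for your environment.
REGISTRY_URL = "http://localhost:8081"
TOPIC = "pyramid"


def build_register_request(registry_url, topic, schema_str):
    """Build (but do not send) a request registering schema_str as the topic's value schema."""
    subject = f"{topic}-value"  # subject name under the default TopicNameStrategy
    payload = {
        "schemaType": "AVRO",   # optional; AVRO is the registry's default
        "schema": schema_str,   # the schema travels as a JSON-encoded string
    }
    return urllib.request.Request(
        url=f"{registry_url}/subjects/{subject}/versions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
        method="POST",
    )
```

For example, with the schema above saved to a hypothetical `pyramid.avsc`, `build_register_request(REGISTRY_URL, TOPIC, open("pyramid.avsc").read())` produces the request, and passing it to `urllib.request.urlopen` sends it; the registry answers with the id it assigned to the schema.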

Expected behavior

Fields b and d should have the same structure as field a, i.e. three sub-fields (x, y and z).

Desktop:

  • OS: Ubuntu 22.04
  • Browser: Firefox
  • DataHub version: 0.12.1

Additional context

The same issue occurs both with UI ingestion and with custom ingestion that uses the Python SDK's schema_util.avro_schema_to_mce_fields method.

In this case, commenting out the else statement here appears to solve the problem.

vladimirivkovic, Feb 20 '24
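One plausible reading of the else-statement observation, offered as an assumption rather than an analysis of DataHub's code: a converter that tracks named types to avoid infinite recursion can either skip a type it has already expanded anywhere (which hides the sub-fields of every later reference) or skip it only when it is one of its own ancestors (a genuinely recursive reference). The sketch below models both guards. It is not DataHub's `schema_util` implementation; `field_paths` and its `track_globally` flag are hypothetical, the dotted paths are much simpler than DataHub's field paths, and namespaces are ignored. Because it is only a model, its truncated output need not match the screenshot field by field.

```python
# Avro primitive type names; any other string is a reference to a named type.
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}

# The Pyramid schema from the issue (namespace omitted: this model uses short names).
PYRAMID = {
    "type": "record",
    "name": "Pyramid",
    "fields": [
        {"name": "a", "type": {
            "type": "record", "name": "Point3D",
            "fields": [{"name": n, "type": "double"} for n in ("x", "y", "z")],
        }},
        {"name": "b", "type": "Point3D"},
        {"name": "c", "type": ["null", "Point3D"]},
        {"name": "d", "type": "Point3D"},
    ],
}


def field_paths(schema, *, track_globally):
    """Flatten a record schema into dotted leaf paths.

    track_globally=True:  skip a named record already expanded anywhere
                          (a model of the truncation reported in this issue).
    track_globally=False: skip a named record only when it is its own ancestor,
                          i.e. a genuinely recursive reference.
    """
    registry = {}     # short name -> record definition, filled as definitions appear
    expanded = set()  # every record name expanded so far
    paths = []

    def visit(node, path, ancestors):
        if isinstance(node, list):  # union, e.g. ["null", "Point3D"]
            for branch in node:
                visit(branch, path, ancestors)
            return
        if isinstance(node, str):
            if node in PRIMITIVES:
                if node != "null":
                    paths.append(path)
                return
            node = registry[node]  # reference to an earlier named definition
        if node.get("type") != "record":
            paths.append(path)  # arrays, maps, enums, ... are treated as leaves here
            return
        name = node["name"]
        registry.setdefault(name, node)
        blocked = name in expanded if track_globally else name in ancestors
        if blocked:
            paths.append(path)  # emitted without any sub-fields
            return
        expanded.add(name)
        for field in node["fields"]:
            child = f"{path}.{field['name']}" if path else field["name"]
            visit(field["type"], child, ancestors | {name})

    visit(schema, "", frozenset())
    return paths


print(field_paths(PYRAMID, track_globally=True))
# ['a.x', 'a.y', 'a.z', 'b', 'c', 'd']
print(len(field_paths(PYRAMID, track_globally=False)))
# 12: x, y and z under each of a, b, c and d
```

Simply removing a guard is not enough on its own: a self-referencing record, such as a linked-list `Node` whose `next` field has type `["null", "Node"]`, needs some check to stop expanding. The ancestor check keeps that case finite by emitting `next` as a leaf while still fully expanding repeated, non-recursive references like `b` and `d`.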

This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io

github-actions[bot], Mar 22 '24

The same behavior was noticed while using version 0.13.0.

vladimirivkovic, Mar 22 '24

The same behavior was noticed while using version 0.13.1.

vladimirivkovic, Apr 25 '24

The same behavior was noticed while using version 0.13.3.

vladimirivkovic, Jun 21 '24