sql icon indicating copy to clipboard operation
sql copied to clipboard

[BUG] Struct columns are returning null when querying Iceberg tables

Open engechas opened this issue 1 year ago • 1 comments

What is the bug? Iceberg table support was recently added. When querying tables that contain struct type columns, the values of these columns is being returned as null.

{
  "metadata": "null",
  "time": 1717691198000,
  "time_dt": "2024-06-06T16:26:38.000Z",
  "cloud": "null",
  "api": "null",
  "dst_endpoint": "null",
  "actor": "null",
  "http_request": "null",
  "src_endpoint": "null",
  "session": "null",
  "policy": "null",
  "resources": "null",
  "class_name": "API Activity",
  "class_uid": 6003,
  "category_name": "Application Activity",
  "category_uid": 6,
  "severity_id": 1,
  "severity": "Informational",
  "user": "null",
  "activity_name": "Invoke",
  "activity_id": 99,
  "type_uid": 600399,
  "type_name": "API Activity: Other",
  "status": "Success",
  "is_mfa": "null",
  "unmapped": "{REDACTED}",
  "accountid": "REDACTED",
  "region": "us-east-1",
  "observables": "null"
}

Schema for the table

{
            "Columns": [
                {
                    "Name": "metadata",
                    "Type": "struct<product:struct<version:string,name:string,vendor_name:string,feature:struct<name:string>>,uid:string,profiles:array<string>,version:string>"
                },
                {
                    "Name": "time",
                    "Type": "bigint"
                },
                {
                    "Name": "cloud",
                    "Type": "struct<region:string,provider:string>"
                },
                {
                    "Name": "api",
                    "Type": "struct<response:struct<error:string,message:string>,operation:string,version:string,service:struct<name:string>,request:struct<uid:string>>"
                },
                {
                    "Name": "dst_endpoint",
                    "Type": "struct<svc_name:string>"
                },
                {
                    "Name": "actor",
                    "Type": "struct<user:struct<type:string,name:string,uid:string,uuid:string,account_uid:string,credential_uid:string>,session:struct<created_time:bigint,mfa:boolean,issuer:string>,invoked_by:string,idp:struct<name:string>>"
                },
                {
                    "Name": "http_request",
                    "Type": "struct<user_agent:string>"
                },
                {
                    "Name": "src_endpoint",
                    "Type": "struct<uid:string,ip:string,domain:string>"
                },
                {
                    "Name": "resources",
                    "Type": "array<struct<uid:string,account_uid:string,type:string>>"
                },
                {
                    "Name": "class_name",
                    "Type": "string"
                },
                {
                    "Name": "class_uid",
                    "Type": "int"
                },
                {
                    "Name": "category_name",
                    "Type": "string"
                },
                {
                    "Name": "category_uid",
                    "Type": "int"
                },
                {
                    "Name": "severity_id",
                    "Type": "int"
                },
                {
                    "Name": "severity",
                    "Type": "string"
                },
                {
                    "Name": "user",
                    "Type": "struct<uid:string,uuid:string,name:string>"
                },
                {
                    "Name": "activity_name",
                    "Type": "string"
                },
                {
                    "Name": "activity_id",
                    "Type": "int"
                },
                {
                    "Name": "type_uid",
                    "Type": "int"
                },
                {
                    "Name": "type_name",
                    "Type": "string"
                },
                {
                    "Name": "status",
                    "Type": "string"
                },
                {
                    "Name": "status_id",
                    "Type": "int"
                },
                {
                    "Name": "mfa",
                    "Type": "boolean"
                },
                {
                    "Name": "unmapped",
                    "Type": "map<string,string>"
                }
            ],
            "Location": "REDACTED",
            "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
            "Compressed": false,
            "NumberOfBuckets": 0,
            "SerdeInfo": {
                "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
            },
            "SortColumns": [],
            "Parameters": {
                "partition_filtering.enabled": "true"
            },
            "StoredAsSubDirectories": false
        }

The same behavior is not seen when querying non-Iceberg tables. In those queries, the struct fields are successfully populated in the results.

How can one reproduce the bug? Steps to reproduce the behavior:

  1. Create an iceberg table with struct columns
  2. Run a query against this table
  3. Validate struct columns are null

What is the expected behavior? The struct columns should be successfully populated

What is your host/environment?

  • OS: [e.g. iOS]
  • Version [e.g. 22]
  • Plugins

Do you have any screenshots? If applicable, add screenshots to help explain your problem.

Do you have any additional context? Add any other context about the problem.

engechas avatar Jun 06 '24 16:06 engechas

Test with vanilla EMR-S + Iceberg. struct filed return null value

penghuo avatar Jun 06 '24 21:06 penghuo