[BUG] Struct columns are returning null when querying Iceberg tables
What is the bug? Iceberg table support was recently added. When querying tables that contain struct type columns, the values of these columns is being returned as null.
{
"metadata": "null",
"time": 1717691198000,
"time_dt": "2024-06-06T16:26:38.000Z",
"cloud": "null",
"api": "null",
"dst_endpoint": "null",
"actor": "null",
"http_request": "null",
"src_endpoint": "null",
"session": "null",
"policy": "null",
"resources": "null",
"class_name": "API Activity",
"class_uid": 6003,
"category_name": "Application Activity",
"category_uid": 6,
"severity_id": 1,
"severity": "Informational",
"user": "null",
"activity_name": "Invoke",
"activity_id": 99,
"type_uid": 600399,
"type_name": "API Activity: Other",
"status": "Success",
"is_mfa": "null",
"unmapped": "{REDACTED}",
"accountid": "REDACTED",
"region": "us-east-1",
"observables": "null"
}
Schema for the table
{
"Columns": [
{
"Name": "metadata",
"Type": "struct<product:struct<version:string,name:string,vendor_name:string,feature:struct<name:string>>,uid:string,profiles:array<string>,version:string>"
},
{
"Name": "time",
"Type": "bigint"
},
{
"Name": "cloud",
"Type": "struct<region:string,provider:string>"
},
{
"Name": "api",
"Type": "struct<response:struct<error:string,message:string>,operation:string,version:string,service:struct<name:string>,request:struct<uid:string>>"
},
{
"Name": "dst_endpoint",
"Type": "struct<svc_name:string>"
},
{
"Name": "actor",
"Type": "struct<user:struct<type:string,name:string,uid:string,uuid:string,account_uid:string,credential_uid:string>,session:struct<created_time:bigint,mfa:boolean,issuer:string>,invoked_by:string,idp:struct<name:string>>"
},
{
"Name": "http_request",
"Type": "struct<user_agent:string>"
},
{
"Name": "src_endpoint",
"Type": "struct<uid:string,ip:string,domain:string>"
},
{
"Name": "resources",
"Type": "array<struct<uid:string,account_uid:string,type:string>>"
},
{
"Name": "class_name",
"Type": "string"
},
{
"Name": "class_uid",
"Type": "int"
},
{
"Name": "category_name",
"Type": "string"
},
{
"Name": "category_uid",
"Type": "int"
},
{
"Name": "severity_id",
"Type": "int"
},
{
"Name": "severity",
"Type": "string"
},
{
"Name": "user",
"Type": "struct<uid:string,uuid:string,name:string>"
},
{
"Name": "activity_name",
"Type": "string"
},
{
"Name": "activity_id",
"Type": "int"
},
{
"Name": "type_uid",
"Type": "int"
},
{
"Name": "type_name",
"Type": "string"
},
{
"Name": "status",
"Type": "string"
},
{
"Name": "status_id",
"Type": "int"
},
{
"Name": "mfa",
"Type": "boolean"
},
{
"Name": "unmapped",
"Type": "map<string,string>"
}
],
"Location": "REDACTED",
"InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
"OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
"Compressed": false,
"NumberOfBuckets": 0,
"SerdeInfo": {
"SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
},
"SortColumns": [],
"Parameters": {
"partition_filtering.enabled": "true"
},
"StoredAsSubDirectories": false
}
The same behavior is not seen when querying non-Iceberg tables. In those queries, the struct fields are successfully populated in the results.
How can one reproduce the bug? Steps to reproduce the behavior:
- Create an iceberg table with struct columns
- Run a query against this table
- Validate struct columns are null
What is the expected behavior? The struct columns should be successfully populated
What is your host/environment?
- OS: [e.g. iOS]
- Version [e.g. 22]
- Plugins
Do you have any screenshots? If applicable, add screenshots to help explain your problem.
Do you have any additional context? Add any other context about the problem.
Test with vanilla EMR-S + Iceberg. struct filed return null value