Panic writing nullable struct with required fields
Apache Iceberg version
iceberg: 1.6.1
iceberg-go: main
Please describe the bug 🐞
My schema contains a nullable struct with required fields. When I call AppendTable with the data, I get the following panic:
error encountered during schema visitor invalid: field says not-nullable, child #5 has nulls
stack trace: goroutine 103 [running]:
runtime/debug.Stack()
/Users/ethanblackburn/go/pkg/mod/golang.org/[email protected]/src/runtime/debug/stack.go:26 +0x64
github.com/apache/iceberg-go.VisitSchemaWithPartner[...].func1()
/Users/ethanblackburn/go/pkg/mod/github.com/apache/[email protected]/schema.go:1323 +0x80
panic({0x107544fc0?, 0x140012a6d60?})
/Users/ethanblackburn/go/pkg/mod/golang.org/[email protected]/src/runtime/panic.go:792 +0x124
github.com/apache/iceberg-go/table.retOrPanic[...](...)
/Users/ethanblackburn/go/pkg/mod/github.com/apache/[email protected]/table/arrow_utils.go:683
github.com/apache/iceberg-go/table.(*arrowProjectionVisitor).Struct(0x140012a63a0, {{0x14000f1b888?, 0x109922ac0?, 0x107b2cbe0?}}, {0x107b60fe0, 0x14000e78f00}, {0x14000f57c00, 0x8, 0x14001678f00?})
/Users/ethanblackburn/go/pkg/mod/github.com/apache/[email protected]/table/arrow_utils.go:820 +0x4e0
github.com/apache/iceberg-go.visitStructWithPartner[...]({{0x14000f1b888, 0x0?, 0x0?}}, {0x107b60fe0?, 0x14000e78f00}, {0x107b44d50, 0x140012a63a0}, {0x107b40220, 0x14001678fc0})
/Users/ethanblackburn/go/pkg/mod/github.com/apache/[email protected]/schema.go:1366 +0x360
github.com/apache/iceberg-go.visitTypeWithPartner[...]({0x107b2c9a0?, 0x14000522168?}, {0x107b60fe0?, 0x14000e78f00?}, {0x107b44d50?, 0x140012a63a0?}, {0x107b40220?, 0x14001678fc0?})
/Users/ethanblackburn/go/pkg/mod/github.com/apache/[email protected]/schema.go:1435 +0x98
github.com/apache/iceberg-go.visitStructWithPartner[...]({{0x14000e42e08, 0x0?, 0x0?}}, {0x107b60fe0?, 0x14000e78ec0}, {0x107b44d50, 0x140012a63a0}, {0x107b40220, 0x14001678fc0})
/Users/ethanblackburn/go/pkg/mod/github.com/apache/[email protected]/schema.go:1359 +0x234
github.com/apache/iceberg-go.visitTypeWithPartner[...]({0x107b2c9a0?, 0x14000522138?}, {0x107b60fe0?, 0x14000e78ec0?}, {0x107b44d50?, 0x140012a63a0?}, {0x107b40220?, 0x14001678fc0?})
/Users/ethanblackburn/go/pkg/mod/github.com/apache/[email protected]/schema.go:1435 +0x98
github.com/apache/iceberg-go.visitStructWithPartner[...]({{0x14001090008, 0x14000e78bc0?, 0x0?}}, {0x107b60fe0?, 0x14000e78bc0}, {0x107b44d50, 0x140012a63a0}, {0x107b40220, 0x14001678fc0})
/Users/ethanblackburn/go/pkg/mod/github.com/apache/[email protected]/schema.go:1359 +0x234
github.com/apache/iceberg-go.VisitSchemaWithPartner[...](0x14000b91f10, {0x107b60fe0, 0x14000e78bc0}, {0x107b44d50, 0x140012a63a0}, {0x107b40220, 0x14001678fc0})
/Users/ethanblackburn/go/pkg/mod/github.com/apache/[email protected]/schema.go:1335 +0xf4
github.com/apache/iceberg-go/table.ToRequestedSchema({0x107b35350, 0x109922ac0}, 0x14000b91f10, 0x14001678fc0, {0x107b5b670?, 0x14000e71e00?}, 0x0, 0x1, 0x0)
/Users/ethanblackburn/go/pkg/mod/github.com/apache/[email protected]/table/arrow_utils.go:893 +0x110
github.com/apache/iceberg-go/table.(*writer).writeFile(0x140009fce10, {0x107b35350, 0x109922ac0}, {{0x7a, 0x10, 0x34, 0xae, 0x69, 0x34, 0x44, ...}, ...})
/Users/ethanblackburn/go/pkg/mod/github.com/apache/[email protected]/table/writer.go:65 +0x120
github.com/apache/iceberg-go/table.writeFiles.func3({{0x7a, 0x10, 0x34, 0xae, 0x69, 0x34, 0x44, 0x95, 0x94, 0x94, ...}, ...})
/Users/ethanblackburn/go/pkg/mod/github.com/apache/[email protected]/table/writer.go:126 +0x48
github.com/apache/iceberg-go/table/internal.MapExec[...].func1()
/Users/ethanblackburn/go/pkg/mod/github.com/apache/[email protected]/table/internal/utils.go:501 +0xd0
golang.org/x/sync/errgroup.(*Group).Go.func1()
/Users/ethanblackburn/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:79 +0x54
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 1
/Users/ethanblackburn/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:76 +0x94
Schema
schema:
fields: 34
...
- finding_info: type=struct<analytic: struct<category: utf8, desc: utf8, name: utf8, related_analytics: list<item: struct<category: utf8, desc: utf8, name: utf8, type: utf8, type_id: int32, uid: utf8, version: utf8>, nullable>, type: utf8, type_id: int32, uid: utf8, version: utf8>, uid: utf8>
metadata: ["iceberg.field_id": "17"]
...
Here, analytic is the optional struct and analytic.type_id is the required field.
Sample data
...
finding_info (nullable=false) len=100 nulls=0 type=struct<analytic: struct<category: utf8, desc: utf8, name: utf8, related_analytics: list<item: struct<category: utf8, desc: utf8, name: utf8, type: utf8, type_id: int32, uid: utf8, version: utf8>, nullable>, type: utf8, type_id: int32, uid: utf8, version: utf8>, uid: utf8>
analytic (nullable=true) len=100 nulls=100 type=struct<category: utf8, desc: utf8, name: utf8, related_analytics: list<item: struct<category: utf8, desc: utf8, name: utf8, type: utf8, type_id: int32, uid: utf8, version: utf8>, nullable>, type: utf8, type_id: int32, uid: utf8, version: utf8>
category (nullable=true) len=100 nulls=100 type=utf8
desc (nullable=true) len=100 nulls=100 type=utf8
name (nullable=true) len=100 nulls=100 type=utf8
related_analytics (nullable=true) len=100 nulls=100 type=list<item: struct<category: utf8, desc: utf8, name: utf8, type: utf8, type_id: int32, uid: utf8, version: utf8>, nullable>
related_analytics.element (nullable=true) len=0 nulls=0 type=struct<category: utf8, desc: utf8, name: utf8, type: utf8, type_id: int32, uid: utf8, version: utf8>
category (nullable=true) len=0 nulls=0 type=utf8
desc (nullable=true) len=0 nulls=0 type=utf8
name (nullable=true) len=0 nulls=0 type=utf8
type (nullable=true) len=0 nulls=0 type=utf8
type_id (nullable=false) len=0 nulls=0 type=int32
uid (nullable=true) len=0 nulls=0 type=utf8
version (nullable=true) len=0 nulls=0 type=utf8
type (nullable=true) len=100 nulls=100 type=utf8
type_id (nullable=false) len=100 nulls=100 type=int32 <--- This line triggers the panic
uid (nullable=true) len=100 nulls=100 type=utf8
...
Iceberg allows nullable structs with required fields, so I assume there's a bug in the validation. I think it's coming from NewStructArrayWithFields, but I'm not positive.
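To make the expected behavior concrete, here is a minimal, stdlib-only sketch of the rule: a required child may be null in a row as long as the parent struct is itself null in that row. The `column` type and `checkRequiredChild` function are hypothetical names invented for illustration; they are not part of arrow-go or iceberg-go and do not reflect how either library implements its validation.

```go
package main

import "fmt"

// column is a hypothetical, simplified stand-in for an Arrow array.
// valid[i] == false means row i is null.
type column struct {
	name     string
	nullable bool
	valid    []bool
}

// checkRequiredChild returns an error only when a required child is null in a
// row where the parent struct is present. Nulls hidden behind a null parent
// are accepted.
func checkRequiredChild(parent, child column) error {
	if child.nullable {
		return nil
	}
	if len(parent.valid) != len(child.valid) {
		return fmt.Errorf("length mismatch: parent %d, child %d", len(parent.valid), len(child.valid))
	}
	for i, ok := range child.valid {
		if !ok && parent.valid[i] {
			return fmt.Errorf("required field %q is null at row %d where parent %q is present",
				child.name, i, parent.name)
		}
	}
	return nil
}

func main() {
	// Same shape as the report: the parent is null in every row,
	// so the required child's nulls are all masked.
	parent := column{name: "analytic", nullable: true, valid: []bool{false, false, false}}
	child := column{name: "type_id", nullable: false, valid: []bool{false, false, false}}
	fmt.Println(checkRequiredChild(parent, child))

	// If the parent is present in a row where the child is null,
	// the data really does violate the schema.
	parent.valid[1] = true
	fmt.Println(checkRequiredChild(parent, child))
}
```

With the sample data above (analytic has nulls=100 out of len=100), a check of this shape would accept type_id, since none of its nulls fall in a row where analytic is present.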
Thanks for filing this; it looks like you're correct. The panic comes from NewStructArrayWithFields, and it looks like ToRequestedSchema doesn't properly carry over the parent's null bitmap in that scenario either. I've filed a PR on arrow-go (https://github.com/apache/arrow-go/pull/359) to fix this and add a new method that I'll use here in the ToRequestedSchema function. After that PR is merged on arrow-go, I'll ping you on the corresponding PR here so you can test and confirm that it works for you, if that's okay.
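As a rough illustration of the null-bitmap point, the sketch below models what happens when a struct is rebuilt from its children with and without the parent's validity. It uses only the standard library; `structColumn`, `rebuild`, and `nullCount` are hypothetical names for this example and say nothing about the actual arrow-go API or the contents of the PR.

```go
package main

import "fmt"

// structColumn is a hypothetical, simplified model of a struct array:
// valid holds the struct's own validity (false = null row), and children
// maps field names to per-row child validity.
type structColumn struct {
	valid    []bool
	children map[string][]bool
}

// rebuild models constructing a new struct from existing children.
// When keepParentValidity is false, every row is marked present, which is
// one way the parent's null information can be lost.
func rebuild(src structColumn, keepParentValidity bool) structColumn {
	out := structColumn{
		valid:    make([]bool, len(src.valid)),
		children: src.children,
	}
	for i := range out.valid {
		out.valid[i] = !keepParentValidity || src.valid[i]
	}
	return out
}

// nullCount counts the null rows in a validity slice.
func nullCount(valid []bool) int {
	n := 0
	for _, ok := range valid {
		if !ok {
			n++
		}
	}
	return n
}

func main() {
	src := structColumn{
		valid:    []bool{false, true, false},
		children: map[string][]bool{"type_id": {false, true, false}},
	}
	fmt.Println(nullCount(rebuild(src, false).valid)) // 0: parent nulls dropped
	fmt.Println(nullCount(rebuild(src, true).valid))  // 2: parent nulls preserved
}
```

In the first case the rebuilt parent reports no nulls, so the child's nulls are no longer masked and a required child looks invalid; in the second case the parent's nulls still cover them.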
Thanks for the fast turnaround! Happy to test when it's ready.