
Panic writing nullable struct with required fields

Open EthanBlackburn opened this issue 8 months ago • 2 comments

Apache Iceberg version

iceberg: 1.6.1 iceberg-go: main

Please describe the bug 🐞

My schema contains a nullable struct with required fields. When I call AppendTable with the data, I get the following panic:

error encountered during schema visitor invalid: field says not-nullable, child #5 has nulls
stack trace: goroutine 103 [running]:
runtime/debug.Stack()
        /Users/ethanblackburn/go/pkg/mod/golang.org/[email protected]/src/runtime/debug/stack.go:26 +0x64
github.com/apache/iceberg-go.VisitSchemaWithPartner[...].func1()
        /Users/ethanblackburn/go/pkg/mod/github.com/apache/[email protected]/schema.go:1323 +0x80
panic({0x107544fc0?, 0x140012a6d60?})
        /Users/ethanblackburn/go/pkg/mod/golang.org/[email protected]/src/runtime/panic.go:792 +0x124
github.com/apache/iceberg-go/table.retOrPanic[...](...)
        /Users/ethanblackburn/go/pkg/mod/github.com/apache/[email protected]/table/arrow_utils.go:683
github.com/apache/iceberg-go/table.(*arrowProjectionVisitor).Struct(0x140012a63a0, {{0x14000f1b888?, 0x109922ac0?, 0x107b2cbe0?}}, {0x107b60fe0, 0x14000e78f00}, {0x14000f57c00, 0x8, 0x14001678f00?})
        /Users/ethanblackburn/go/pkg/mod/github.com/apache/[email protected]/table/arrow_utils.go:820 +0x4e0
github.com/apache/iceberg-go.visitStructWithPartner[...]({{0x14000f1b888, 0x0?, 0x0?}}, {0x107b60fe0?, 0x14000e78f00}, {0x107b44d50, 0x140012a63a0}, {0x107b40220, 0x14001678fc0})
        /Users/ethanblackburn/go/pkg/mod/github.com/apache/[email protected]/schema.go:1366 +0x360
github.com/apache/iceberg-go.visitTypeWithPartner[...]({0x107b2c9a0?, 0x14000522168?}, {0x107b60fe0?, 0x14000e78f00?}, {0x107b44d50?, 0x140012a63a0?}, {0x107b40220?, 0x14001678fc0?})
        /Users/ethanblackburn/go/pkg/mod/github.com/apache/[email protected]/schema.go:1435 +0x98
github.com/apache/iceberg-go.visitStructWithPartner[...]({{0x14000e42e08, 0x0?, 0x0?}}, {0x107b60fe0?, 0x14000e78ec0}, {0x107b44d50, 0x140012a63a0}, {0x107b40220, 0x14001678fc0})
        /Users/ethanblackburn/go/pkg/mod/github.com/apache/[email protected]/schema.go:1359 +0x234
github.com/apache/iceberg-go.visitTypeWithPartner[...]({0x107b2c9a0?, 0x14000522138?}, {0x107b60fe0?, 0x14000e78ec0?}, {0x107b44d50?, 0x140012a63a0?}, {0x107b40220?, 0x14001678fc0?})
        /Users/ethanblackburn/go/pkg/mod/github.com/apache/[email protected]/schema.go:1435 +0x98
github.com/apache/iceberg-go.visitStructWithPartner[...]({{0x14001090008, 0x14000e78bc0?, 0x0?}}, {0x107b60fe0?, 0x14000e78bc0}, {0x107b44d50, 0x140012a63a0}, {0x107b40220, 0x14001678fc0})
        /Users/ethanblackburn/go/pkg/mod/github.com/apache/[email protected]/schema.go:1359 +0x234
github.com/apache/iceberg-go.VisitSchemaWithPartner[...](0x14000b91f10, {0x107b60fe0, 0x14000e78bc0}, {0x107b44d50, 0x140012a63a0}, {0x107b40220, 0x14001678fc0})
        /Users/ethanblackburn/go/pkg/mod/github.com/apache/[email protected]/schema.go:1335 +0xf4
github.com/apache/iceberg-go/table.ToRequestedSchema({0x107b35350, 0x109922ac0}, 0x14000b91f10, 0x14001678fc0, {0x107b5b670?, 0x14000e71e00?}, 0x0, 0x1, 0x0)
        /Users/ethanblackburn/go/pkg/mod/github.com/apache/[email protected]/table/arrow_utils.go:893 +0x110
github.com/apache/iceberg-go/table.(*writer).writeFile(0x140009fce10, {0x107b35350, 0x109922ac0}, {{0x7a, 0x10, 0x34, 0xae, 0x69, 0x34, 0x44, ...}, ...})
        /Users/ethanblackburn/go/pkg/mod/github.com/apache/[email protected]/table/writer.go:65 +0x120
github.com/apache/iceberg-go/table.writeFiles.func3({{0x7a, 0x10, 0x34, 0xae, 0x69, 0x34, 0x44, 0x95, 0x94, 0x94, ...}, ...})
        /Users/ethanblackburn/go/pkg/mod/github.com/apache/[email protected]/table/writer.go:126 +0x48
github.com/apache/iceberg-go/table/internal.MapExec[...].func1()
        /Users/ethanblackburn/go/pkg/mod/github.com/apache/[email protected]/table/internal/utils.go:501 +0xd0
golang.org/x/sync/errgroup.(*Group).Go.func1()
        /Users/ethanblackburn/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:79 +0x54
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 1
        /Users/ethanblackburn/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:76 +0x94

Schema

schema:
  fields: 34
    ...
    - finding_info: type=struct<analytic: struct<category: utf8, desc: utf8, name: utf8, related_analytics: list<item: struct<category: utf8, desc: utf8, name: utf8, type: utf8, type_id: int32, uid: utf8, version: utf8>, nullable>, type: utf8, type_id: int32, uid: utf8, version: utf8>, uid: utf8>
              metadata: ["iceberg.field_id": "17"]
   ...

analytic is the optional struct; analytic.type_id is the required field inside it.

Sample data

...
finding_info (nullable=false)  len=100  nulls=0  type=struct<analytic: struct<category: utf8, desc: utf8, name: utf8, related_analytics: list<item: struct<category: utf8, desc: utf8, name: utf8, type: utf8, type_id: int32, uid: utf8, version: utf8>, nullable>, type: utf8, type_id: int32, uid: utf8, version: utf8>, uid: utf8>

    analytic (nullable=true)  len=100  nulls=100  type=struct<category: utf8, desc: utf8, name: utf8, related_analytics: list<item: struct<category: utf8, desc: utf8, name: utf8, type: utf8, type_id: int32, uid: utf8, version: utf8>, nullable>, type: utf8, type_id: int32, uid: utf8, version: utf8>
      category (nullable=true)  len=100  nulls=100  type=utf8
      desc (nullable=true)  len=100  nulls=100  type=utf8
      name (nullable=true)  len=100  nulls=100  type=utf8
      related_analytics (nullable=true)  len=100  nulls=100  type=list<item: struct<category: utf8, desc: utf8, name: utf8, type: utf8, type_id: int32, uid: utf8, version: utf8>, nullable>
        related_analytics.element (nullable=true)  len=0  nulls=0  type=struct<category: utf8, desc: utf8, name: utf8, type: utf8, type_id: int32, uid: utf8, version: utf8>
          category (nullable=true)  len=0  nulls=0  type=utf8
          desc (nullable=true)  len=0  nulls=0  type=utf8
          name (nullable=true)  len=0  nulls=0  type=utf8
          type (nullable=true)  len=0  nulls=0  type=utf8
          type_id (nullable=false)  len=0  nulls=0  type=int32
          uid (nullable=true)  len=0  nulls=0  type=utf8
          version (nullable=true)  len=0  nulls=0  type=utf8
      type (nullable=true)  len=100  nulls=100  type=utf8
      type_id (nullable=false)  len=100  nulls=100  type=int32 <--- This line triggers the panic
      uid (nullable=true)  len=100  nulls=100  type=utf8
...

Iceberg allows nullable structs with required fields, so I assume there's a bug in the validation. I think it's coming from NewStructArrayWithFields, but I'm not positive.
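To make the expected behavior concrete, here is a minimal sketch in plain Go. It does not use arrow-go or iceberg-go; `column`, `rawNulls`, `effectiveNulls`, and `checkRequired` are hypothetical names invented for illustration, and the real validation may be structured differently. The assumption it encodes is that a required child should only count as violated in rows where its parent struct is present.

```go
package main

import "fmt"

// column is a hypothetical, simplified stand-in for an Arrow array.
// valid[i] == false means row i is null.
type column struct {
	name     string
	nullable bool
	valid    []bool
}

// rawNulls counts every null slot in the child, ignoring the parent.
// This is the number reported as "nulls=100" in the sample data.
func rawNulls(c column) int {
	n := 0
	for _, v := range c.valid {
		if !v {
			n++
		}
	}
	return n
}

// effectiveNulls counts child nulls only in rows where the parent struct
// is valid. Rows where the parent is null are skipped.
// parentValid and c.valid are assumed to have the same length.
func effectiveNulls(parentValid []bool, c column) int {
	n := 0
	for i, v := range c.valid {
		if parentValid[i] && !v {
			n++
		}
	}
	return n
}

// checkRequired returns an error only if a required child is null in a
// row where its parent is present.
func checkRequired(parentValid []bool, c column) error {
	if c.nullable {
		return nil
	}
	if n := effectiveNulls(parentValid, c); n > 0 {
		return fmt.Errorf("required field %q has %d nulls under a valid parent", c.name, n)
	}
	return nil
}

func main() {
	// Small-scale version of the sample data: analytic is null in every
	// row, so type_id is null in every row as well.
	parentValid := []bool{false, false, false}
	typeID := column{name: "type_id", nullable: false, valid: []bool{false, false, false}}

	fmt.Println(rawNulls(typeID))                   // 3
	fmt.Println(checkRequired(parentValid, typeID)) // <nil>
}
```

A check based on `rawNulls` alone would reject this data, which matches the symptom in the panic message; masking by the parent's validity accepts it.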

EthanBlackburn, Apr 20 '25 23:04

Thanks for filing this; it looks like you're correct. The error is coming from NewStructArrayWithFields, and it looks like ToRequestedSchema doesn't properly carry over the parent's null bitmap in that scenario either. I've filed a PR on arrow-go (https://github.com/apache/arrow-go/pull/359) to fix this and to add a new method that I'll use here in the ToRequestedSchema function. Once that PR is merged on arrow-go, I'll ping you on the corresponding PR here so you can test and confirm that it works for you, if that's okay.

zeroshade, Apr 21 '25 20:04

Thanks for the fast turnaround! Happy to test when it's ready.

EthanBlackburn, Apr 22 '25 19:04