parquet-go
Schema conversion fails with error "missing optional column <x>"
After adding a new uint64 column to our Go struct, we can no longer read data files that were written before the column existed, using parquet.Reader.Read(struct). The library detects the schema drift and attempts to convert automatically, but fails with an unexpected error: rs → Resource → Attrs → Value → row is missing optional column 2.
I was able to create a minimal reproduction in the existing conversion tests. It seems related to schemas with at least two levels of nesting and repeated columns. In this test, we populate an AddressBook, then try to convert it to a struct that has a new field, Extra string.
var conversionTests = [...]struct {
	scenario string
	from     interface{}
	to       interface{}
}{
	....
	{
		scenario: "extra column on complex struct",
		from: AddressBook{
			Owner: "Julien Le Dem",
			Contacts: []Contact{
				{
					Name:        "Dmitriy Ryaboy",
					PhoneNumber: "555 987 6543",
				},
				{
					Name: "Chris Aniszczyk",
				},
			},
		},
		to: struct {
			AddressBook
			Extra string
		}{
			AddressBook: AddressBook{
				Owner: "Julien Le Dem",
				Contacts: []Contact{
					{
						Name:        "Dmitriy Ryaboy",
						PhoneNumber: "555 987 6543",
					},
					{
						Name: "Chris Aniszczyk",
					},
				},
			},
		},
	},
}
$ go test -v -run=TestConvert
=== RUN TestConvert
=== RUN TestConvert/convert_between_rows_which_have_the_same_schema
=== RUN TestConvert/missing_column
=== RUN TestConvert/missing_optional_column
=== RUN TestConvert/missing_repeated_column
=== RUN TestConvert/extra_column
=== RUN TestConvert/extra_optional_column
=== RUN TestConvert/extra_repeated_column
=== RUN TestConvert/extra_column_on_complex_struct
    convert_test.go:156: contacts → phoneNumber → row is missing optional column 3
--- FAIL: TestConvert (0.00s)
    --- PASS: TestConvert/convert_between_rows_which_have_the_same_schema (0.00s)
    --- PASS: TestConvert/missing_column (0.00s)
    --- PASS: TestConvert/missing_optional_column (0.00s)
    --- PASS: TestConvert/missing_repeated_column (0.00s)
    --- PASS: TestConvert/extra_column (0.00s)
    --- PASS: TestConvert/extra_optional_column (0.00s)
    --- PASS: TestConvert/extra_repeated_column (0.00s)
    --- FAIL: TestConvert/extra_column_on_complex_struct (0.00s)
FAIL