
Schema conversion fails with error "missing optional column <x>"

Open mdisibio opened this issue 3 years ago • 0 comments

After adding a new uint64 column to our Go struct, we can no longer read data files that were written before the column existed, using parquet.Reader.Read(struct). The library detects the schema drift and attempts to convert automatically, but fails with an unexpected error: rs → Resource → Attrs → Value → row is missing optional column 2.

I was able to create a minimal reproduction in the existing conversion tests. The failure seems related to schemas with at least 2 levels of nesting and repeated columns. In this test, we populate an AddressBook, then try to convert it to a struct that adds a new field, Extra string.

var conversionTests = [...]struct {
	scenario string
	from     interface{}
	to       interface{}
}{
....
	{
		scenario: "extra column on complex struct",

		from: AddressBook{
			Owner: "Julien Le Dem",
			Contacts: []Contact{
				{
					Name:        "Dmitriy Ryaboy",
					PhoneNumber: "555 987 6543",
				},
				{
					Name: "Chris Aniszczyk",
				},
			},
		},

		to: struct {
			AddressBook
			Extra string
		}{
			AddressBook: AddressBook{
				Owner: "Julien Le Dem",
				Contacts: []Contact{
					{
						Name:        "Dmitriy Ryaboy",
						PhoneNumber: "555 987 6543",
					},
					{
						Name: "Chris Aniszczyk",
					},
				},
			},
		},
	},
}

$ go test -v -run=TestConvert
=== RUN   TestConvert
=== RUN   TestConvert/convert_between_rows_which_have_the_same_schema
=== RUN   TestConvert/missing_column
=== RUN   TestConvert/missing_optional_column
=== RUN   TestConvert/missing_repeated_column
=== RUN   TestConvert/extra_column
=== RUN   TestConvert/extra_optional_column
=== RUN   TestConvert/extra_repeated_column
=== RUN   TestConvert/extra_column_on_complex_struct
    convert_test.go:156: contacts → phoneNumber → row is missing optional column 3
--- FAIL: TestConvert (0.00s)
    --- PASS: TestConvert/convert_between_rows_which_have_the_same_schema (0.00s)
    --- PASS: TestConvert/missing_column (0.00s)
    --- PASS: TestConvert/missing_optional_column (0.00s)
    --- PASS: TestConvert/missing_repeated_column (0.00s)
    --- PASS: TestConvert/extra_column (0.00s)
    --- PASS: TestConvert/extra_optional_column (0.00s)
    --- PASS: TestConvert/extra_repeated_column (0.00s)
    --- FAIL: TestConvert/extra_column_on_complex_struct (0.00s)
FAIL

mdisibio, Jun 22 '22 13:06