Storage API Avro encoding does not work with nullable fields
I've created a table with nullable columns, but when I use the Storage API to stream-read the table and some rows don't have data in every column, the Avro encoding fails with: value does not match its schema: long: expected: Go numeric; received: <nil>
Example: Create Table
StandardTableDefinition.of(
    Schema.of(
        Field.newBuilder(ENCOUNTER_ID_COLUMN, INT64).setMode(NULLABLE).build(),
        Field.newBuilder(PATIENT_ID_COLUMN, INT64).setMode(NULLABLE).build(),
        Field.newBuilder(APPOINTMENT_ID_COLUMN, INT64)
            .setMode(NULLABLE)
            .setDefaultValueExpression("NULL")
            .build(),
        Field.newBuilder(ENCOUNTER_DATE_COLUMN, DATE).setMode(NULLABLE).build()
    )
)
Insert Row:
RowToInsert.of(
    mapOf(
        ENCOUNTER_ID_COLUMN to 1234,
        PATIENT_ID_COLUMN to 1234,
        APPOINTMENT_ID_COLUMN to null,
        ENCOUNTER_DATE_COLUMN to null
    )
)
Throws: io.grpc.StatusRuntimeException: UNKNOWN: failed to encode binary from go value: cannot encode binary record "testproject.athena_dataviewer.encounters" field "APPOINTMENTID": value does not match its schema: long: expected: Go numeric; received: <nil>
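For context, a nullable column is normally described in Avro as a union with "null" (for example ["null","long"]), while a bare "long" cannot hold a missing value. The sketch below only illustrates that mapping. avroFieldType is a hypothetical helper written for this comment, not a function from the emulator, and whether the emulator actually emits a bare "long" here is an assumption on my part.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// avroFieldType is a hypothetical helper (not taken from the emulator).
// It returns the Avro type of a column as a JSON string: the bare primitive
// for a REQUIRED column, and a union with "null" for a NULLABLE one.
func avroFieldType(primitive string, nullable bool) string {
	var t interface{} = primitive
	if nullable {
		// "null" is listed first, matching the client schema quoted in this thread.
		t = []interface{}{"null", primitive}
	}
	b, err := json.Marshal(t)
	if err != nil {
		// Marshalling strings and slices of strings is not expected to fail.
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(avroFieldType("long", false)) // "long"
	fmt.Println(avroFieldType("long", true))  // ["null","long"]
}
```

With the union form, a nil value for APPOINTMENTID has a schema member it can match; with the bare form it does not.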
This looks like a broader problem with Avro encoding when stream-reading through the BigQuery Java client. I get the same kind of error, but on a non-nullable DATETIME.
Emulator returns:
UNKNOWN: failed to encode binary from go value: cannot encode binary record "<redacted>" field "changeDate": value does not match its schema: cannot encode binary union: no member schema types support datum: allowed types: [null string]; received: map[string]interface {}.
The client defines the field like this in the Avro schema: {"name":"changeDate","type":["null",{"type":"string","logicalType":"datetime"}]}. That looks valid and consistent with the emulator's definition, but encoding still fails.
I'm not familiar with Go at all, but reading storage_handler.go, I wonder whether it is missing a call to CastValue or MarshalJSON (for each field) from types/avro.go.
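To make the second error more concrete: the messages look like they come from the goavro library, which expects a union value to be either nil (for the "null" member) or a single-key map whose key names the chosen member type. The sketch below shows only that shape. wrapUnion is a hypothetical helper, not code from storage_handler.go or types/avro.go, and the key "string" for the DATETIME case is an assumption based on the "allowed types: [null string]" part of the error.

```go
package main

import "fmt"

// wrapUnion is a hypothetical helper (not the emulator's code). It shapes a
// Go value the way goavro expects for a union field: nil selects the "null"
// member, anything else is wrapped in a map keyed by the member type name.
func wrapUnion(memberType string, v interface{}) interface{} {
	if v == nil {
		return nil
	}
	return map[string]interface{}{memberType: v}
}

func main() {
	// A missing INT64 stays nil so it can match the "null" member.
	fmt.Println(wrapUnion("long", nil)) // <nil>
	// A present INT64 is keyed by "long".
	fmt.Println(wrapUnion("long", int64(1234))) // map[long:1234]
	// A DATETIME rendered as a string; the key "string" is an assumption.
	fmt.Println(wrapUnion("string", "2024-01-02T03:04:05")) // map[string:2024-01-02T03:04:05]
}
```

If the value reaching the encoder is a map with a different key or a nested structure, a "no member schema types support datum" failure like the one above would be the expected outcome.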
Maybe @goccy and/or @totem3 have an idea about this.
I've implemented and released a fix for Avro serialization in the Recidiviz fork of the emulator: https://github.com/Recidiviz/bigquery-emulator/releases/tag/v0.6.6-recidiviz.0