
Storage API Avro encoding does not work with nullable fields

Open ianb-pomelo opened this issue 2 years ago • 2 comments

I've created a table with nullable columns, but when I use the Storage API to stream-read the table and some rows don't have data in every column, the Avro encoding fails with: value does not match its schema: long: expected: Go numeric; received: <nil>

Example: Create Table

StandardTableDefinition.of(
  Schema.of(
    Field.newBuilder(ENCOUNTER_ID_COLUMN, INT64).setMode(NULLABLE).build(),
    Field.newBuilder(PATIENT_ID_COLUMN, INT64).setMode(NULLABLE).build(),
    Field.newBuilder(APPOINTMENT_ID_COLUMN, INT64)
      .setMode(NULLABLE)
      .setDefaultValueExpression("NULL")
      .build(),
    Field.newBuilder(ENCOUNTER_DATE_COLUMN, DATE).setMode(NULLABLE).build()
  )
)

Insert Row:

RowToInsert.of(
  mapOf(
    ENCOUNTER_ID_COLUMN to 1234,
    PATIENT_ID_COLUMN to 1234,
    APPOINTMENT_ID_COLUMN to null,
    ENCOUNTER_DATE_COLUMN to null
  )
)

Throws: io.grpc.StatusRuntimeException: UNKNOWN: failed to encode binary from go value: cannot encode binary record "testproject.athena_dataviewer.encounters" field "APPOINTMENTID": value does not match its schema: long: expected: Go numeric; received: <nil>
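
For context, a minimal sketch of how the Avro specification encodes a nullable INT64 on the wire, assuming the column is declared as the union ["null", "long"]. The function names are hypothetical and this is not the emulator's code. It only illustrates why a null needs a union branch: a bare "long" schema has no way to represent it, which is consistent with the "expected: Go numeric; received: <nil>" message above.

```python
def zigzag_varint(n: int) -> bytes:
    """Encode a signed 64-bit integer as an Avro long (zig-zag, then base-128 varint)."""
    z = (n << 1) ^ (n >> 63)  # zig-zag: small magnitudes map to small unsigned values
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_nullable_long(value):
    """Encode a value for the union ["null", "long"]: branch index first, then the value."""
    if value is None:
        # Branch 0 is "null"; a null value itself occupies zero bytes.
        return zigzag_varint(0)
    # Branch 1 is "long".
    return zigzag_varint(1) + zigzag_varint(value)


if __name__ == "__main__":
    print(encode_nullable_long(None).hex())
    print(encode_nullable_long(1234).hex())
```

In this model the null APPOINTMENT_ID_COLUMN value becomes a single branch-index byte, so the encoder has to be given the union schema and a value shape that selects the "null" branch, rather than a raw nil against a plain long.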

ianb-pomelo • Dec 06 '23 16:12

This seems to be a broader Avro problem when stream-reading with the BigQuery Java client. I hit a similar encoding error, but on a non-nullable DATETIME column.

Emulator returns:

UNKNOWN: failed to encode binary from go value: cannot encode binary record "<redacted>" field "changeDate": value does not match its schema: cannot encode binary union: no member schema types support datum: allowed types: [null string]; received: map[string]interface {}.

The client defines the field like this in the Avro schema: {"name":"changeDate","type":["null",{"type":"string","logicalType":"datetime"}]}. That looks correct and consistent with the emulator's definition, but encoding still fails.
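
A small sketch of one possible reading of that error, under a stated assumption: the encoder expects a non-null union value to be wrapped in a one-key map whose key names the chosen branch. That convention and the helper names below are assumptions for illustration, not confirmed emulator behavior. Under that model the branch names for this schema are "null" and "string" (logicalType is an annotation, not a type name), so a map keyed by anything else would be rejected.

```python
import json

# Field schema quoted in the comment above.
FIELD_SCHEMA = json.loads(
    '{"name":"changeDate","type":["null",{"type":"string","logicalType":"datetime"}]}'
)


def union_branch_names(field_schema):
    """Return the Avro type name of each union branch, ignoring logicalType annotations."""
    names = []
    for branch in field_schema["type"]:
        names.append(branch if isinstance(branch, str) else branch["type"])
    return names


def matches_union(datum, branch_names):
    """Hypothetical check: None selects "null"; otherwise expect {branch_name: value}."""
    if datum is None:
        return "null" in branch_names
    return (
        isinstance(datum, dict)
        and len(datum) == 1
        and next(iter(datum)) in branch_names
    )


if __name__ == "__main__":
    names = union_branch_names(FIELD_SCHEMA)
    print(names)
    print(matches_union({"string": "2025-01-16T09:01:00"}, names))
    print(matches_union({"datetime": "2025-01-16T09:01:00"}, names))
```

If the assumption holds, the value passed for changeDate was a map whose key did not match either branch name, which would explain why the schema looks right but the datum is still refused.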

I'm not at all familiar with Go, but reading storage_handler.go, I wonder whether it is missing a call to CastValue or MarshalJSON (for each field) from types/avro.go.

Maybe @goccy and/or @totem3 have an idea about this.

turb • Jan 16 '25 09:01

I've implemented and released a fix for Avro serialization in the Recidiviz fork of the emulator: https://github.com/Recidiviz/bigquery-emulator/releases/tag/v0.6.6-recidiviz.0

ohaibbq • Nov 08 '25 18:11