bigquery-emulator

Storage API returns record batch bytes that already contain schema bytes

Open · vladDotH opened this issue 9 months ago · 1 comment

What happened?

When I run the default Storage API example against the emulator, it correctly fetches the schema and the message, but Arrow decoding always produces an empty table.

What did you expect to happen?

The example prints the table data.

How can we reproduce it (as minimally and precisely as possible)?

BigQuery storage API example: https://cloud.google.com/bigquery/docs/reference/storage/libraries#use

It requires a custom gRPC client option that points at the emulator URL:

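// Imports: "log", "google.golang.org/grpc",
// "google.golang.org/grpc/credentials/insecure", "google.golang.org/api/option",
// bqStorage "cloud.google.com/go/bigquery/storage/apiv1"

// Plaintext gRPC connection to the emulator's gRPC endpoint.
grpcclient, err := grpc.NewClient("0.0.0.0:9060", grpc.WithTransportCredentials(insecure.NewCredentials()))
if err != nil {
	log.Fatal(err)
}

// Route the Storage Read client through that connection and skip authentication.
bqReadClient, err := bqStorage.NewBigQueryReadClient(
	ctx,
	option.WithGRPCConn(grpcclient),
	option.WithoutAuthentication(),
)
if err != nil {
	log.Fatal(err)
}
defer bqReadClient.Close()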

Anything else we need to know?

I noticed that the official example (in the processArrow function) creates the decoding buffer from the schema bytes and then appends the record batch to it:

undecoded := rows.GetArrowRecordBatch().GetSerializedRecordBatch()
if len(undecoded) > 0 {
	// Start the IPC stream with the serialized schema, then append the batch.
	buf = bytes.NewBuffer(schema)
	buf.Write(undecoded)
	r, err = ipc.NewReader(buf, ipc.WithAllocator(mem), ipc.WithSchema(aschema))
	//... other code
}

But your test doesn't use the schema bytes, only the record batch:

undecoded := rows.GetArrowRecordBatch().GetSerializedRecordBatch()
if len(undecoded) > 0 {
	// The batch bytes alone are treated as a complete IPC stream.
	buf = bytes.NewBuffer(undecoded)
	r, err = ipc.NewReader(buf, ipc.WithAllocator(mem), ipc.WithSchema(aschema))
	// ... other code
}

After hours of debugging, I found that the emulator's record batches already contain the schema bytes. When I tried the second approach against a real BigQuery source, I got:

error processing arrow: arrow/ipc: invalid message type (got=RecordBatch, want=Schema)

So this is more of a question: why does the emulator send schema bytes inside each record batch? It isn't a blocker, but it forces client code to branch on whether it is talking to the emulator or to real BigQuery.
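One way to avoid that branch on the client side is to peek at the first Arrow IPC message in each serialized batch and prepend the schema only when that message is not already a Schema. This is a hypothetical workaround sketch, not part of the emulator or the BigQuery client API: `ipcStream`, `firstMessageType`, `flatbufHeaderType`, and `fakeMessage` are made-up names, and it assumes (as observed above) that emulator batches start with a Schema message while real BigQuery batches start with a RecordBatch message. It reads the IPC message framing and the FlatBuffer `header_type` field by hand with only the standard library; Arrow's own Go `ipc` package can also read individual messages if you'd rather take the dependency.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// MessageHeader union tags from Arrow's Message.fbs (NONE=0, Schema=1, RecordBatch=3).
const (
	headerNone        byte = 0
	headerSchema      byte = 1
	headerRecordBatch byte = 3
)

var errTruncated = errors.New("truncated Arrow IPC message")

// firstMessageType returns the header tag of the first encapsulated IPC message,
// accepting the 0xFFFFFFFF continuation framing and the legacy length-only framing.
func firstMessageType(b []byte) (byte, error) {
	p := 0
	if len(b) >= 4 && binary.LittleEndian.Uint32(b) == 0xFFFFFFFF {
		p = 4
	}
	if len(b) < p+4 {
		return 0, errTruncated
	}
	n := int(int32(binary.LittleEndian.Uint32(b[p:])))
	p += 4
	if n <= 0 || p+n > len(b) {
		return 0, errTruncated
	}
	return flatbufHeaderType(b[p : p+n])
}

// flatbufHeaderType reads field index 1 (header_type) of the root Message table.
func flatbufHeaderType(fb []byte) (byte, error) {
	if len(fb) < 4 {
		return 0, errTruncated
	}
	root := int(binary.LittleEndian.Uint32(fb)) // offset of the root table
	if root+4 > len(fb) {
		return 0, errTruncated
	}
	vt := root - int(int32(binary.LittleEndian.Uint32(fb[root:]))) // vtable position
	if vt < 0 || vt+4 > len(fb) {
		return 0, errTruncated
	}
	vtSize := int(binary.LittleEndian.Uint16(fb[vt:]))
	const slotPos = 4 + 2*1 // vtable entry for field index 1
	if vtSize < slotPos+2 {
		return headerNone, nil // field not present: default NONE
	}
	if vt+slotPos+2 > len(fb) {
		return 0, errTruncated
	}
	off := int(binary.LittleEndian.Uint16(fb[vt+slotPos:]))
	if off == 0 {
		return headerNone, nil
	}
	if root+off >= len(fb) {
		return 0, errTruncated
	}
	return fb[root+off], nil
}

// ipcStream returns the batch as-is if it already begins with a Schema message,
// otherwise the serialized schema followed by the batch.
func ipcStream(schema, batch []byte) ([]byte, error) {
	t, err := firstMessageType(batch)
	if err != nil {
		return nil, err
	}
	if t == headerSchema {
		return batch, nil
	}
	out := make([]byte, 0, len(schema)+len(batch))
	out = append(out, schema...)
	return append(out, batch...), nil
}

// fakeMessage builds a minimal framed message whose FlatBuffer holds only
// header_type; it has no real schema or body and exists only for this demo.
func fakeMessage(tag byte) []byte {
	fb := []byte{
		12, 0, 0, 0, // root table at offset 12
		8, 0, 8, 0, 0, 0, 4, 0, // vtable: size 8, object 8, version absent, header_type at +4
		8, 0, 0, 0, // soffset from table (12) back to vtable (4)
		tag, 0, 0, 0, // header_type plus padding
		0, 0, 0, 0, // pad metadata to 24 bytes
	}
	msg := []byte{0xFF, 0xFF, 0xFF, 0xFF, byte(len(fb)), 0, 0, 0}
	return append(msg, fb...)
}

func main() {
	schema := fakeMessage(headerSchema)
	bqBatch := fakeMessage(headerRecordBatch)                // shaped like a real BigQuery batch
	emuBatch := append(fakeMessage(headerSchema), bqBatch...) // shaped like the emulator batch
	for _, b := range [][]byte{bqBatch, emuBatch} {
		s, err := ipcStream(schema, b)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		t, _ := firstMessageType(s)
		fmt.Println(len(b), "->", len(s), "first header:", t)
	}
}
```

In the real client, the result of `ipcStream(schema, undecoded)` would go into `bytes.NewBuffer` and then `ipc.NewReader` as in the official example, so the same code path should work against both backends. The emulator-side fix mentioned below would make this unnecessary.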

vladDotH · Apr 06 '25 18:04

I've implemented a fix for the SerializedRecordBatch message in the Recidiviz fork of the emulator https://github.com/Recidiviz/bigquery-emulator/releases/tag/v0.6.6-recidiviz.0

ohaibbq · Nov 08 '25 18:11