
QUERY --format=json incorrectly double-escapes native JSON column types

Open orlevyhs opened this issue 2 months ago • 5 comments

What happened?

When using a simple bq query with the --format=json flag, the emulator incorrectly serializes the native BigQuery JSON data type column as a double-escaped string.

This differs from production BigQuery behavior, where a native JSON column should be outputted as a raw, unquoted JSON object within the surrounding JSON structure. This breakage prevents automated pipelines from correctly unmarshaling the data.

Expected vs. Actual Behavior

  • Source Table Schema: (id INT64, metadata JSON)

  • Source Data: {"id": 1, "metadata": {"status": "ok", "version": 1}}

What did you expect to happen?

Scenario | Output of metadata field
Expected (Production BigQuery) | {"metadata": {"status": "ok", "version": 1}} (nested JSON object)
Actual (BigQuery Emulator) | {"metadata": "{\"status\":\"ok\",\"version\":1}"} (double-escaped JSON string)

How can we reproduce it (as minimally and precisely as possible)?

Reproduction Steps (Using bq CLI)

Please ensure the goccy/bigquery-emulator is running on port 9050.

  1. Define Environment Variables (Needed if running from host/client)

    export BQ_EMULATOR_HOST="http://0.0.0.0:9050"
    PROJECT_ID="test-project-1"
    DATASET_ID="test_ds"
    TABLE_NAME="test_table_json"
    

  2. Create Table with Native JSON Column

    bq --api $BQ_EMULATOR_HOST --project_id $PROJECT_ID mk --table $DATASET_ID.$TABLE_NAME \
    'id:INT64, metadata:JSON'

  3. Insert Row with JSON Data

    bq --api $BQ_EMULATOR_HOST --project_id $PROJECT_ID query --nouse_legacy_sql \
    "INSERT INTO $DATASET_ID.$TABLE_NAME (id, metadata)
    VALUES (
      1,
      JSON '{\"status\": \"ok\", \"version\": 1}'
    );"
    

  4. Verify the Broken Output using Simple QUERY

    bq --api $BQ_EMULATOR_HOST --project_id $PROJECT_ID query --nouse_legacy_sql --format=json \
    "SELECT * FROM $DATASET_ID.$TABLE_NAME"

Observed Output (Faulty)

The output from the query command shows the metadata field as an escaped string:

[
  {
    "id": "1",
    "metadata": "{\"status\":\"ok\",\"version\":1}"
  }
]

Expected Output (Correct)

The expected output for a correct JSON serialization should contain the nested object:

[
  {
    "id": "1",
    "metadata": {
      "status": "ok",
      "version": 1
    }
  }
]

Anything else we need to know?

No response

orlevyhs avatar Oct 30 '25 12:10 orlevyhs

Hi @orlevyhs, are you sure this is incorrect behavior? Running the query in the BigQuery console and with the bq CLI shows the following:

query.txt

CREATE TEMP TABLE table_1 (id STRING, data JSON);
INSERT INTO table_1 (id, data) VALUES ("1", JSON '{"status": "OK", "version": 1}');
SELECT * FROM table_1;
(screenshot of the BigQuery console results)
$ cat query.txt | bq --project_id project query --format=json --nouse_legacy_sql
Waiting on bqjob_r53b26c8307df493f_0000019a3c0ef9b4_1 ... (2s) Current status: DONE   
["Created project._script6d6f9283d61f775202c818dc4d14871a2f6ae86d.table_1\n"
,"Number of affected rows: 1\n"
,[{"data":"{\"status\":\"OK\",\"version\":1}","id":"1"}]
]

When using the Python client for this query, the client decodes the JSON string into a Python dict.

ohaibbq avatar Oct 31 '25 21:10 ohaibbq

Hey @ohaibbq!

Thanks for the fast reply!

I appreciate the clarification on the bq CLI --format=json behavior. I initially mentioned the CLI because I noticed the same serialization pattern there while debugging the core issue with the file output, but it seems the problem is not with the general query response format.

My concern, and the original reason for filing this issue, is the output of the BigQuery EXTRACT job when writing the native JSON type to Google Cloud Storage (GCS). I'm unsure whether the emulator officially supports exporting data, but I noticed there are tests covering this functionality.

Context: Our service uses the Go client's standard extraction pattern to generate raw export files from the BQ table:

bqClient.Dataset(dsID).Table(tableName).ExtractorTo(gcsRef)

Failure Point: In the emulator, the resulting raw file contains the JSON column as a double-escaped string.

Impact: Our Go service is designed to parse the raw file output directly into a struct where the JSON field is a map (map[string]any). It fails because the emulator outputs a string.

orlevyhs avatar Nov 02 '25 05:11 orlevyhs

Is there any chance you're running into #335? Could you try replicating this against the Recidiviz fork which has #335 merged? https://github.com/Recidiviz/bigquery-emulator/releases/tag/v0.4.4-recidiviz.26

ohaibbq avatar Nov 02 '25 17:11 ohaibbq

Yes it is! I will use that fork until it's merged. Thank you!

orlevyhs avatar Nov 02 '25 17:11 orlevyhs

No problem :) I maintain that fork. Hopefully we'll get the repositories consolidated soon.

ohaibbq avatar Nov 02 '25 17:11 ohaibbq