QUERY --format=json incorrectly double-escapes native JSON column types
What happened?
When using a simple bq query with the --format=json flag, the emulator incorrectly serializes a native BigQuery JSON column as a double-escaped string.
This differs from production BigQuery behavior, where a native JSON column is emitted as a raw, unquoted JSON object within the surrounding JSON structure. The double escaping prevents automated pipelines from unmarshaling the data correctly.
Expected vs. Actual Behavior
Source Table Schema:
(id INT64, metadata JSON)
Source Data:
{"id": 1, "metadata": {"status": "ok", "version": 1}}
What did you expect to happen?
Expected (Production BigQuery) | {"metadata": {"status": "ok", "version": 1}} (Nested JSON Object)
Actual (BigQuery Emulator) | {"metadata": "{\"status\":\"ok\",\"version\":1}"} (Double-escaped JSON String)
How can we reproduce it (as minimally and precisely as possible)?
Reproduction Steps (Using bq CLI)
Please ensure the goccy/bigquery-emulator is running on port 9050.
Define Environment Variables (needed if running from the host/client):
export BQ_EMULATOR_HOST="http://0.0.0.0:9050"
export PROJECT_ID="test-project-1"
export DATASET_ID="test_ds"
export TABLE_NAME="test_table_json"

Create Table with a Native JSON Column:
bq --api $BQ_EMULATOR_HOST --project_id $PROJECT_ID mk --table $DATASET_ID.$TABLE_NAME 'id:INT64, metadata:JSON'

Insert a Row with JSON Data:
bq --api $BQ_EMULATOR_HOST --project_id $PROJECT_ID query "INSERT INTO $DATASET_ID.$TABLE_NAME (id, metadata) VALUES (1, JSON '{\"status\": \"ok\", \"version\": 1}');"

Verify the Broken Output Using a Simple QUERY:
bq --api $BQ_EMULATOR_HOST --project_id $PROJECT_ID query --format=json "SELECT * FROM $DATASET_ID.$TABLE_NAME"
Observed Output (Faulty)
The output from the query command shows the metadata field as an escaped string:
[
{
"id": "1",
"metadata": "{\"status\":\"ok\",\"version\":1}"
}
]
Expected Output (Correct)
The expected output for a correct JSON serialization should contain the nested object:
[
{
"id": "1",
"metadata": {
"status": "ok",
"version": 1
}
}
]
Anything else we need to know?
No response
Hi @orlevyhs, are you sure this is incorrect behavior? Running the query in the BigQuery console and with the bq CLI shows the following:
query.txt
CREATE TEMP TABLE table_1 (id STRING, data JSON);
INSERT INTO table_1 (id, data) VALUES ("1", JSON '{"status": "OK", "version": 1}');
SELECT * FROM table_1;
$ cat query.txt | bq --project_id project query --format=json --nouse_legacy_sql
Waiting on bqjob_r53b26c8307df493f_0000019a3c0ef9b4_1 ... (2s) Current status: DONE
["Created project._script6d6f9283d61f775202c818dc4d14871a2f6ae86d.table_1\n"
,"Number of affected rows: 1\n"
,[{"data":"{\"status\":\"OK\",\"version\":1}","id":"1"}]
]
When using the Python client for this query, the client decodes the JSON string into a Python dict.
Hey @ohaibbq!
Thanks for the fast reply!
I appreciate the clarification on the bq CLI --format=json behavior; I initially mentioned the CLI because I noticed the same incorrect serialization pattern there while debugging the core issue with the file output. It seems the problem is not with the general query response format.
My concern, and the original reason for filing this issue, is the output of the BigQuery EXTRACT job when writing the native JSON type to Google Cloud Storage (GCS). I'm unsure if the emulator officially supports exporting data, but I noticed you have tests covering this functionality.
Context: Our service uses the Go client's standard extraction pattern to generate raw export files from the BQ table:
bqClient.Dataset(dsID).Table(tableName).ExtractorTo(gcsRef)
Failure Point: In the emulator, the resulting raw file contains the JSON column as a double-escaped string.
Impact: Our Go service is designed to parse the raw file output directly into a struct where the JSON field is a map (map[string]any). It fails because the emulator outputs a string.
Is there any chance you're running into #335? Could you try replicating this against the Recidiviz fork which has #335 merged? https://github.com/Recidiviz/bigquery-emulator/releases/tag/v0.4.4-recidiviz.26
Yes it is! I'll use that fork until the fix is merged upstream. Thank you!
No problem :) I maintain that fork. Hopefully we'll get the repositories consolidated soon.