Query response cannot be parsed when the result contains a nested timestamp
I found the bug while writing an integration test using python and test-containers, and have also confirmed behavior via CLI.
Steps to reproduce via Python (used Python 3.8.16)
from google.cloud.bigquery import Table, SchemaField, Client
# Instantiate client = Client(), using emulator
test_project = "test-project"
test_dataset = "test_dataset"
test_schema = [
SchemaField("test_timestamp_field", "TIMESTAMP", mode="REQUIRED")
]
test_table = f"{test_project}.{test_dataset}.test_table"
table = Table(test_table, schema=test_schema)
client.create_table(table)
from datetime import datetime
dtf = "%Y-%m-%d %H:%M:%S UTC"
dt1 = datetime.strptime("2021-11-29 22:00:00 UTC", dtf)
test_data = [
{
"test_timestamp_field": dt1.strftime(dtf)
}
]
client.insert_rows_json(test_table, test_data)
query1 = f"SELECT * from {test_table}"
query2 = f"SELECT STRUCT(test_timestamp_field) as nested_timestamp_field from {test_table}"
res1 = client.query(query1).result()
for r in res1:
print(r)
res2 = client.query(query2).result()
for r in res2: # An error is thrown here
print(r)
Python Error
Here is the error message when running the Python code above
ValueError: invalid literal for int() with base 10: '2021-11-29T22:00:00Z'
Steps to reproduce using CLI
docker run -p 55687:9050 -it ghcr.io/goccy/bigquery-emulator:latest --project=test-project --dataset=test_dataset
bq --api http://localhost:55687 mk --project_id test-project --schema test_timestamp_field:timestamp -t test_dataset.test_table
bq --api http://localhost:55687 query --project_id test-project --nouse_legacy_sql \
"
INSERT test_dataset.test_table(test_timestamp_field) values ('2021-11-29 22:00:00 UTC')
"
bq --api http://localhost:55687 query --project_id test-project --nouse_legacy_sql \
'SELECT
test_timestamp_field
FROM
`test-project`.test_dataset.test_table'
bq --api http://localhost:55687 query --project_id test-project \
'SELECT
STRUCT(test_timestamp_field) as nested_timestamp_field
FROM
`test-project`.test_dataset.test_table'
CLI Actual Response
Here is the response of the second query via CLI, with the nested timestamp field:
+------------------------------------------------------------+
| nested_timestamp_field |
+------------------------------------------------------------+
| {"test_timestamp_field":"<date out of range for display>"} |
+------------------------------------------------------------+
CLI Expected Response
Here is the expected response (tested w/ BQ):
+----------------------------------------+
| nested_timestamp_field |
+----------------------------------------+
| {"reserve_time":"2021-11-29 22:00:00"} |
+----------------------------------------+
An aside
I have also found that you can wrap the struct with TO_JSON_STRING, and it mostly works for CLI (although the timestamp is formatted for JSON), ~~but it does not~~ it appears to work with the Python client. Note that the timestamp lacks double quotes (which is what BQ returns in this case):
bq --api http://localhost:55687 query --project_id test-project \
'SELECT
TO_JSON_STRING(STRUCT(test_timestamp_field)) as nested_timestamp_field
FROM
`test-project`.test_dataset.test_table'
+-----------------------------------------------+
| nested_timestamp_field |
+-----------------------------------------------+
| {"test_timestamp_field":2021-11-29T22:00:00Z} |
+-----------------------------------------------+
I have also tested the above with BigQuery, and there were no errors and CLI returned responses, as expected.
Also, perhaps this is related to https://github.com/goccy/bigquery-emulator/issues/32
This is related to #265
I've fixed the TO_JSON() for the TimestampValue in this PR https://github.com/goccy/go-zetasqlite/pull/111/files#diff-a710e43b641c24f0f092d550d22fd984a9f09c62b7b4b8e41db3144c0217f592R2105-R2108