bigquery-emulator
bigquery-emulator copied to clipboard
Wrong results returned when querying a RECORD column
What happened?
Step 1: Create a table record_test
with a record column containing a numeric field
[
{
"name": "rec",
"mode": "REQUIRED",
"type": "RECORD",
"fields": [
{
"name": "value",
"mode": "REQUIRED",
"type": "NUMERIC",
}
]
}
]
Step 2: Insert a row
INSERT INTO record_test (rec) VALUES (STRUCT(47.123456789 AS value))
Step 3: Query the table using the bq
tool
SELECT * FROM record_test
+------------------------------------+
| rec |
+------------------------------------+
| {"value":"47123456789/1000000000"} |
+------------------------------------+
SELECT rec FROM record_test
+------------------------------------+
| rec |
+------------------------------------+
| {"value":"47123456789/1000000000"} |
+------------------------------------+
SELECT rec.* FROM record_test
+--------------+
| value |
+--------------+
| 47.123456789 |
+--------------+
SELECT rec.value FROM record_test
+--------------+
| value |
+--------------+
| 47.123456789 |
+--------------+
Selecting rec.value
with the Java BigQuery client works, but selecting rec
or *
fails with the error that 47123456789/1000000000
can't be parsed when iterating the field value lists.
What did you expect to happen?
Step 3: Query the table using the bq
tool
The following results are returned when using BigQuery instead of the emulator.
SELECT * FROM record_test
+--------------+
| rec_value |
+--------------+
| 47.123456789 |
+--------------+
SELECT rec FROM record_test
+--------------------------+
| rec |
+--------------------------+
| {"value":"47.123456789"} |
+--------------------------+
SELECT rec.* FROM record_test
+--------------+
| rec_value |
+--------------+
| 47.123456789 |
+--------------+
SELECT rec.value FROM record_test
+--------------+
| rec_value |
+--------------+
| 47.123456789 |
+--------------+
Also iterating via BigQuery client libraries should work as expected for all the statements.
Note that also the returned column names are different.
How can we reproduce it (as minimally and precisely as possible)?
Follow the steps as described above
Anything else we need to know?
No response
The behaviour for TIMESTAMP fields of records is also affected (and maybe also others)
In this case, selecting the record returns '2023-08-01T21:22:23Z' for the inserted timestamp value 2023-08-01T21:22:23.123456Z
. Educated guess is that the expected results should be a numeric timestamp value. Also the precision of the returned value is degraded.
bq
outputs for this case
SELECT * FROM record_test
+------------------------------------------------------------------------------------------------+
| rec |
+------------------------------------------------------------------------------------------------+
| {"numeric_value":"47123456789/1000000000","timestamp_value":"<date out of range for display>"} |
+------------------------------------------------------------------------------------------------+
SELECT rec.timestamp_value FROM record_test
+---------------------+
| timestamp_value |
+---------------------+
| 2023-08-01 21:22:23 |
+---------------------+
Thank you for the detailed report. I have confirmed that there are indeed differences in output compared to BigQuery, and also outputs vary between values within a structure and those that are not.
I briefly investigated the cause and would like to document it.
Internally, in go-zetasqlite which we use for constructing responses, there is a difference in the method of construction for structured and non-structured data.
For Numeric types, we convert to a string using NumericValue
's ToString
.
https://github.com/goccy/go-zetasqlite/blob/e0a36047b564fab7bc36f17031c86225dabf9fea/internal/rows.go#L246-L251
In the case of a Struct, we first convert it to a StructValue
using StructValue
's ToStruct
.
https://github.com/goccy/go-zetasqlite/blob/e0a36047b564fab7bc36f17031c86225dabf9fea/internal/rows.go#L297-L302
Then, we use StructValue
's Interface
to convert it to interface{}
. In this process, we recursively call Interface
for each field to convert all fields.
The definition of NumericValue
's Interface
is as follows, nv.Rat.String()
.
https://github.com/goccy/go-zetasqlite/blob/e0a36047b564fab7bc36f17031c86225dabf9fea/internal/value.go#L798-L800
This indeed returns a value like 47123456789/1000000000
.
On the other hand, the definition of ToString
is as follows, which indeed returns a value like 47.123456789
.
Due to these differences, it seems that outputs within structures are different for other types as well.
I can take a look at this soon if you could assign it to me @totem3. I have a pending PR up that fixes some API-level value encoding issues similar to this one at goccy/go-zetasqlite#111
@ohaibbq
Thank you for your response. As I haven’t started working on it yet, I would be grateful if you could address this issue.
I haven't taken a look at this yet. If you have some time @totem3, that'd be appreciated. I'm currently looking at if it'd be possible to rewrite how go-zetasqlite handles window functions to fix goccy/go-zetasqlite#153 and goccy/go-zetasqlite#161 as it'd block our usage from further adoption.