bigquery-emulator icon indicating copy to clipboard operation
bigquery-emulator copied to clipboard

Wrong results returned when querying a RECORD column

Open prismec opened this issue 1 year ago • 5 comments

What happened?

Step 1: Create a table record_test with a record column containing a numeric field

[
  {
    "name": "rec",
    "mode": "REQUIRED",
    "type": "RECORD",
    "fields": [
      {
        "name": "value",
        "mode": "REQUIRED",
        "type": "NUMERIC",
      }
    ]
  }
]

Step 2: Insert a row

INSERT INTO record_test (rec) VALUES (STRUCT(47.123456789 AS value))

Step 3: Query the table using the bq tool

SELECT * FROM record_test

+------------------------------------+
|                rec                 |
+------------------------------------+
| {"value":"47123456789/1000000000"} |
+------------------------------------+

SELECT rec FROM record_test

+------------------------------------+
|                rec                 |
+------------------------------------+
| {"value":"47123456789/1000000000"} |
+------------------------------------+

SELECT rec.* FROM record_test

+--------------+
|    value     |
+--------------+
| 47.123456789 |
+--------------+

SELECT rec.value FROM record_test

+--------------+
|    value     |
+--------------+
| 47.123456789 |
+--------------+

Selecting rec.value with the Java BigQuery client works, but selecting rec or * fails with the error that 47123456789/1000000000 can't be parsed when iterating the field value lists.

What did you expect to happen?

Step 3: Query the table using the bq tool

The following results are returned when using BigQuery instead of the emulator.

SELECT * FROM record_test

+--------------+
|  rec_value   |
+--------------+
| 47.123456789 |
+--------------+

SELECT rec FROM record_test

+--------------------------+
|           rec            |
+--------------------------+
| {"value":"47.123456789"} |
+--------------------------+

SELECT rec.* FROM record_test

+--------------+
|  rec_value   |
+--------------+
| 47.123456789 |
+--------------+

SELECT rec.value FROM record_test

+--------------+
|  rec_value   |
+--------------+
| 47.123456789 |
+--------------+

Also iterating via BigQuery client libraries should work as expected for all the statements.

Note that also the returned column names are different.

How can we reproduce it (as minimally and precisely as possible)?

Follow the steps as described above

Anything else we need to know?

No response

prismec avatar Jan 31 '24 09:01 prismec

The behaviour for TIMESTAMP fields of records is also affected (and maybe also others)

In this case, selecting the record returns '2023-08-01T21:22:23Z' for the inserted timestamp value 2023-08-01T21:22:23.123456Z. Educated guess is that the expected results should be a numeric timestamp value. Also the precision of the returned value is degraded.

bq outputs for this case

SELECT * FROM record_test

+------------------------------------------------------------------------------------------------+
|                                              rec                                               |
+------------------------------------------------------------------------------------------------+
| {"numeric_value":"47123456789/1000000000","timestamp_value":"<date out of range for display>"} |
+------------------------------------------------------------------------------------------------+

SELECT rec.timestamp_value FROM record_test

+---------------------+
|   timestamp_value   |
+---------------------+
| 2023-08-01 21:22:23 |
+---------------------+

prismec avatar Jan 31 '24 10:01 prismec

Thank you for the detailed report. I have confirmed that there are indeed differences in output compared to BigQuery, and also outputs vary between values within a structure and those that are not.

I briefly investigated the cause and would like to document it.

Internally, in go-zetasqlite which we use for constructing responses, there is a difference in the method of construction for structured and non-structured data.

For Numeric types, we convert to a string using NumericValue's ToString. https://github.com/goccy/go-zetasqlite/blob/e0a36047b564fab7bc36f17031c86225dabf9fea/internal/rows.go#L246-L251

In the case of a Struct, we first convert it to a StructValue using StructValue's ToStruct. https://github.com/goccy/go-zetasqlite/blob/e0a36047b564fab7bc36f17031c86225dabf9fea/internal/rows.go#L297-L302

Then, we use StructValue's Interface to convert it to interface{}. In this process, we recursively call Interface for each field to convert all fields.

The definition of NumericValue's Interface is as follows, nv.Rat.String(). https://github.com/goccy/go-zetasqlite/blob/e0a36047b564fab7bc36f17031c86225dabf9fea/internal/value.go#L798-L800 This indeed returns a value like 47123456789/1000000000.

On the other hand, the definition of ToString is as follows, which indeed returns a value like 47.123456789.

Due to these differences, it seems that outputs within structures are different for other types as well.

totem3 avatar Jan 31 '24 15:01 totem3

I can take a look at this soon if you could assign it to me @totem3. I have a pending PR up that fixes some API-level value encoding issues similar to this one at goccy/go-zetasqlite#111

ohaibbq avatar Feb 01 '24 19:02 ohaibbq

@ohaibbq

Thank you for your response. As I haven’t started working on it yet, I would be grateful if you could address this issue.

totem3 avatar Feb 01 '24 21:02 totem3

I haven't taken a look at this yet. If you have some time @totem3, that'd be appreciated. I'm currently looking at if it'd be possible to rewrite how go-zetasqlite handles window functions to fix goccy/go-zetasqlite#153 and goccy/go-zetasqlite#161 as it'd block our usage from further adoption.

ohaibbq avatar Feb 07 '24 19:02 ohaibbq