
losing precision in converting TIMESTAMP and INT64 to JavaScript Number

Open stephenplusplus opened this issue 6 years ago • 20 comments

From @c0b on October 9, 2016 5:05

https://github.com/GoogleCloudPlatform/google-cloud-node/pull/1648#issuecomment-251383826

BigQuery's TIMESTAMP type has microsecond precision, but when it is converted to a JavaScript Date, the value is truncated to milliseconds

https://github.com/GoogleCloudPlatform/google-cloud-node/pull/1648#issuecomment-251249300

A JavaScript Number is really just a FLOAT64; there is no true INT64, so some precision is lost during conversion:

$ node ./bigquery/queries.js sync 'SELECT ARRAY<INT64>[0x7fff1234deadbeef, -0x8000000000000000] AS example_array'
{ err: null,
  rows: [ { example_array: [ 9223110580161593000, -9223372036854776000 ] } ],
  nextQuery: null,
  apiResponse: 
   { kind: 'bigquery#queryResponse',
     schema: { fields: [ { name: 'example_array', type: 'INTEGER', mode: 'REPEATED' } ] },
     jobReference: { ... },
     totalRows: '1',
     rows: [ { f: [ { v: [ { v: '9223110580161593071' }, { v: '-9223372036854775808' } ] } ] } ],
     totalBytesProcessed: '0',
     jobComplete: true,
     cacheHit: false } }
Received 1 row(s)!
[ { example_array: [ 9223110580161593000, -9223372036854776000 ] } ]
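
(For illustration, the rounding can be reproduced without BigQuery at all; this is a sketch using BigInt, which landed in later Node.js versions, and the exact decimal string from the raw response above.)

// The raw API response carries the exact value as a decimal string.
const raw = '9223110580161593071';

// Coercing it to a Number rounds to the nearest representable float64...
console.log(Number(raw)); // 9223110580161593000

// ...because it lies far beyond the largest exactly-representable integer.
console.log(Number.MAX_SAFE_INTEGER); // 9007199254740991

// BigInt preserves every digit.
console.log(BigInt(raw)); // 9223110580161593071n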

I don't really have a solution; please suggest what applications should do when they need this much precision.

Copied from original issue: GoogleCloudPlatform/google-cloud-node#1681

stephenplusplus avatar Nov 17 '17 18:11 stephenplusplus

Thank you for opening this, @c0b! I think we're okay on integer & timestamp precision, since it is available through the raw apiResponse argument when the user requires it.

@callmehiphop what do you think?

stephenplusplus avatar Nov 17 '17 18:11 stephenplusplus

From @callmehiphop on October 10, 2016 19:55

Maybe in the docs we could recommend/show examples for precision using an int64 lib? We use node-int64 in Bigtable.

stephenplusplus avatar Nov 17 '17 18:11 stephenplusplus

From @lukesneeringer on March 13, 2017 19:41

Based on the discussion in the Node.js standup today, we are deciding this is not release blocking. A user who needs the full precision can get it from the raw API response in the third callback argument.
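
(A sketch of that workaround, grounded in the { err, rows, nextQuery, apiResponse } shape shown in the original report; the callback signature here is an assumption based on that output.)

// Sketch only: the field positions follow the apiResponse printed earlier.
bigquery.query(options, function(err, rows, nextQuery, apiResponse) {
  if (err) throw err;

  // rows[0].example_array has already been rounded to Numbers, but the
  // raw response still carries the exact values as decimal strings.
  var rawValues = apiResponse.rows[0].f[0].v.map(function(cell) {
    return cell.v;
  });

  console.log(rawValues);
  // [ '9223110580161593071', '-9223372036854775808' ]
});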

stephenplusplus avatar Nov 17 '17 18:11 stephenplusplus

I don't believe there is a good solution for this without introducing complexity.

TIMESTAMP

The docs say that TIMESTAMPs are stored internally with microsecond precision; however, the raw API response seems to return the value in seconds.

INT64

The solution for this would be a bit more complex for the user. Currently, if you read one of these values, you get the native JS "Number" type. For our @google-cloud/spanner API, the Spanner team wanted to guarantee the same precision the API stores, so this is how the "more complex" solution looks for that API: https://github.com/googleapis/nodejs-spanner/blob/8a8b43f52afdbcd443cdc6d6c5d0f2fa68ee5566/src/codec.js#L54-L66:

function Int(value) {
  this.value = value.toString();
}

Int.prototype.valueOf = function() {
  var number = Number(this.value);

  if (number > Number.MAX_SAFE_INTEGER) {
    throw new Error('Integer ' + this.value + ' is out of bounds.');
  }

  return number;
};
table.read(query, function(err, rows) {
  var row = rows[0];

  // `row` arrives shaped like this -- note the values stay strings:
  row = [
    {
      name: 'SafeInt',
      value: {
        value: '2' // string
      }
    },
    {
      name: 'OutOfBoundsInt',
      value: {
        value: '--out-of-bounds-integer--' // string
      }
    }
  ];

  var safeInt = row[0].value;
  safeInt instanceof Spanner.Int; // true
  console.log(safeInt.value);
  // '2' (String)
  console.log(+safeInt); // coercion invokes Int#valueOf
  // 2 (Number type)

  var outOfBoundsInt = row[1].value;
  outOfBoundsInt instanceof Spanner.Int; // true
  console.log(outOfBoundsInt.value);
  // '--out-of-bounds-integer--' (String)
  console.log(+outOfBoundsInt);
  // throws 'Integer '--out-of-bounds-integer--' is out of bounds.'
});

@lukesneeringer how should we determine if the precision is worth the complexity?

stephenplusplus avatar Nov 17 '17 18:11 stephenplusplus

@alexander-fenster any input?

stephenplusplus avatar Apr 02 '18 13:04 stephenplusplus

A pretty neat feature that the mysql client offers is type casting. Users can essentially supply a callback to convert the raw value to whatever they want. With newer versions of Node supporting BigInt, I think it might benefit the BigQuery library (and maybe others like Spanner and Datastore) if we offered similar functionality.
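
(A minimal sketch of what such a hook could look like; the typeCast option name and its signature here are hypothetical, modeled on the mysql client, not an existing BigQuery API.)

// Hypothetical opt-in cast hook; rawValue is the exact decimal string
// from the API response, before any Number coercion.
const options = {
  query: 'SELECT 9223110580161593071 AS big',
  typeCast: (field, rawValue) => {
    if (field.type === 'INTEGER') {
      return BigInt(rawValue); // preserves all 64 bits
    }
    return rawValue;
  },
};

// bigquery.query(options) would then yield:
// [ { big: 9223110580161593071n } ]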

@stephenplusplus @bcoe @JustinBeckwith WDYT?

callmehiphop avatar Mar 18 '19 11:03 callmehiphop

Sounds awesome to me, but please do run by @shollyman and @tswast first :)

JustinBeckwith avatar Mar 18 '19 14:03 JustinBeckwith

@callmehiphop seems like a great use of BigInt, would be interesting if you could find an elegant way to add this feature progressively so that folks can opt in to using BigInt if they need the precision.

bcoe avatar Mar 18 '19 19:03 bcoe

Type casting sounds like a good solution for this. It's analogous to the dtypes dictionary parameter we provide for to_dataframe, which suffers from the same issue (nullable int64 columns can get implicitly cast to float64 in pandas DataFrames).

tswast avatar Mar 18 '19 23:03 tswast

Timestamp precision loss is a known issue with the wire format of BQ tabledata responses. The BQ Storage API, recently in beta, addresses this and other issues.

Would making it possible for tabledata consumers to request a different TIMESTAMP representation in tabledata responses address this (simplest is likely a string-encapsulated int64)? Or would a more complex format still be necessary?

shollyman avatar Mar 19 '19 16:03 shollyman

@shollyman sounds like a great option to me.

callmehiphop avatar Mar 19 '19 16:03 callmehiphop

Opened 128990665 internally as an FR for BigQuery team.

shollyman avatar Mar 20 '19 17:03 shollyman

@callmehiphop, is there anything actionable for the client side after BigQuery fixes this on the server? If not, let's close this issue.

sduskis avatar Jul 01 '19 22:07 sduskis

Ping-o @callmehiphop

stephenplusplus avatar Oct 07 '19 17:10 stephenplusplus

@stephenplusplus @callmehiphop, could this use a design similar to the work @AVaksman has been doing on Datastore?

bcoe avatar Oct 07 '19 19:10 bcoe

The main problem is the server doesn't even send the full precision over the wire with the tabledata.list API. It does send the full precision with the BQ Storage API, since that uses a different serialization format (Avro or Arrow).

tswast avatar Oct 07 '19 19:10 tswast

The internal bug (128990665) was just marked fixed not too long ago, so I'm assigning this to @steffnay to investigate what needs to be done here, if anything.

meredithslota avatar Apr 13 '20 20:04 meredithslota

@steffnay Hey, trying to understand whether your PR #873 affects the TIMESTAMP part of this issue.

We have tables with timestamps in microseconds, and when querying with BigQuery.query it truncates the microseconds down to milliseconds. This screws with some of our analytics.

I think these lines in src/bigquery.ts look quite suspicious, and I don't see how they would be affected by the changes in #873. Passing { wrapIntegers: true } as QueryOptions indeed wraps the integers, but does nothing for the timestamp.

Is there any way I can get a hold of my precious microseconds? We receive timestamped data in microsecond resolution from IoT units.

m0ar avatar Nov 25 '20 15:11 m0ar

@m0ar Unfortunately, #873 doesn't address TIMESTAMP precision. I'm waiting for an upstream change to land before I can do the work for TIMESTAMP. I just asked for an update on the timeframe for that and hopefully it will be soon. I'll tag you here when I have more details.

steffnay avatar Nov 30 '20 23:11 steffnay

Hello everybody,

After reading this issue, I tried to query a table with identifiers stored in INT64 columns with the wrapIntegers property set to true, but it loses precision even on the latest release (5.9.3) on Node.js 14. Could you take a look at this code, please? I do not see anything wrong.

import {BigQuery, Query} from '@google-cloud/bigquery';

const client = new BigQuery();

const sqlQuery = `SELECT 160864210488964916`;

const queryOptions: Query = {
  query: sqlQuery,
  wrapIntegers: true,
};

const [rows] = await client.query(queryOptions);

console.log(rows);
// [ { f0_: 160864210488964930 } ]

Thank you!

sergioregueira avatar Dec 16 '21 09:12 sergioregueira

(I'm tempted to classify this as a "bug" but it is technically a feature request; asking @alvarowolfx to look into it.)

meredithslota avatar Mar 10 '23 02:03 meredithslota

@sergioregueira, the behavior that you reported is indeed an issue in our library. Basically, Query.wrapIntegers was being ignored. For it to work, you need to pass it as part of QueryResultsOptions.

import {BigQuery, Query, QueryResultsOptions} from '@google-cloud/bigquery';

const client = new BigQuery();

const sqlQuery = `SELECT 160864210488964916`;

const query: Query = {
  query: sqlQuery,
};

const queryOptions: QueryResultsOptions = {
  wrapIntegers: true,
};

const [rows] = await client.query(query, queryOptions);

console.log(rows);
// [{"f0_":{"type":"BigQueryInt","value":"160864210488964916"}}]

But I opened a PR with a fix so that the form you reported is supported too. See #1191

alvarowolfx avatar Mar 10 '23 22:03 alvarowolfx

As for microsecond precision support, another PR was opened to fix that (#1192). It uses @google-cloud/precise-date to parse timestamps arriving as float64 into a timestamp string with microsecond resolution.

Example output:

const bigquery = new BigQuery();
const [rows] = await bigquery.query({
  query: 'SELECT TIMESTAMP("2014-09-27 12:30:00.123456Z")',
});
console.log(JSON.stringify(rows));
// [{"f0_":{"value":"2014-09-27T12:30:00.123456000Z"}}]

alvarowolfx avatar Mar 10 '23 22:03 alvarowolfx