nodejs-bigquery
losing precision in converting TIMESTAMP and INT64 to JavaScript Number
From @c0b on October 9, 2016 5:5
https://github.com/GoogleCloudPlatform/google-cloud-node/pull/1648#issuecomment-251383826
BigQuery TIMESTAMP values have microsecond precision, but converting to a JavaScript Date truncates them to millisecond precision.
https://github.com/GoogleCloudPlatform/google-cloud-node/pull/1648#issuecomment-251249300
A JavaScript Number is really just a FLOAT64 (an IEEE 754 double); there is no true INT64, so some precision is lost during conversion:
$ node ./bigquery/queries.js sync 'SELECT ARRAY<INT64>[0x7fff1234deadbeef, -0x8000000000000000] AS example_array'
{ err: null,
rows: [ { example_array: [ 9223110580161593000, -9223372036854776000 ] } ],
nextQuery: null,
apiResponse:
{ kind: 'bigquery#queryResponse',
schema: { fields: [ { name: 'example_array', type: 'INTEGER', mode: 'REPEATED' } ] },
jobReference: { ... },
totalRows: '1',
rows: [ { f: [ { v: [ { v: '9223110580161593071' }, { v: '-9223372036854775808' } ] } ] } ],
totalBytesProcessed: '0',
jobComplete: true,
cacheHit: false } }
Received 1 row(s)!
[ { example_array: [ 9223110580161593000, -9223372036854776000 ] } ]
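The same rounding is easy to reproduce directly in a Node REPL, since the literal is beyond Number.MAX_SAFE_INTEGER:

Number.MAX_SAFE_INTEGER        // 9007199254740991 (2^53 - 1)
Number('9223110580161593071')  // 9223110580161593000 -- rounded to the nearest double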
I don't really have a solution; please suggest what an application should do when it needs this much precision.
Copied from original issue: GoogleCloudPlatform/google-cloud-node#1681
Thank you for opening this, @c0b! I think we're okay on integer & timestamp precision, since it is available through the raw apiResponse argument when the user requires it.
@callmehiphop what do you think?
From @callmehiphop on October 10, 2016 19:55
Maybe in the docs we could recommend/show examples for precision using an int64 lib? We use node-int64 in Bigtable.
From @lukesneeringer on March 13, 2017 19:41
Based on the discussion in the Node.js standup today, we are deciding this is not release blocking. A user who needs the full precision can get it from the raw API response in the third callback argument.
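For reference, a rough sketch of that escape hatch against the response shape printed above. The raw cells arrive as strings, so they can be converted exactly -- BigInt here, though on Node versions without it an int64 library would fill the same role:

bigquery.query(query, function(err, rows, nextQuery, apiResponse) {
  if (err) throw err;
  // apiResponse keeps every cell as a string, so nothing has been rounded yet:
  var cells = apiResponse.rows[0].f[0].v; // [{v: '9223110580161593071'}, ...]
  var exact = cells.map(function(cell) {
    return BigInt(cell.v); // 9223110580161593071n -- exact
  });
  console.log(exact);
});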
I don't believe there is a good solution for this without introducing complexity.
TIMESTAMP
The docs say that TIMESTAMPs are stored internally with microsecond precision; however, the raw API response seems to return the value in seconds.
INT64
The solution for this would be a bit more complex for the user. Currently, if you read one of these values, you get the native JS "Number" type. For our @google-cloud/spanner API, the Spanner team wanted to guarantee the same precision the API stores, so this is how the "more complex" solution looks for that API (https://github.com/googleapis/nodejs-spanner/blob/8a8b43f52afdbcd443cdc6d6c5d0f2fa68ee5566/src/codec.js#L54-L66):
function Int(value) {
  // Keep the raw value as a string so no precision is lost on construction.
  this.value = value.toString();
}

Int.prototype.valueOf = function() {
  var number = Number(this.value);
  // Coercion to Number is only safe within the FLOAT64 integer range.
  if (number > Number.MAX_SAFE_INTEGER) {
    throw new Error('Integer ' + this.value + ' is out of bounds.');
  }
  return number;
};
table.read(query, function(err, rows) {
  var row = rows[0];
  // `row` comes back looking like this -- the raw values stay strings:
  // [
  //   {
  //     name: 'SafeInt',
  //     value: { value: '2' } // string
  //   },
  //   {
  //     name: 'OutOfBoundsInt',
  //     value: { value: '--out-of-bounds-integer--' } // string
  //   }
  // ]

  var safeInt = row[0].value; // an instance of Spanner.Int
  console.log(safeInt.value);
  // '2' (String)
  console.log(Number(safeInt));
  // 2 (Number, via valueOf())

  var outOfBoundsInt = row[1].value; // also a Spanner.Int
  console.log(outOfBoundsInt.value);
  // '--out-of-bounds-integer--' (String)
  console.log(Number(outOfBoundsInt));
  // throws 'Integer --out-of-bounds-integer-- is out of bounds.'
});
@lukesneeringer how should we determine if the precision is worth the complexity?
@alexander-fenster any input?
A pretty neat feature that the mysql client offers is type casting. Users can essentially supply a callback to convert the raw value to whatever they want. With newer versions of Node supporting BigInt, I think it might benefit the BigQuery library (and maybe others like Spanner and Datastore) if we offered similar functionality.
@stephenplusplus @bcoe @JustinBeckwith WDYT?
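A minimal sketch of what such a hook could look like. Note the typeCast option here is hypothetical, modeled on the mysql client's callback, and is not an API the BigQuery library exposes:

const [rows] = await bigquery.query({
  query: 'SELECT 9223110580161593071 AS big',
  // Hypothetical option: receive the field schema and the raw string value,
  // return whatever representation the application wants.
  typeCast: (field, rawValue) =>
    field.type === 'INTEGER' ? BigInt(rawValue) : rawValue,
});
console.log(rows[0].big); // 9223110580161593071n (a BigInt -- exact)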
Sounds awesome to me, but please do run by @shollyman and @tswast first :)
@callmehiphop seems like a great use of BigInt, would be interesting if you could find an elegant way to add this feature progressively so that folks can opt in to using BigInt if they need the precision.
Type casting sounds like a good solution for this. It's actually similar to how we provide a dtypes dictionary parameter for to_dataframe, which suffers from a similar issue (nullable int64 columns can get implicitly cast to float64 in pandas DataFrames).
Timestamp precision loss is a known issue with the wire format of BigQuery tabledata responses. The recently-beta BigQuery Storage API addresses this and other issues.
Would making it possible for tabledata consumers to request a different TIMESTAMP representation in tabledata responses address this (simplest is likely a string-encapsulated int64)? Or would a more complex format still be necessary?
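To make that concrete: a string-encapsulated int64 would presumably mean a tabledata cell carrying epoch microseconds as a string, which clients could parse losslessly. A sketch of the idea, not an actual response format:

// Hypothetical cell shape for TIMESTAMP as a string-encapsulated int64:
const cell = {v: '1411821000123456'}; // 2014-09-27 12:30:00.123456 UTC in epoch microseconds
const micros = BigInt(cell.v);        // exact -- no float64 round-trip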
@shollyman sounds like a great option to me.
Opened 128990665 internally as an FR for the BigQuery team.
@callmehiphop, is there anything active for the client side to do after the BigQuery fixes this on the server? If not, let's close this issue.
Ping-o @callmehiphop
@stephenplusplus @callmehiphop, could this use a design similar to the work @AVaksman has been doing on Datastore?
The main problem is the server doesn't even send the full precision over the wire with the tabledata.list API. It does send the full precision with the BQ Storage API, since that uses a different serialization format (Avro or Arrow).
The internal bug (128990665) was just marked fixed not too long ago, so I'm assigning this to @steffnay to investigate what needs to be done here, if anything.
@steffnay Hey, trying to understand whether your PR #873 affects the TIMESTAMP part of this issue.
We have tables with timestamps in microseconds, and when querying with BigQuery.query it truncates the microseconds down to milliseconds. This screws with some of our analytics.
I think these lines in src/bigquery.ts look quite suspicious, and I don't see how they would be affected by the changes in #873. Passing { wrapIntegers: true } as QueryOptions indeed wraps the integers, but does nothing for the timestamp.
Is there any way I can get a hold of my precious microseconds? We receive timestamped data in microsecond resolution from IoT units.
@m0ar Unfortunately, #873 doesn't address TIMESTAMP precision. I'm waiting for an upstream change to land before I can do the work for TIMESTAMP. I just asked for an update on the timeframe for that and hopefully it will be soon. I'll tag you here when I have more details.
Hello everybody,
After reading this issue, I tried to query a table with identifiers stored in INT64 columns with the wrapIntegers property set to true, but it loses precision even using the latest release (5.9.3) on Node.js 14. Could you take a look at this code, please? I do not see anything wrong.
import {BigQuery, Query} from '@google-cloud/bigquery';

const client = new BigQuery();
const sqlQuery = `SELECT 160864210488964916`;
const queryOptions: Query = {
  query: sqlQuery,
  wrapIntegers: true,
};
const [rows] = await client.query(queryOptions);
console.log(rows);
// [ { f0_: 160864210488964930 } ]
Thank you!
(I'm tempted to classify this as a "bug" but it is technically a feature request; asking @alvarowolfx to look into it.)
@sergioregueira, the behavior that you reported is indeed an issue in our library. Basically, the Query.wrapIntegers option was being ignored. For it to work, you need to pass it as a QueryResultsOptions:
import {BigQuery, Query, QueryResultsOptions} from '@google-cloud/bigquery';

const client = new BigQuery();
const sqlQuery = `SELECT 160864210488964916`;
const query: Query = {
  query: sqlQuery,
};
const queryOptions: QueryResultsOptions = {
  wrapIntegers: true,
};
const [rows] = await client.query(query, queryOptions);
console.log(rows);
// [{"f0_":{"type":"BigQueryInt","value":"160864210488964916"}}]
But I opened a PR with a fix so that the usage you reported is supported too; see #1191.
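For anyone who needs the exact integer out of the wrapped value: as the output above shows, BigQueryInt keeps the raw string in its value property, so a quick sketch is:

const wrapped = rows[0].f0_;         // {type: 'BigQueryInt', value: '160864210488964916'}
const exact = BigInt(wrapped.value); // 160864210488964916n -- exact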
As for microsecond precision support, another PR was opened to fix that (#1192). It uses @google-cloud/precise-date to parse timestamps arriving as a float64 into a timestamp string with microsecond resolution.
Example output:
const bigquery = new BigQuery();
const [rows] = await bigquery.query({
query: 'SELECT TIMESTAMP("2014-09-27 12:30:00.123456Z")',
});
console.log(JSON.stringify(rows));
// [{"f0_":{"value":"2014-09-27T12:30:00.123456000Z"}}]