tedious icon indicating copy to clipboard operation
tedious copied to clipboard

[RFC] Better Data Types

Open arthurschreiber opened this issue 6 years ago • 9 comments

inspired by issues like https://github.com/tediousjs/tedious/issues/608, https://github.com/tediousjs/tedious/issues/675, https://github.com/tediousjs/tedious/issues/487, https://github.com/tediousjs/tedious/issues/474, https://github.com/tediousjs/tedious/issues/163, https://github.com/tediousjs/tedious/issues/339, #1058 and comments made by @chdh over at https://github.com/tediousjs/tedious/pull/674, I want to propose the following changes to the data type system to make it more robust and extensible.

These will be breaking changes, so I'm trying to collect some feedback first. 😅

What's currently wrong with data types?

Many (but not all) of the data types supported in SQL Server do not map 1:1 to JavaScript types. This causes a lot of friction in the form of precision loss or conversion failures. E.g. numeric/decimal (depending on their precision and scale), bigint, and many of the date and time formats, cannot be converted to/from JavaScript without a loss of precision or information.

There are JavaScript libraries (big-number, bigdecimal, moment, js-joda), that implement custom versions of compatible types in JavaScript itself, but I don't want to add these libraries as dependencies of tedious and "force" our users to use them, especially because there are often many competing libraries for the same use-case, and it's not always clear which one is the "best".

How do we fix this?

I want data types to be pluggable with custom encoder/decoder functions. A encoder function would be responsible for converting a JavaScript object into it's TDS binary representation, and the decoder function would be responsible for converting the binary representation back into a JavaScript object.

tedious would ship with a default set of encoder and decoder functions. Data Types which can be mapped to JavaScript types directly would be fully supported out-of the box, while all other data types will come with default implementations that will encode from and decode to plain JavaScript strings. This would ensure that tedious can work with all data-types out of the box, without loss of information.

Separate plug-in libraries can then provide more specific conversion functions.

Fantasy API

Here's a very rought example of how this could look like:

const tedious = require('tedious');
const BigInt = tedious.TYPES.BigInt;
const BigInteger = require("big-integer");

const MAX_BIGINT = new BigInteger('9223372036854775807')
const MIN_BIGINT = new BigInteger('-9223372036854775808')

tedious.TYPES.BigInt.defineEncoder(function(value: any) : Buffer {
  if (!(value instanceof BigInteger) || !value.isFinite() || value.lt(MIN_BIGINT) || value.gt(MAX_BIGINT)) {
    // The given value is either not a BigInteger or out of bounds.
    throw new tedious.DataTypeError();
  }

  const result = new Buffer.alloc(8);
  // ... write the value to the buffer
  return result;
});

BigInt.defineDecoder(function(data: Buffer) {
  // ... convert the given buffer back to a BigInt and return it here
});

Other changes

While we're at it, I'd like to propose another change to how the data types API works and is used. I think we should no longer pass data type conversion errors to the Request callback, but instead throw the error right when request.addParameter is called. This would improve the error stack trace (it will point to exactly where the incorrect data type value was passed) and would allow us to clean some of the tedious internals to not have to hold onto these errors. It would also allow us to change the internal storage of parameters - we'd no longer have to store the original value for the parameter, just the Buffer result of the conversion. This'd allow V8 to perform better optimizations, as we'd move from a data structure with a megamorphic value property to one where value is always a buffer.

arthurschreiber avatar Jan 10 '18 08:01 arthurschreiber

(nitpick)

const result = new Buffer.alloc(8);

should probably be:

const result = Buffer.alloc(8);

(carry on)

rossipedia avatar Jan 10 '18 17:01 rossipedia

Given the little I see it seems to make more sense to limit scope and try mapping many types to the fewer supported in JavaScript (rather than the opposite: supporting more types and features). The statement "cannot be converted to/from JavaScript without a loss of precision or information" would seem to suggest more time and effort than is currently expended on this project and version. Since there are 104 open issues some more than 2 years old, 21 open pull requests and 46 branches I'd suggest time constraints and quality as more relevant to this project.

jimmont avatar May 24 '18 02:05 jimmont

Is this being worked on? This is a very important issue for projects involving decimal values, and I was thinking about starting a prototype for this RFC.

fer22f avatar Feb 12 '19 15:02 fer22f

An example of where this would help is in the datetimeoffset type, which AFAIK right now just uses the local timezone of the server when fed a Javascript date... which completely defeats the point of having a datetimeoffset value in the first place.

ethankale avatar Apr 20 '20 22:04 ethankale

@IanChokS any updates so far?

KAMAELUA avatar Oct 12 '21 13:10 KAMAELUA

Hi @KAMAELUA , Unfortunately, there is no progress for now. The task is currently in the backlog and there is no one working on it yet.

MichaelSun90 avatar Oct 12 '21 18:10 MichaelSun90

This feature would be absolutely lovely. we're working on a full rewrite of our Data Types in Sequelize (https://github.com/sequelize/sequelize/pull/14505) and the way Tedious currently represents some values in JS leads to unavoidable precision loss:

  • datetimeoffset (and friends), and time are parsed as a JS Date which discards precision past the millisecond (precision 3, SQL server supports precision up to 7)
  • decimal is parsed as a JS number. Retrieving 9007199254740993 from the db results in 9007199254740994
  • date (with no time) is also parsed as a JS Date. It doesn't lead to a loss of precision but it still creates an intermediary object we always convert back to string (because we don't believe Date is appropriate for dateonly, and we're preparing the arrival of Temporal).

We'd prefer to receive strings for all of the above types. Being able to customize their parsing would enable us to fix this on our end.

ephys avatar Sep 24 '22 18:09 ephys

This feature would be absolutely lovely. we're working on a full rewrite of our Data Types in Sequelize (sequelize/sequelize#14505) and the current way Tedious represents some values in JS leads to unavoidable precision loss:

  • datetimeoffset (and friends), and time are parsed as a JS Date which discards precision past the millisecond (precision 3, SQL server supports precision up to 7)
  • decimal is parsed a JS number. Retrieving 9007199254740993 from the db results in 9007199254740994
  • date (with no time) is also parsed as a JS Date. It doesn't lead to a loss of precision but it still creates an intermediary object we always convert back to string (because we don't believe Date is appropriate for dateonly, and we're preparing the arrival of Temporal).

We'd prefer to receive strings for all of the above types. Being able to customize their parsing would enable us to fix this on our end.

To ride on this, would be great if tedious can provide an option to allow column returning string. The javascript precision problem is painful and even if we want to adopt an arbitrary-precision arithmetic library like BigNumber.js, it is not possible since the value retrieved from database is already losing precision.

trexshw avatar Nov 30 '22 18:11 trexshw

To simplify this, could it be possible to just have an option to treat certain SQL types as binary, and just return the raw response for that column in a buffer ? We could then parse the buffer.

A pluggable encoder/decoder system would be great but looks like a lot of work, and it looks like that project has stalled.

Also, encoding isn't as important as decoding, SQL server does a pretty good job of casting strings to the required type when needed. I'm happy with casting to strings as needed when sending data to SQL server.

If all my queries were SELECTS I'd cast the columns to binary (or string) during the select, but many result sets I deal with come from stored procedures which I don't have control over.

khkiley avatar Jan 04 '24 20:01 khkiley