
MySQL driver cannot handle bigint unsigned in `execute()`

bbkr opened this issue 2 years ago

CREATE TABLE foo ( id bigint unsigned ) engine=InnoDB;
INSERT INTO foo ( id ) VALUES ( 18446744073709551615 );    -- 2^64 -1

This value can be SELECTed without any issues:

my $id = $connection.execute( 'SELECT id FROM foo' ).allrows( )[ 0 ][ 0 ];
say $id;

will print 18446744073709551615

But somehow it cannot be used as an execute() argument:

$connection.execute( 'SELECT id FROM foo WHERE id = ?', $id );
Cannot unbox 64 bit wide bigint into native integer
  in block  at /home/bbkr/.raku/sources/A15CA24FF2518653680BE63142CB6661532CAA84 (DBDish::mysql::StatementHandle) line 119
  in block  at /home/bbkr/.raku/sources/A15CA24FF2518653680BE63142CB6661532CAA84 (DBDish::mysql::StatementHandle) line 114
  in method execute at /home/bbkr/.raku/sources/A15CA24FF2518653680BE63142CB6661532CAA84 (DBDish::mysql::StatementHandle) line 111
  in method execute at /home/bbkr/.raku/sources/B092781CCB9B75F51B0389A686A61C391F70F118 (DBDish::mysql::Connection) line 57

bbkr avatar Aug 12 '22 22:08 bbkr

https://github.com/raku-community-modules/DBIish/blob/main/lib/DBDish/mysql/StatementHandle.rakumod#L119

This logic is not valid: int64 has one bit less and cannot store the maximum unsigned value.
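The width mismatch is easy to demonstrate outside the driver. A quick sketch (Python is used here purely for illustration; the driver code itself is Raku):

```python
import struct

INT64_MAX = 2**63 - 1   # largest value a signed 64-bit slot can hold
BIG = 2**64 - 1         # 18446744073709551615, max BIGINT UNSIGNED

struct.pack("<Q", BIG)  # fits an unsigned 64-bit slot
try:
    struct.pack("<q", BIG)  # a signed 64-bit slot is one bit too narrow
except struct.error:
    print("does not fit int64")

print(BIG > INT64_MAX)  # True: the whole top half of the range is affected
```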

I'm not sure what the best fix without a performance penalty is:

when Int  {
    $st = MYSQL_TYPE_LONGLONG;
    $_ > 0 ?? Blob[uint64].new($_) !! Blob[int64].new($_);
}

?

bbkr avatar Aug 12 '22 22:08 bbkr

I think this needs a proper solution. The current one picks the wrong type for HALF of the available range, based on a very risky assumption.

Yes, most autoincremented IDs are small. But bigints are also used, for example, as a compact way to store binary flags. Or geo data. The driver should not manipulate the data type like that. It is error-prone and may create weird side effects.

bbkr avatar Nov 24 '22 11:11 bbkr

It's impossible to fix properly, as MySQL doesn't have any way of handling numbers over 2**64 other than as a string (that I've found), so a string needs to be the final fallback.

At best, we can improve 2^63 to 2^64.

So you prefer this coding?

if $_ > -2**63 and $_ < 2**63 {
    $st = MYSQL_TYPE_LONGLONG;
    Blob[int64].new($_);
} elsif $_ > 0 and $_ < 2**64 - 1 {
    $st = MYSQL_TYPE_LONGLONG;
    Blob[uint64].new($_);
} else {
    .Str.encode;
}

Also, can you provide a test case that shows a change? If this has a semantic impact, we should have a test to ensure the whole BIGINT range has the same semantics.

The round-trip test I added to t/24-mysql-large-value.t doesn't detect any difference between a string encoded number and a LONGLONG encoded number.

rbt avatar Nov 24 '22 11:11 rbt

The issue refers to the biggest numeric type that MySQL can handle. IMO this else part is not test-coverable (or is it?) and a simple "if > 0 then use uint64 else use int64" should be enough (and faster).

The functional difference is on the data consumer side. One expects a numeric type from a numeric column. This value may, for example, get incorrectly propagated as a string into JSON and crash some strongly typed consumer of such a payload. A MySQL-to-MySQL round trip does not show the whole story.

In my particular case, https://github.com/bbkr/UpRooted/blob/master/t/31-reader-mysql.rakutest#L42, this hit me when I was checking how dumped data is presented for consumption and was unable to truly test the whole field range.

bbkr avatar Nov 24 '22 13:11 bbkr

> The issue refers to the biggest numeric type that MySQL can handle. IMO this else part is not test-coverable (or is it?) and a simple "if > 0 then use uint64 else use int64" should be enough (and faster).

Unfortunately this is not adequate. Raku Ints such as 2^80 do not fit into either uint64 or MYSQL_TYPE_LONGLONG and need to be transmitted as a string. The MySQL numeric types can store much larger numbers than BIGINT, and even if they couldn't, generating a suitable error message (not a complaint about unboxing) would require more logic.

> The functional difference is on the data consumer side. One expects a numeric type from a numeric column.

Agreed, and in the test I added, all values are returned as a Rat, as expected from a Numeric column.

If you can identify a semantic difference and we code around it, we need to add a DBIish test for it to ensure it doesn't change in the future.

rbt avatar Nov 24 '22 13:11 rbt

> In my particular case, https://github.com/bbkr/UpRooted/blob/master/t/31-reader-mysql.rakutest#L42, this hit me when I was checking how dumped data is presented for consumption and was unable to truly test the whole field range.

This appears to be a store-and-read round trip? You'll get back an Int as the Raku datatype from a MySQL bigint column regardless of whether it's transmitted across the protocol as fixed-length 8 bytes or as a Unicode-encoded string.
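The round-trip equivalence described above can be sketched as follows (Python used for illustration only; the actual wire formats are handled by the MySQL client library):

```python
import struct

v = 18446744073709551615            # max BIGINT UNSIGNED

binary_wire = struct.pack("<Q", v)  # fixed-length 8-byte encoding
text_wire = str(v).encode()         # decimal string encoding

# Both decode back to the same integer, so the consumer sees one datatype.
assert struct.unpack("<Q", binary_wire)[0] == v
assert int(text_wire.decode()) == v
print("round trip identical")
```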

rbt avatar Nov 24 '22 14:11 rbt

OK, then the fastest way seems to be (also with corrected range edges):

if -2**63 <= $_ <= 2**63 - 1 {       # most stuff goes here
    $st = MYSQL_TYPE_LONGLONG;
    Blob[int64].new($_);
}
elsif 2**63 <= $_ <= 2**64 - 1 {     # half of unsigned int range goes here and does not need costly String conversion
    $st = MYSQL_TYPE_LONGLONG;
    Blob[uint64].new($_);
}
else {                               # super rare cases in real world use
    .Str.encode;
}

That is a mix of your solution and mine: it handles the most common cases cheaply, avoids the string-casting penalty for the top of the unsigned int range, and has a fallback for overflows. Am I correct?

bbkr avatar Nov 24 '22 14:11 bbkr

BTW: My project does round trip when writing data here: https://github.com/bbkr/UpRooted/blob/master/t/42-writer-mysql.rakutest#L28

So it is currently tested "up to signed int". I think your solution alone will work just fine, though with the slight performance penalty mentioned in my previous post.

bbkr avatar Nov 24 '22 14:11 bbkr

The code you suggest seems reasonable to me, but it doesn't quite work.

In this branch: https://github.com/raku-community-modules/DBIish/commit/ed66c7e48651f2b18bab04bdbfd6a37baa2edb4f

We get this test result: https://github.com/raku-community-modules/DBIish/actions/runs/3541415105/jobs/5945651204#step:12:201

These values:

9223372036854775808
18446744073709551615

Come back as these values:

-1
-9223372036854775808
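Those two results are exactly what reinterpreting the unsigned bytes as a signed int64_t produces, which points at MYSQL_TYPE_LONGLONG being read as signed. A Python illustration of the two's-complement wrap (illustrative only, not DBIish code):

```python
import struct

def reread_as_signed(u: int) -> int:
    """Pack as unsigned 64-bit, then unpack the same 8 bytes as signed."""
    return struct.unpack("<q", struct.pack("<Q", u))[0]

for sent in (9223372036854775808, 18446744073709551615):
    print(sent, "->", reread_as_signed(sent))
# 9223372036854775808  -> -9223372036854775808
# 18446744073709551615 -> -1
```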

rbt avatar Nov 24 '22 14:11 rbt

Weird, on a logical level this fix should be correct (Blobs do not overflow). I haven't looked into the test itself yet (I'm still at my $dayjob).

bbkr avatar Nov 24 '22 14:11 bbkr

Are you sure MYSQL_TYPE_LONGLONG handles those values? I'm reading it as an int64_t.

rbt avatar Nov 24 '22 14:11 rbt

You're correct - it does not: https://dev.mysql.com/doc/c-api/8.0/en/c-api-prepared-statement-type-codes.html

bbkr avatar Nov 24 '22 14:11 bbkr

So my fix proposal was based on a wrong assumption, sorry about that. It looks like your version is the valid way to solve it, and there will always be a string-casting performance penalty in the upper bigint unsigned range.

bbkr avatar Nov 24 '22 14:11 bbkr

No worries. I'll merge in the lower boundary issue you identified as soon as the tests pass.

rbt avatar Nov 24 '22 15:11 rbt