database-js icon indicating copy to clipboard operation
database-js copied to clipboard

UUIDs Get Returned Incorrectly

Open RedSpid3r opened this issue 1 year ago • 1 comments

I have a column (id_new) on a dev branch to test using UUID7 for PKs. When doing a SELECT query the UUIDs seem to be being returned incorrectly.

Example Table

Column Name	Datatype		Collation		PK
id		INT			utf8mb4_0900_ai_ci	Yes
id_new		CHAR(36)		utf8mb4_bin
domain		VARCHAR(255)		utf8mb4_0900_ai_ci

I updated the column for this record like so:

import { uuidv7 } from "uuidv7";
const domainId = uuidv7();

await c.env.CONNECTION.transaction(async tx => {
	await tx.execute("UPDATE domains SET id_new = ? WHERE domain = ?", [domainId, 'domain.com')
})

When viewed via MySQL Workbench app, id_new shows as 018e1630-ca36-7188-94f9-7120cb925faf

However, when running a SELECT query and returning the results it shows as the following:

"rows": [
            {
                "HEX(id_new)": "30313865313633302D636133362D373138382D393466392D373132306362393235666166"
            }
        ]

My SELECT statement

SELECT HEX(id_new) FROM domains WHERE domain = 'domain.com'

RedSpid3r avatar Mar 07 '24 00:03 RedSpid3r

Just to add here, I suspect in this case, it is related to the collation utf8mb4_bin, but want to see what the API response is for this type. Something is definitely wrong here.

mattrobenolt avatar Mar 07 '24 00:03 mattrobenolt

So looking into it more, the utf8mb4_bin collation itself causes MySQL to interpret this column as binary and not utf8mb4 anymore, regardless of the actual charset. And because we see this as BLOB and BINARY, we treat it as raw binary data since... that's what those represent.

I think we need some extra logic here to work around this.

Given the table:

CREATE TABLE `dumb` (
  `a` text,
  `b` char(10),
  `c` text COLLATE utf8mb4_bin,
  `d` char(10) COLLATE utf8mb4_bin
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

And inserting one row:

insert into dumb values ('hello-a', 'hello-b', 'hello-c', 'hello-d');

We get the results back:

{
  "result": {
    "fields": [
      {
        "name": "a",
        "type": "TEXT",
        "table": "dumb",
        "orgTable": "dumb",
        "database": "mattdb",
        "orgName": "a",
        "columnLength": 262140,
        "charset": 255,
        "flags": 16
      },
      {
        "name": "b",
        "type": "CHAR",
        "table": "dumb",
        "orgTable": "dumb",
        "database": "mattdb",
        "orgName": "b",
        "columnLength": 40,
        "charset": 255
      },
      {
        "name": "c",
        "type": "BLOB",
        "table": "dumb",
        "orgTable": "dumb",
        "database": "mattdb",
        "orgName": "c",
        "columnLength": 262140,
        "charset": 255,
        "flags": 144
      },
      {
        "name": "d",
        "type": "BINARY",
        "table": "dumb",
        "orgTable": "dumb",
        "database": "mattdb",
        "orgName": "d",
        "columnLength": 40,
        "charset": 255,
        "flags": 128
      }
    ],
    "rows": [
      {
        "lengths": [
          "7",
          "7",
          "7",
          "7"
        ],
        "values": "aGVsbG8tYWhlbGxvLWJoZWxsby1jaGVsbG8tZA=="
      }
    ]
  }
}

The first two are TEXT/CHAR accordingly, which is correct, while the last 2 became BLOB/BINARY.

This is fundamentally why we were handling these "wrong". I think we need to more specifically infer from the charset and flags.

mattrobenolt avatar Apr 24 '24 21:04 mattrobenolt