drizzle-orm icon indicating copy to clipboard operation
drizzle-orm copied to clipboard

Fix: json and jsonb parsing in postgres-js

Open Angelelz opened this issue 1 year ago • 22 comments

Second (and better) attempt to close #724 and close #1511

This PR depends on #1659 which has already been merged to beta. After figuring out how to bypass parsers and serializers in postgres.js driver, this technique can be applied to the issue with json and jsonb.

  • Added json and jsonb to the list of bypassed types on postgres.js driver
  • Added simple tests to pg and postgres-js integration tests for json and jsonb columns.

There are additional tests in #1560. @AndriiSherman Please merge that one after merging this.

Angelelz avatar Jan 11 '24 02:01 Angelelz

Thank you @Angelelz & @AndriiSherman for your thorough work here! 😃 🙏

One question I've been wondering: when this is fixed, will there be any easy way to migrate all existing data/columns we persisted through Drizzle that was incorrectly persisted as JSON strings instead of JSON objects directly?

I'm not aware of any "JSON parse" type function in Postgres, so I'm not sure this'll be possible with a simple SQL migration. So this might require a Node/TypeScript migration (which drizzle-kit doesn't support, right?) to read existing data with the old parser (parsing JSON strings to JSON objects) and write it back with the new parser.

I can easily write such a migration myself, but my main question is: once this fix is merged, will there be any way to read data through the old parser with the new Drizzle code? Perhaps I should update my Drizzle schema to change the type of the existing column to text rather than json or jsonb, and explicitly add a new column typed as json or jsonb?

Lemme know if my q is unclear. Thank you very much!

aseemk avatar Jan 11 '24 18:01 aseemk

Thank you @Angelelz & @AndriiSherman for your thorough work here! 😃 🙏

One question I've been wondering: when this is fixed, will there be any easy way to migrate all existing data/columns we persisted through Drizzle that was incorrectly persisted as JSON strings instead of JSON objects directly?

I'm not aware of any "JSON parse" type function in Postgres, so I'm not sure this'll be possible with a simple SQL migration. So this might require a Node/TypeScript migration (which drizzle-kit doesn't support, right?) to read existing data with the old parser (parsing JSON strings to JSON objects) and write it back with the new parser.

I can easily write such a migration myself, but my main question is: once this fix is merged, will there be any way to read data through the old parser with the new Drizzle code? Perhaps I should update my Drizzle schema to change the type of the existing column to text rather than json or jsonb, and explicitly add a new column typed as json or jsonb?

Lemme know if my q is unclear. Thank you very much!

That's a really good question. Before merging this one we will think this through. I guess we will write a small guide on how you can migrate this data to be a proper JSON

AndriiSherman avatar Jan 11 '24 18:01 AndriiSherman

That'd be great, thank you.

I'm remembering that Drizzle doesn't actually know or ask about the JSON schema/type of JSON columns. The $type<T>() function only takes a compile-time TypeScript type, not e.g. a runtime Zod schema. (Feature request btw: that would be nice!) So detecting, handling, and migrating incorrectly vs. correctly persisted JSON data will prob need to be an application-level concern, not something Drizzle can automatically encapsulate.

In our case, every JSON(B) data type we have is an array or object at the top-level, not a string. So we can easily differentiate between "incorrectly" persisted JSON today (SELECT jsonb_typeof(column) returns "string") and "correctly" persisted JSON (SELECT jsonb_typeof(column) would return "object" or "array").

Given this, we can achieve an in-place (not needing to add a new column) and zero-downtime migration with the following approach, provided here if helpful for y'all or anyone else:

    • I update Drizzle to the new version, so it'll now return incorrectly persisted JSON (JSON strings) as strings and no longer as objects or arrays.
    • I keep my Drizzle column schema typed as json or jsonb in my TypeScript code, but I expand the TypeScript type I pass into $type<T>() to T | string. This forces my app to handle both old/incorrect and new/correct cases.
    • My app can call JSON.parse() on the values if they're old/incorrect strings (and maybe even run them through our own Zod schemas for validation) to get the correct objects or arrays.
    • However, Drizzle will now write JSON data correctly (as objects or arrays) on all inserts and updates going forward.
    • Our DB will now start having a mix or old/incorrect and new/correct JSON, but our app will continue working bug-free. 🤞
    • I write and manually run a script (conceptually a TypeScript DB migration) to convert all existing data from JSON strings to JSON objects or arrays.
    • Old data can be detected/selected through WHERE jsonb_typeof(column) = 'string'.
    • Now our DB has only correct JSON! 🙌
    • I update my code Drizzle schema to be just $type<T>() again, no longer T | string, and remove my temporary code to handle old/incorrectly persisted JSON strings.
    • Now my code is back to how it was (simple), but the DB is correct now and going forward. 🎉

^ Those are just my theoretical thoughts. Feedback welcome if you think I'm missing anything. Thanks again!

aseemk avatar Jan 11 '24 18:01 aseemk

@aseemk If you can afford a short downtime/maintenance on the app and have a migration script that reads all tables with jsonb columns, read the jsonb column and fix it, then you could run it as a one time operation on all environments directly after the regular drizzle-kit migrations. Then in your application code nothing has to change, as you'd update the dependencies and deploy directly after.

Hebilicious avatar Jan 14 '24 20:01 Hebilicious

Yeah, you're right that brief downtime/maintenance may be reasonable.

I also realized that I think a migration might be entirely doable in SQL! There's no json_parse function or similar in Postgres, but Postgres effectively already implements a JSON parser since it needs to support a SQL query setting a JSON(B) column to a string representation of some JSON. So since Drizzle incorrectly stringified JSON twice, and persisted a JSON string (of another JSON value), I think a simple query that "unescapes" one level of JSON stringification should work!

It appears to work on my own data that has both top-level objects and top-level arrays as JSON values (both containing nested objects and arrays). E.g. for a single JSONB column named tags in a single table named users:

select id

, tags::text as bad_jsonb_str -- should see wrapper quotes and backslash escapes
, jsonb_typeof(tags) as bad_jsonb_type -- should be "string"

, left(tags::text, 2) as starting_chars -- should be `"{` or `"[` for top-level objects or arrays respectively
, right(tags::text, 2) as ending_chars  -- should be `}"` or `]"` for top-level objects or arrays respectively

-- Remove the starting and ending quotes:
, substring(tags::text from 2 for length(tags::text) - 2) as trimmed

-- Remove any backslash escape characters:
, regexp_replace(tags::text, '\\(.)', '\1', 'g') as unescaped

-- Combine the two above!
, regexp_replace(substring(tags::text from 2 for length(tags::text) - 2), '\\(.)', '\1', 'g') as good_jsonb_str
, regexp_replace(substring(tags::text from 2 for length(tags::text) - 2), '\\(.)', '\1', 'g')::jsonb as good_jsonb
, jsonb_typeof(
  regexp_replace(substring(tags::text from 2 for length(tags::text) - 2), '\\(.)', '\1', 'g')::jsonb
) as good_jsonb_type -- should now be object or array!

from users
limit 10

Can make this into a migration by changing that into an UPDATE ... SET ... WHERE ... (or updating in batches with a WITH, etc.).

Hope that helps others!

aseemk avatar Jan 14 '24 21:01 aseemk

This doesn't affect pg users right?

Hebilicious avatar Jan 14 '24 22:01 Hebilicious

This doesn't affect pg users right?

If you can insert and retrieve your json data without workarounds or issues, this doesn't affect you.

Angelelz avatar Jan 15 '24 00:01 Angelelz

To clarify: we use ~pg~ (edit: nope, we use Postgres.js/postgres; see below), and this affects us.

We're able to insert and retrieve our JSON data through Drizzle without any issues today. But we're unable to "reach into" that JSON data in SQL, e.g. whether in migrations (backfills) or for atomic, field-wise updates. This has been a source of inconvenience (but not downtime or anything like that) several times for us, so I'm excited for this fix! Thanks again. =)

aseemk avatar Jan 15 '24 02:01 aseemk

To clarify: we use pg, and this affects us.

We're able to insert and retrieve our JSON data through Drizzle without any issues today. But we're unable to "reach into" that JSON data in SQL, e.g. whether in migrations (backfills) or for atomic, field-wise updates. This has been a source of inconvenience (but not downtime or anything like that) several times for us, so I'm excited for this fix! Thanks again. =)

Thanks for the heads-up, I've never tried accessing that data without drizzle (yet)

@Angelelz this PR only address postgres.js. Is there another bug that affects pg then ?

Hebilicious avatar Jan 16 '24 23:01 Hebilicious

To clarify: we use pg, and this affects us.

We're able to insert and retrieve our JSON data through Drizzle without any issues today. But we're unable to "reach into" that JSON data in SQL, e.g. whether in migrations (backfills) or for atomic, field-wise updates. This has been a source of inconvenience (but not downtime or anything like that) several times for us, so I'm excited for this fix! Thanks again. =)

Can you clarify here? What do you mean by can't reach into the json data with SQL? What sql statement doesn't work for you on json?

Angelelz avatar Jan 17 '24 00:01 Angelelz

@Angelelz this PR only address postgres.js. Is there another bug that affects pg then ?

I dont think Pg has an issue. All the test are passing

Angelelz avatar Jan 17 '24 00:01 Angelelz

I see my confusion here:

  • We indeed use postgres (Postgres.js), not pg (node-postgres)
  • I mixed these two up bc 95% of our code imports from drizzle-orm/pg-core (postgres and drizzle-orm/postgres-js are only imported for the DB connection), and I mistakenly thought pg-core here was the same thing as pg, but it's not

Sorry for the mixup! So it sounds like yes, this bug only affects us because we're using Postgres.js (postgres). The bug exhibits the same behavior as everyone's talking about in #724 and #1511.

Thanks for clarifying!

aseemk avatar Jan 17 '24 00:01 aseemk

Any update on this? Is there a workaround?

juliomuhlbauer avatar Feb 05 '24 22:02 juliomuhlbauer

Is this fix waiting on a migration path for existing jsonb data? If so, would it be possible to decouple these problems so it could be released? A couple of possibilities:

  1. Release as a breaking change with config setting that causes jsonb to use the old behaviour, and then folks can migrate their data independently of this release. Maybe risky, as there will no doubt be people who upgrade without realising they need to set the config, and end up storing both formats.
  2. Release as a non-breaking change with a config setting that causes jsonb to use the correct behaviour. Independently of this PR, the default jsonb behaviour could be documented as deprecated, to be replaced by the correct behaviour in some future breaking version, which would also remove the config. This option seems to create the least resistance to getting the fix available as soon as possible.

If this isn't likely to land soon, would it be possible to identify a recommended workaround?

Thanks for the help!

hawkett avatar Feb 16 '24 03:02 hawkett

I didn't really see a big difference between Postgres.js and node-postgres (pg), so I switched to node-postgres which doesn't have this issue (it also has the benefit of giving better error logs). It's not really the most satisfying workaround but it is a workaround.

The only downside is that you'll have to write a data migration that translates the old escaped JSON values into the new ones yourself.

Hansenq avatar Feb 16 '24 03:02 Hansenq

I didn't really see a big difference between Postgres.js and node-postgres (pg)

I tried switching to node-postgres and had major performance issues. Not sure why. Really hope this issue gets resolved soon

niko-gardenanet avatar Feb 23 '24 15:02 niko-gardenanet

I also tried passing the JSON with the sql helper, but it doesn't work in my case because i need to insert a top level JSON Array ([]).

sql`${[]}::jsonb` // does not work

Produces the following SQL with a syntax error...

insert into
  "my_table" ("my_column")
values
  (()::jsonb)

...and empty params array

// Params
[]
sql`${JSON.stringify([])}::jsonb` // also does not work

Produces the correct SQL...

insert into
  "my_table" ("my_column")
values
  ($1::jsonb)

...but passes the JSON array as a string which gets turned into a JSON string

// Params
[
  "[]", // gets turned into a JSON string i.e. "\"[]\""
]

Anyone knows a workaround for that?

Edit: I found a solution.

sql`${new Param([])}::jsonb` // wrap JSON array in new Param

Credit: https://github.com/drizzle-team/drizzle-orm/issues/724#issuecomment-1890410409

niko-gardenanet avatar Feb 23 '24 16:02 niko-gardenanet

Is this fix waiting on a migration path for existing jsonb data? If so, would it be possible to decouple these problems so it could be released? A couple of possibilities:

  1. Release as a breaking change with config setting that causes jsonb to use the old behaviour, and then folks can migrate their data independently of this release. Maybe risky, as there will no doubt be people who upgrade without realising they need to set the config, and end up storing both formats.
  2. Release as a non-breaking change with a config setting that causes jsonb to use the correct behaviour. Independently of this PR, the default jsonb behaviour could be documented as deprecated, to be replaced by the correct behaviour in some future breaking version, which would also remove the config. This option seems to create the least resistance to getting the fix available as soon as possible.

If this isn't likely to land soon, would it be possible to identify a recommended workaround?

Thanks for the help!

I'm looking into migrating from pg to postgres-js so this is important for me as well. I think https://github.com/drizzle-team/drizzle-orm/issues/1511#issuecomment-1824687669 might be a good enough workaround :

Create a custom jsonB type instead of using the drizzle provided one. @Angelelz could you confirm that this works please ?

Hebilicious avatar Feb 27 '24 15:02 Hebilicious

Is this fix waiting on a migration path for existing jsonb data? If so, would it be possible to decouple these problems so it could be released? A couple of possibilities:

  1. Release as a breaking change with config setting that causes jsonb to use the old behaviour, and then folks can migrate their data independently of this release. Maybe risky, as there will no doubt be people who upgrade without realising they need to set the config, and end up storing both formats.
  2. Release as a non-breaking change with a config setting that causes jsonb to use the correct behaviour. Independently of this PR, the default jsonb behaviour could be documented as deprecated, to be replaced by the correct behaviour in some future breaking version, which would also remove the config. This option seems to create the least resistance to getting the fix available as soon as possible.

If this isn't likely to land soon, would it be possible to identify a recommended workaround? Thanks for the help!

I'm looking into migrating from pg to postgres-js so this is important for me as well. I think #1511 (comment) might be a good enough workaround :

Create a custom jsonB type instead of using the drizzle provided one. @Angelelz could you confirm that this works please ?

it works I was using it while back can check https://github.com/drizzle-team/drizzle-orm/pull/666#issuecomment-1602918513

LeonAlvarez avatar Apr 03 '24 00:04 LeonAlvarez

patch-package form for anyone who needs it

diff --git a/node_modules/drizzle-orm/postgres-js/driver.js b/node_modules/drizzle-orm/postgres-js/driver.js
index 7e48e8c..219e0a0 100644
--- a/node_modules/drizzle-orm/postgres-js/driver.js
+++ b/node_modules/drizzle-orm/postgres-js/driver.js
@@ -12,6 +12,8 @@ function drizzle(client, config = {}) {
     client.options.parsers[type] = transparentParser;
     client.options.serializers[type] = transparentParser;
   }
+  client.options.serializers['114'] = transparentParser;
+  client.options.serializers['3802'] = transparentParser;
   const dialect = new PgDialect();
   let logger;
   if (config.logger === true) {

arjunyel avatar Apr 11 '24 00:04 arjunyel

Bumping this MR. The custom type workaround works like a charm, but this issue needs to be resolved.

SiNONiMiTY avatar May 13 '24 07:05 SiNONiMiTY

As for migration guide, wouldn't this query fix the issue?

UPDATE <table_name>
SET <col_1_name> = (<col_1_name> #>> '{}')::jsonb,
    <col_2_name> = (<col_2_name> #>> '{}')::jsonb
   ...
    <col_n_name> = (<col_n_name> #>> '{}')::jsonb

;

alin-plamadeala avatar May 27 '24 10:05 alin-plamadeala

Please merge the patch if possible

deepsingh132 avatar Jun 14 '24 10:06 deepsingh132

@AndriiSherman Is there any update? It's been quite some time

eddienubes avatar Jul 04 '24 20:07 eddienubes

I've been playing with an UPDATE statement that will correct existing JSONB columns that have had the extra JSON.stringify(data) encoding applied.

Here is what is working for our project:

UPDATE {tableName} SET {columnName} = ({columnName}->>0)::jsonb
WHERE {columnName}::text LIKE '"%"';

Running that statement will correct bad columns, but leave good columns untouched, so it is safe to run multiple times.

Caveats

This works when JSON.stringify(data) was applied to data that is an object or array, but does not work if data is a primitive JSON type: string, number, boolean, or NULL.

I'm not sure how much real world application data will be using a JSONB column to store a raw JSON primitive type (that kinda defeats the purpose of JSONB), but I wanted to call attention to this limitation.

DougSchmidt-Leeward avatar Jul 19 '24 15:07 DougSchmidt-Leeward

Any updates on this PR?

andreial avatar Jul 22 '24 09:07 andreial

I also tried passing the JSON with the sql helper, but it doesn't work in my case because i need to insert a top level JSON Array ([]).

sql`${[]}::jsonb` // does not work

Produces the following SQL with a syntax error...

insert into
  "my_table" ("my_column")
values
  (()::jsonb)

...and empty params array

// Params
[]
sql`${JSON.stringify([])}::jsonb` // also does not work

Produces the correct SQL...

insert into
  "my_table" ("my_column")
values
  ($1::jsonb)

...but passes the JSON array as a string which gets turned into a JSON string

// Params
[
  "[]", // gets turned into a JSON string i.e. "\"[]\""
]

Anyone knows a workaround for that?

Edit: I found a solution.

sql`${new Param([])}::jsonb` // wrap JSON array in new Param

Credit: #724 (comment)

I found a workaround by executing raw SQL that includes the JSON or JSONB data, pre-stringified. Here's an example:

const json_data = JSON.stringify(data);

db.execute(sql.raw(`INSERT INTO table ("json_data") VALUES ($$${json_data}$$::jsonb)`));

I'm using dollar-quoted constants instead of single or double quotes because the json_data string might contain single quotes and will definitely contain double quotes.

For more information on dollar-quoted constants, refer to the PostgreSQL documentation: https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-DOLLAR-QUOTING

Because the code above looks a bit messy, here is a cleaner example:

const json_data = JSON.stringify(data);

db.execute(
  sql.raw(
    'INSERT INTO table ("json_data") VALUES ($$' + json_data + "$$::jsonb)"
  )
);

nikosgram avatar Aug 04 '24 14:08 nikosgram

Fixing conflicts and merging this into the beta tag. I will prepare a guide on how to fix an existing database and will reuse and mention some of the comments from this PR

AndriiSherman avatar Aug 06 '24 11:08 AndriiSherman

Merged. It will be available in drizzle-orm@beta in about 30 minutes and will remain in beta for some time (possibly a few days) before being released as the latest version

AndriiSherman avatar Aug 06 '24 12:08 AndriiSherman

Finally, thank you!! Which specific version does this one correspond to?

tonyxiao avatar Aug 07 '24 00:08 tonyxiao