drizzle-orm
Fix: json and jsonb parsing in postgres-js
Second (and better) attempt to close #724 and close #1511
This PR depends on #1659 which has already been merged to beta. After figuring out how to bypass parsers and serializers in postgres.js driver, this technique can be applied to the issue with json and jsonb.
- Added json and jsonb to the list of bypassed types on postgres.js driver
- Added simple tests to pg and postgres-js integration tests for json and jsonb columns.
There are additional tests in #1560. @AndriiSherman Please merge that one after merging this.
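For context, the effect of the bug is a double JSON encoding: the value was stringified once for the jsonb column and then again by the postgres.js serializer, so the database ended up holding a JSON string instead of a JSON object. A minimal plain-TypeScript sketch of that (not the actual driver code):

```typescript
// The value the application wanted to store in a jsonb column:
const value = { tags: ["a", "b"] };

// What should reach Postgres (one level of encoding):
const correct = JSON.stringify(value);

// What effectively reached Postgres before this fix (encoded twice),
// so jsonb_typeof(column) reported 'string' instead of 'object':
const doubleEncoded = JSON.stringify(correct);

console.log(correct);       // {"tags":["a","b"]}
console.log(doubleEncoded); // "{\"tags\":[\"a\",\"b\"]}"
```

Parsing the double-encoded text once yields a string, not an object, which is exactly why SQL-side JSON operators could not reach into the data.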
Thank you @Angelelz & @AndriiSherman for your thorough work here! 😃 🙏
One question I've been wondering: when this is fixed, will there be any easy way to migrate all existing data/columns we persisted through Drizzle that was incorrectly persisted as JSON strings instead of JSON objects directly?
I'm not aware of any "JSON parse" type function in Postgres, so I'm not sure this'll be possible with a simple SQL migration. So this might require a Node/TypeScript migration (which drizzle-kit doesn't support, right?) to read existing data with the old parser (parsing JSON strings to JSON objects) and write it back with the new parser.
I can easily write such a migration myself, but my main question is: once this fix is merged, will there be any way to read data through the old parser with the new Drizzle code? Perhaps I should update my Drizzle schema to change the type of the existing column to text rather than json or jsonb, and explicitly add a new column typed as json or jsonb?
Lemme know if my q is unclear. Thank you very much!
That's a really good question. Before merging this one we will think this through. I guess we will write a small guide on how you can migrate this data to be proper JSON.
That'd be great, thank you.
I'm remembering that Drizzle doesn't actually know or ask about the JSON schema/type of JSON columns. The $type<T>() function only takes a compile-time TypeScript type, not e.g. a runtime Zod schema. (Feature request btw: that would be nice!) So detecting, handling, and migrating incorrectly vs. correctly persisted JSON data will probably need to be an application-level concern, not something Drizzle can automatically encapsulate.
In our case, every JSON(B) data type we have is an array or object at the top level, not a string. So we can easily differentiate between "incorrectly" persisted JSON today (SELECT jsonb_typeof(column) returns "string") and "correctly" persisted JSON (SELECT jsonb_typeof(column) would return "object" or "array").
Given this, we can achieve an in-place (not needing to add a new column) and zero-downtime migration with the following approach, provided here if helpful for y'all or anyone else:
- I update Drizzle to the new version, so it'll now return incorrectly persisted JSON (JSON strings) as strings and no longer as objects or arrays.
  - I keep my Drizzle column schema typed as json or jsonb in my TypeScript code, but I expand the TypeScript type I pass into $type<T>() to T | string. This forces my app to handle both old/incorrect and new/correct cases.
  - My app can call JSON.parse() on the values if they're old/incorrect strings (and maybe even run them through our own Zod schemas for validation) to get the correct objects or arrays.
  - However, Drizzle will now write JSON data correctly (as objects or arrays) on all inserts and updates going forward.
  - Our DB will now start having a mix of old/incorrect and new/correct JSON, but our app will continue working bug-free. 🤞
- I write and manually run a script (conceptually a TypeScript DB migration) to convert all existing data from JSON strings to JSON objects or arrays.
  - Old data can be detected/selected through WHERE jsonb_typeof(column) = 'string'.
  - Now our DB has only correct JSON! 🙌
- I update my Drizzle schema to be just $type<T>() again, no longer T | string, and remove my temporary code to handle old/incorrectly persisted JSON strings.
  - Now my code is back to how it was (simple), but the DB is correct now and going forward. 🎉
^ Those are just my theoretical thoughts. Feedback welcome if you think I'm missing anything. Thanks again!
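The defensive read-time handling described in the first phase (widening the column type to T | string and parsing old values on read) might look like this; a sketch only, with a hypothetical tags column:

```typescript
// With the column declared as e.g. jsonb("tags").$type<Tags | string>(),
// reads must tolerate both the old (string) and new (array) shapes.
type Tags = string[];

function readTags(raw: Tags | string): Tags {
  // Old rows: Drizzle stored a JSON string, so parse one level.
  // New rows: already a proper array; pass through unchanged.
  return typeof raw === "string" ? (JSON.parse(raw) as Tags) : raw;
}

console.log(readTags(["a", "b"]));  // new/correct row: ["a","b"]
console.log(readTags('["a","b"]')); // old/incorrect row: ["a","b"]
```

Once the one-time migration has rewritten all old rows, this wrapper (and the widened type) can be deleted again, as the third phase describes.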
@aseemk If you can afford a short downtime/maintenance window on the app, and have a migration script that reads every table with jsonb columns and fixes each jsonb value, then you could run it as a one-time operation on all environments directly after the regular drizzle-kit migrations. Then nothing in your application code has to change, as you'd update the dependencies and deploy directly after.
Yeah, you're right that brief downtime/maintenance may be reasonable.
I also realized that I think a migration might be entirely doable in SQL! There's no json_parse function or similar in Postgres, but Postgres effectively already implements a JSON parser since it needs to support a SQL query setting a JSON(B) column to a string representation of some JSON. So since Drizzle incorrectly stringified JSON twice, and persisted a JSON string (of another JSON value), I think a simple query that "unescapes" one level of JSON stringification should work!
It appears to work on my own data that has both top-level objects and top-level arrays as JSON values (both containing nested objects and arrays). E.g. for a single JSONB column named tags in a single table named users:
select id
, tags::text as bad_jsonb_str -- should see wrapper quotes and backslash escapes
, jsonb_typeof(tags) as bad_jsonb_type -- should be "string"
, left(tags::text, 2) as starting_chars -- should be `"{` or `"[` for top-level objects or arrays respectively
, right(tags::text, 2) as ending_chars -- should be `}"` or `]"` for top-level objects or arrays respectively
-- Remove the starting and ending quotes:
, substring(tags::text from 2 for length(tags::text) - 2) as trimmed
-- Remove any backslash escape characters:
, regexp_replace(tags::text, '\\(.)', '\1', 'g') as unescaped
-- Combine the two above!
, regexp_replace(substring(tags::text from 2 for length(tags::text) - 2), '\\(.)', '\1', 'g') as good_jsonb_str
, regexp_replace(substring(tags::text from 2 for length(tags::text) - 2), '\\(.)', '\1', 'g')::jsonb as good_jsonb
, jsonb_typeof(
regexp_replace(substring(tags::text from 2 for length(tags::text) - 2), '\\(.)', '\1', 'g')::jsonb
) as good_jsonb_type -- should now be object or array!
from users
limit 10
You can make this into a migration by changing that into an UPDATE ... SET ... WHERE ... (or updating in batches with a WITH, etc.).
Hope that helps others!
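The quote-stripping plus backslash-unescaping in that query can be sanity-checked outside SQL; a plain-TypeScript sketch applying the same two transformations to a double-encoded value:

```typescript
// A value as it was incorrectly persisted: JSON-encoded twice.
const original = { tags: ["a", "b"], note: 'say "hi"' };
const storedText = JSON.stringify(JSON.stringify(original)); // what tags::text shows

// Same steps as the SQL above:
// 1) drop the outer quotes  -> substring(tags::text from 2 for length - 2)
const trimmed = storedText.slice(1, -1);
// 2) undo '\x' escapes      -> regexp_replace(..., '\\(.)', '\1', 'g')
const unescaped = trimmed.replace(/\\(.)/g, "$1");

console.log(JSON.parse(unescaped)); // recovers the original object
```

Note that this only makes sense on rows where the value really is a double-encoded string (the WHERE jsonb_typeof(column) = 'string' check from earlier in the thread); running it on correctly stored rows would strip a level of encoding they don't have.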
This doesn't affect pg users right?
If you can insert and retrieve your json data without workarounds or issues, this doesn't affect you.
To clarify: we use ~pg~ (edit: nope, we use Postgres.js/postgres; see below), and this affects us.
We're able to insert and retrieve our JSON data through Drizzle without any issues today. But we're unable to "reach into" that JSON data in SQL, e.g. whether in migrations (backfills) or for atomic, field-wise updates. This has been a source of inconvenience (but not downtime or anything like that) several times for us, so I'm excited for this fix! Thanks again. =)
Thanks for the heads-up, I've never tried accessing that data without drizzle (yet)
@Angelelz this PR only addresses postgres.js. Is there another bug that affects pg, then?
Can you clarify here? What do you mean by can't reach into the json data with SQL? What sql statement doesn't work for you on json?
I don't think pg has an issue. All the tests are passing.
I see my confusion here:
- We indeed use postgres (Postgres.js), not pg (node-postgres)
- I mixed these two up because 95% of our code imports from drizzle-orm/pg-core (postgres and drizzle-orm/postgres-js are only imported for the DB connection), and I mistakenly thought pg-core here was the same thing as pg, but it's not

Sorry for the mixup! So it sounds like yes, this bug only affects us because we're using Postgres.js (postgres). The bug exhibits the same behavior as everyone's talking about in #724 and #1511.
Thanks for clarifying!
Any update on this? Is there a workaround?
Is this fix waiting on a migration path for existing jsonb data? If so, would it be possible to decouple these problems so it could be released? A couple of possibilities:
- Release as a breaking change with a config setting that causes jsonb to use the old behaviour, and then folks can migrate their data independently of this release. Maybe risky, as there will no doubt be people who upgrade without realising they need to set the config, and end up storing both formats.
- Release as a non-breaking change with a config setting that causes jsonb to use the correct behaviour. Independently of this PR, the default jsonb behaviour could be documented as deprecated, to be replaced by the correct behaviour in some future breaking version, which would also remove the config. This option seems to create the least resistance to getting the fix available as soon as possible.

If this isn't likely to land soon, would it be possible to identify a recommended workaround?
Thanks for the help!
I didn't really see a big difference between Postgres.js and node-postgres (pg), so I switched to node-postgres, which doesn't have this issue (it also has the benefit of giving better error logs). It's not really the most satisfying workaround, but it is a workaround.
The only downside is that you'll have to write a data migration that translates the old escaped JSON values into the new ones yourself.
I tried switching to node-postgres and had major performance issues. Not sure why. Really hope this issue gets resolved soon
I also tried passing the JSON with the sql helper, but it doesn't work in my case because I need to insert a top-level JSON array ([]).

sql`${[]}::jsonb` // does not work

Produces the following SQL with a syntax error...

insert into
"my_table" ("my_column")
values
(()::jsonb)

...and an empty params array:

// Params
[]

sql`${JSON.stringify([])}::jsonb` // also does not work

Produces the correct SQL...

insert into
"my_table" ("my_column")
values
($1::jsonb)

...but passes the JSON array as a string, which gets turned into a JSON string:

// Params
[
  "[]", // gets turned into a JSON string i.e. "\"[]\""
]

Does anyone know a workaround for that?
Edit: I found a solution.

sql`${new Param([])}::jsonb` // wrap JSON array in new Param

Credit: https://github.com/drizzle-team/drizzle-orm/issues/724#issuecomment-1890410409
I'm looking into migrating from pg to postgres-js, so this is important for me as well. I think https://github.com/drizzle-team/drizzle-orm/issues/1511#issuecomment-1824687669 might be a good enough workaround: create a custom jsonb type instead of using the Drizzle-provided one. @Angelelz, could you confirm that this works, please?
It works; I was using it a while back. You can check https://github.com/drizzle-team/drizzle-orm/pull/666#issuecomment-1602918513
In patch-package form, for anyone who needs it:
diff --git a/node_modules/drizzle-orm/postgres-js/driver.js b/node_modules/drizzle-orm/postgres-js/driver.js
index 7e48e8c..219e0a0 100644
--- a/node_modules/drizzle-orm/postgres-js/driver.js
+++ b/node_modules/drizzle-orm/postgres-js/driver.js
@@ -12,6 +12,8 @@ function drizzle(client, config = {}) {
client.options.parsers[type] = transparentParser;
client.options.serializers[type] = transparentParser;
}
+ client.options.serializers['114'] = transparentParser;
+ client.options.serializers['3802'] = transparentParser;
const dialect = new PgDialect();
let logger;
if (config.logger === true) {
Bumping this PR. The custom type workaround works like a charm, but this issue needs to be resolved.
As for the migration guide, wouldn't this query fix the issue?
UPDATE <table_name>
SET <col_1_name> = (<col_1_name> #>> '{}')::jsonb,
    <col_2_name> = (<col_2_name> #>> '{}')::jsonb,
    ...
    <col_n_name> = (<col_n_name> #>> '{}')::jsonb
;
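The mechanics behind that query: #>> '{}' extracts the element at the root path as text, so for a jsonb value that is itself a JSON string it returns that string's contents, undoing exactly one level of encoding; the ::jsonb cast then parses that text as proper JSON. A plain-TypeScript sketch of the same two steps on a double-encoded value:

```typescript
// The bad column, as text: a jsonb value that is a JSON *string*.
const stored = '"{\\"a\\":1}"'; // i.e. the 10 characters "{\"a\":1}"

// (col #>> '{}') returns the root element as text; for a JSON string,
// that is its contents, i.e. one level of decoding:
const extracted = JSON.parse(stored); // '{"a":1}'

// ::jsonb then parses that text as proper JSON:
const fixed = JSON.parse(extracted);
console.log(fixed); // { a: 1 }
```

Note the query as written touches every row; depending on the data it may be safer to add a guard such as the WHERE jsonb_typeof(column) = 'string' check mentioned earlier in the thread.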
Please merge the patch if possible
@AndriiSherman Is there any update? It's been quite some time
I've been playing with an UPDATE statement that will correct existing JSONB columns that have had the extra JSON.stringify(data) encoding applied.
Here is what is working for our project:
UPDATE {tableName} SET {columnName} = ({columnName}->>0)::jsonb
WHERE {columnName}::text LIKE '"%"';
Running that statement will correct bad columns, but leave good columns untouched, so it is safe to run multiple times.
Caveats
This works when JSON.stringify(data) was applied to data that is an object or array, but does not work if data is a primitive JSON type: string, number, boolean, or null.
I'm not sure how much real-world application data will be using a JSONB column to store a raw JSON primitive type (that kinda defeats the purpose of JSONB), but I wanted to call attention to this limitation.
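The string-primitive caveat can be seen directly: a legitimately stored JSON string also starts and ends with a double quote in its text form, so it satisfies the same LIKE '"%"' guard as a double-encoded object. A small sketch of the collision:

```typescript
// Double-encoded object: the case the UPDATE is meant to fix.
const badObject = JSON.stringify(JSON.stringify({ a: 1 })); // '"{\"a\":1}"'

// A jsonb column legitimately holding the JSON string "hello".
const legitString = JSON.stringify("hello"); // '"hello"'

// Both satisfy the WHERE guard (text starts and ends with a quote)...
const looksBad = (t: string) => t.startsWith('"') && t.endsWith('"');
console.log(looksBad(badObject), looksBad(legitString)); // true true

// ...so the statement cannot tell them apart without knowing the
// application's schema (e.g. "all our JSON is objects or arrays").
```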
Any updates on this PR?
I found a workaround by executing raw SQL that includes the JSON or JSONB data, pre-stringified. Here's an example:
const json_data = JSON.stringify(data);
db.execute(sql.raw(`INSERT INTO table ("json_data") VALUES ($$${json_data}$$::jsonb)`));
I'm using dollar-quoted constants instead of single or double quotes because the json_data string might contain single quotes and will definitely contain double quotes.
For more information on dollar-quoted constants, refer to the PostgreSQL documentation: https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-DOLLAR-QUOTING
Because the code above looks a bit messy, here is a cleaner example:
const json_data = JSON.stringify(data);
db.execute(
sql.raw(
'INSERT INTO table ("json_data") VALUES ($$' + json_data + "$$::jsonb)"
)
);
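One caution with this approach: a plain $$...$$ literal breaks if the stringified JSON itself contains the sequence $$, and inlining strings into sql.raw bypasses parameterization entirely, so it should only be used with trusted data. A defensive sketch (hypothetical helper, not part of Drizzle) that picks a dollar-quote tag not present in the payload:

```typescript
// Pick a dollar-quote tag that does not occur in the payload, so the
// literal cannot be terminated early by its own contents.
function dollarQuote(payload: string): string {
  let tag = "json";
  while (payload.includes(`$${tag}$`)) tag += "x"; // extend until unique
  return `$${tag}$${payload}$${tag}$`;
}

const json_data = JSON.stringify({ msg: "has $$ inside" });
console.log(dollarQuote(json_data));
// $json${"msg":"has $$ inside"}$json$
```

This follows the tagged form of PostgreSQL dollar quoting ($tag$...$tag$) described in the documentation linked above.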
Fixing conflicts and merging this into the beta tag. I will prepare a guide on how to fix an existing database and will reuse and mention some of the comments from this PR
Merged. It will be available in drizzle-orm@beta in about 30 minutes and will remain in beta for some time (possibly a few days) before being released as the latest version.
Finally, thank you!! Which specific version does this one correspond to?