Support proper bulk update via PATCH
Environment
- PostgreSQL version: 11
- PostgREST version: 8.0.0
- Operating system: Linux
Description of issue
Schema:
```sql
drop table if exists example;

create table if not exists example (
  pk integer PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
  name varchar(40) NOT NULL CHECK (name <> ''),
  value varchar(40) NOT NULL CHECK (value <> '')
);

insert into example (name, value) values ('one', '1');
insert into example (name, value) values ('two', '2');
insert into example (name, value) values ('three', '3');
```
PostgREST usage:
```bash
#!/bin/bash
set -e

data='
[
  {
    "pk": 1,
    "name": "example"
  },
  {
    "pk": 2,
    "name": "second"
  }
]
'

# returns all three
curl \
  -X GET "http://localhost:9000/example"

echo -e ""
echo -e "--"

# null value in column "value" violates not-null constraint
curl \
  -X POST "http://localhost:9000/example" \
  -H "Prefer: resolution=merge-duplicates" \
  -d "${data}"

echo -e ""
echo -e "--"

# null value in column "value" violates not-null constraint
curl \
  -X POST "http://localhost:9000/example?columns=name" \
  -H "Prefer: resolution=merge-duplicates" \
  -d "${data}"
```
A bulk update on a partial set of columns triggers the NOT NULL constraints on the remaining columns that are declared NOT NULL. For now we have relaxed the constraints in our database to work around this, and we looked at doing it via a stored procedure, but it would be nice to have an official suggestion / fix for this.
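For reference, a stored-procedure workaround like the one mentioned could be exposed through PostgREST's RPC interface. A minimal sketch (the function name `bulk_update_example` and its shape are made up for illustration, not an official recommendation):

```sql
-- Hypothetical RPC for a PATCH-like bulk update over the example table.
-- Callable as POST /rpc/bulk_update_example with a {"payload": [...]} body.
create or replace function bulk_update_example(payload json)
returns setof example
language sql
as $$
  update example
  set name = x.name
  from json_populate_recordset(null::example, payload) as x
  where example.pk = x.pk
  returning example.*;
$$;
```

Since the statement is a plain `UPDATE`, only the columns listed in `SET` are touched, so the NOT NULL constraints on the untouched columns are never evaluated.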
The reason is that you are not doing a bulk update at all. You are doing an `INSERT`. I guess you wanted to use `PUT` instead of `POST` to make it an `INSERT ... ON CONFLICT DO UPDATE ...`, but that would still yield the same error. The error happens at the `INSERT` stage, before PostgreSQL ever reaches the `UPDATE` part.

`PUT` is not a proper "bulk update".
Related: #1945.
Thanks for the clarification, makes perfect sense. Is `PATCH` the HTTP verb that should be used for this? Should I close this issue and move the discussion over at the Q&A?
> Is a PATCH the HTTP verb that should be used for this?
`PATCH` is the proper way to do a real `UPDATE`. However, it does not have bulk capabilities.
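For context, a real `UPDATE` through PostgREST today takes a single json object and applies it to every row matched by the querystring filters. A sketch of that usual shape, using the `example` table from the top of this thread:

```
PATCH /example?pk=eq.1
Content-Type: application/json

{ "name": "new name" }
```

There is no way to send per-row values in one request, which is what "bulk" would mean here.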
> Should I close this issue and move the discussion over at the Q&A?
Since we already have a Q&A item for that topic, I'd keep this issue. It highlights why `PUT` is not a full replacement for a bulk update, so we can track this feature request here.
This is currently an idea only. I can see the benefits of being able to do a bulk `UPDATE`. Maybe a query could somehow look like:

```sql
UPDATE table
SET column = pgrst_body.column
FROM (SELECT * FROM json_populate_recordset(NULL::table, $1)) AS pgrst_body
WHERE table.pk_column = pgrst_body.pk_column
```

but I don't see yet what the REST API interface would look like.
Actually... Do we currently support `PATCH`ing with an array body? I don't think so. If not, this could be the non-breaking interface:

- `PATCH` with a json object as body: current `UPDATE`, with the `WHERE` clause determined by querystring filters only.
- `PATCH` with a json array of objects as body: bulk `UPDATE`, with filters taken from PK columns + request body (as suggested above). Tbd: What happens with querystring filters in this case? They could be either ignored or added additionally.
> Actually... Do we currently support PATCHing with an array body? I don't think so.
An array is accepted, but only the first element is taken into account for the `UPDATE`.
There was also another option for the query, with an `UPDATE .. FROM`, mentioned on https://github.com/PostgREST/postgrest/issues/1816 (not sure which query would be best atm).
> There was also another option for the query with an UPDATE..FROM, mentioned on #1816 (Not sure which query would be best atm)
Ah, thanks for linking that other issue, too. I think the queries are basically the same, except for when the json body is parsed. The proposal in #1816 looks like this would be done in haskell, while the one here is doing it on the database side. I think we generally prefer the latter and try to avoid the former, right?
> Bulk UPDATE with filters taken from PK columns
This could work as a first iteration of this feature. If a different payload column is desired as the filter, perhaps we could reuse the `on_conflict` param. Or, like requested on https://github.com/PostgREST/postgrest/issues/2123, perhaps the clearer `on_constraint` param?
Our query already mostly works for this, it's only a matter of when to apply the payload fields as filters.
```sql
WITH
pgrst_payload AS (
  SELECT '[{"id": 4, "name": "xxx"}, {"id": 5, "name": "yyy"}]'::json AS json_data),
pgrst_body AS (
  SELECT CASE WHEN json_typeof(json_data) = 'array' THEN json_data ELSE json_build_array(json_data) END AS val
  FROM pgrst_payload)
UPDATE "test"."projects"
SET "name" = x."name"
FROM (SELECT * FROM json_populate_recordset(null::"test"."projects", (SELECT val FROM pgrst_body))) x
WHERE "test"."projects"."id" = x.id
RETURNING "test"."projects".*;
```

```
 id | name | client_id
----+------+-----------
  4 | xxx  |         2
  5 | yyy  |
```
> Tbd: What happens with querystring filters in this case? Could be either ignored or added additionally.
We could add filters (`id=gt.4`):

```sql
WITH
pgrst_payload AS (
  SELECT '[{"id": 4, "name": "xxx"}, {"id": 5, "name": "yyy"}]'::json AS json_data),
pgrst_body AS (
  SELECT CASE WHEN json_typeof(json_data) = 'array' THEN json_data ELSE json_build_array(json_data) END AS val
  FROM pgrst_payload)
UPDATE "test"."projects"
SET "name" = x."name"
FROM (SELECT * FROM json_populate_recordset(null::"test"."projects", (SELECT val FROM pgrst_body))) x
WHERE "test"."projects"."id" = x.id AND "test"."projects"."id" > 4
RETURNING "test"."projects".*;
```

```
 id | name | client_id
----+------+-----------
  5 | yyy  |
```
But it doesn't help with knowing when to apply the payload filters; we cannot always apply that condition. If we did, it could cause confusing situations like:
```sql
-- the payload body has no id
WITH
pgrst_payload AS (
  SELECT '[{"name": "xxx"}, {"name": "yyy"}]'::json AS json_data),
pgrst_body AS (
  SELECT CASE WHEN json_typeof(json_data) = 'array' THEN json_data ELSE json_build_array(json_data) END AS val
  FROM pgrst_payload)
UPDATE "test"."projects"
SET "name" = x."name"
FROM (SELECT * FROM json_populate_recordset(null::"test"."projects", (SELECT val FROM pgrst_body))) x
WHERE "test"."projects"."id" = x.id AND "test"."projects"."id" > 4
RETURNING "test"."projects".*;
```

```
 id | name | client_id
----+------+-----------
(0 rows)
```

Nothing gets updated.
That would also break existing PATCH requests.
I think we should apply the payload filter whenever there are no query param filters. So:
```
PATCH /projects

[
  {"id": 5, "name": "xxx"},
  {"id": 6, "name": "yyy"}
]
```
Would generate the query with the id filter applied.
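Assuming the query shape shown earlier in this thread, the generated statement for that request could look roughly like this (a sketch, not the exact SQL PostgREST would emit):

```sql
WITH
pgrst_payload AS (
  SELECT '[{"id": 5, "name": "xxx"}, {"id": 6, "name": "yyy"}]'::json AS json_data),
pgrst_body AS (
  SELECT CASE WHEN json_typeof(json_data) = 'array' THEN json_data ELSE json_build_array(json_data) END AS val
  FROM pgrst_payload)
UPDATE "test"."projects"
SET "name" = x."name"
FROM (SELECT * FROM json_populate_recordset(null::"test"."projects", (SELECT val FROM pgrst_body))) x
-- the PK filter comes from the payload, since no querystring filters were given
WHERE "test"."projects"."id" = x.id
RETURNING "test"."projects".*;
```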
This would break the ability to PATCH the whole table in one request (deliberately or accidentally); I'd argue that was always unwanted behavior (ref). So this:

```
PATCH /projects

{"name": "xxx"}
```

will always result in 0 updates done (with no error from the db), and a 404 will be returned, same as doing a `PATCH /projects?id=eq.non-existent` request.
To make that operation possible, we can require that a `limit` is present (taking advantage of https://github.com/PostgREST/postgrest/pull/2211):

```
PATCH /projects?limit=10&order=id

{"name": "xxx"}
```

Only 10 updates will happen.
Another good thing is that a full table UPDATE is an expensive operation (it can take a long time on big tables), so by requiring a `limit` we prevent it from happening.
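PostgreSQL's `UPDATE` has no `LIMIT` clause, so a limited update has to be expressed indirectly. One possible shape, restricting the affected rows through a PK subquery (a sketch only; #2211 defines what PostgREST actually generates):

```sql
UPDATE "test"."projects"
SET "name" = 'xxx'
WHERE "test"."projects"."id" IN (
  -- limit/order from the querystring applied here
  SELECT "id" FROM "test"."projects" ORDER BY "id" LIMIT 10
);
```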
Reopened, now that #2424 is merged.
So this was reopened because of the reasons mentioned on https://github.com/PostgREST/postgrest/pull/2414#issuecomment-1212647377. It basically broke existing functionality. There's an alternate implementation with underscore/dollar operators, but that's a more complicated feature.
~~How about making this simpler and we just do bulk update based on PKs when `Prefer: params=multiple-objects` (only used for RPC now) is specified?~~

~~Later on it could be extended to have `Prefer: params=multiple-objects; unique_keys=id`.~~

~~This syntax on the header also goes in line with the improvements proposed on https://github.com/PostgREST/postgrest/issues/2066#issue-1070332545~~
I believe a better syntax for this is now the one posted on https://github.com/PostgREST/postgrest/issues/465#issuecomment-1566188612.
Reusing the syntax from the discussion on https://github.com/PostgREST/postgrest/issues/465#issuecomment-1722382293, I think this feature should be implemented in the following way.
Having:
```
GET /items?id=lte.3

[
  { "id": 1, "name": "Item 1" }
, { "id": 2, "name": "Item 2" }
, { "id": 3, "name": "Item 3" }
]
```
Doing:
```
PATCH /items?id=lte.3&name=set.$body->name
Prefer: return=representation

[
  { "id": 1, "name": "Updated Item 1" }
, { "id": 2, "name": "Updated Item 2" }
, { "id": 3, "name": "Updated Item 3" }
]
```
Will result in:
```
[
  { "id": 1, "name": "Updated Item 1" }
, { "id": 2, "name": "Updated Item 2" }
, { "id": 3, "name": "Updated Item 3" }
]
```
After this, #465 should be easier to implement too.
@steve-chavez @laurenceisla
Now that #2926 is merged, `POST` returns `200` on an update. So maybe this can be done with `POST`, e.g.:

```
POST /items
Prefer: return=representation, resolution=merge-duplicates

[
  { "id": 1, "name": "Updated Item 1" }
, { "id": 2, "name": "Updated Item 2" }
, { "id": 3, "name": "Updated Item 3" }
]
```
```
HTTP/1.1 200 OK

[
  { "id": 1, "name": "Updated Item 1" }
, { "id": 2, "name": "Updated Item 2" }
, { "id": 3, "name": "Updated Item 3" }
]
```
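On the database side, `Prefer: resolution=merge-duplicates` maps to `INSERT ... ON CONFLICT DO UPDATE`, so for the `items` table the statement would be roughly this (a sketch, assuming `id` is the primary key):

```sql
INSERT INTO items (id, name)
SELECT id, name
FROM json_populate_recordset(null::items,
  '[{"id": 1, "name": "Updated Item 1"},
    {"id": 2, "name": "Updated Item 2"},
    {"id": 3, "name": "Updated Item 3"}]')
ON CONFLICT (id) DO UPDATE
SET name = EXCLUDED.name
RETURNING items.*;
```

Note that, as pointed out at the top of this thread, the `INSERT` stage still evaluates NOT NULL constraints on columns missing from the payload, so this only behaves like a bulk update when the payload (or `?columns=`) covers all required columns.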