rom-sql
rom-sql copied to clipboard
Optimize PostgreSQL upsert by calling multi_upsert when possible
Current PostgreSQL upsert implementation loops through array of tuples and executes upsert sql command one by one. Are you interested in supporting multi upsert (the same way as you support multi insert) for PostgreSQL?
I'm submitting a PR with implementation proposal. Im open to improve this PR (like adding specs, docs, etc).
This is how it can be tested:
ROM::SQL.migration do
change do
create_table(:tags) do
primary_key :id
column :title, String, null: false
index :title, unique: true
end
end
end
module Relations
class Tags < ROM::Relation[:sql]
schema(:tags, infer: true)
end
end
module Commands
class UpsertTags < ROM::SQL::Commands::Postgres::Upsert
relation :tags
register_as :upsert
def self.conflict_target
:title
end
end
end
new_tags = [
{ title: 'a' },
{ title: 'b' }
]
# Before:
tags.command(:upsert).call(tags)
#=> INSERT INTO "tags" ("title") VALUES ('a') ON CONFLICT ("title") DO NOTHING RETURNING "tags"."id", "tags"."title"
#=> INSERT INTO "tags" ("title") VALUES ('b') ON CONFLICT ("title") DO NOTHING RETURNING "tags"."id", "tags"."title"
# After:
tags.command(:upsert).call(tags)
#=> INSERT INTO "tags" ("title") VALUES ('a'), ('b') ON CONFLICT ("title") DO NOTHING RETURNING "tags"."id", "tags"."title"
I agree the current approach (looping) is actually counter-intuitive. However, I could see some apps implicitly relying on the old behavior; which could break by using multi upsert. Potentially we could expose a new command multi_upsert
which allows users to opt in to the functionality. It would be nice to get this feature in because we have a few commands at AdHawk which insert a large amount of records using upsert; which should happen in a batch.
:+1: On finding the current behavior surprising. I'm currently using a custom command type with this implementation to multi_upsert
and that seems to be working fine.
If nothing else, some documentation of the current behavior and how to batch it up would be a nice addition. I'd be happy to contribute that documentation if there's a place to do so.
I'd prefer to tweak current upsert rather than adding multi_upsert
. This would be consistent with other commands.
However, I could see some apps implicitly relying on the old behavior; which could break by using multi upsert.
I'm not sure how this would break anything. This seems to be an implementation detail and performance optimization.
Is anybody interested in refactoring current Upsert
command so that it does multi_insert
transparently just like Insert
command? This can either go into 3.0.0
or 3.1.0
release. We are wrapping up 3.0.0
this week btw.
@solnic I will implement it, but I need some clarifications.
Upsert
command so that it doesmulti_insert
You meant multi_upsert
, right?
What's already implemented in this PR is that calling Upsert#execute
evaluates either upsert
or multi_upsert
. This behaviour is consistent with Create
command. Wdyt?
@solnic It is a breaking change since there are more restrictions to what is can be updated with ON CONFLICT
, i.e. if multiple rows match the same constraint it will fail.
@ianks how so? Using ON CONFLICT
requires having a unique constraint anyway. If you had it before it means you weren't able to insert conflicting records even if you were using two separate insert statements. If you add a constraint to make ON CONFLICT
possible your current conflicting inserts will fail as well, but this is issue with the constraint not with rom.
Here's an example of the breaking behavior:
-
Given only a unique constraint on
slug
-
An ON CONFLICT which attempts to update the
title
column -
With these two records, inserted in listed order:
#<Post title="1st Post" slug="first-post">
#<Post title="First Post" slug="first-post">
The current behavior would work, since records are independently inserted, this means the title would become First Post
. Using multi_insert fails since multiple records would be affected by the same constraint.
@ianks hah, you're right, thanks
ERROR: ON CONFLICT DO UPDATE command cannot affect row a second time
HINT: Ensure that no rows proposed for insertion within the same command have duplicate constrained values.
So two options:
- Release multi upsert with breaking change in new release
- Make the functionality opt-in
Personally I favor breaking. I refactored our code to work with multi and it's not too bad (we ran into the problematical "duplicate affected rows" bug). Although an escape hatch to bring back single insert would be helpful for some imagine.
Thoughts?
@ianks I'm for breaking it in 4.0
@flash-gordon wdyt?
If so, this PR can be modified to add a deprecation message, and add the new opt-in multi_upsert method?
If so, this PR can be modified to add a deprecation message, and add the new opt-in multi_upsert method?
@ianks yes although I'm not sure how exactly the opt-in part would work, any ideas?
So we have two options:
- Change in the
upsert
method - would introduce breaking behaviour. - Deprecate the
upsert
method and introduce themulti_upsert
method (@ianks, that's what You meant, right?).
If I may suggest smth., I would go with the 1st option. It breaks the certain case, but if it does, we're still able to go with the old behavior by loopig, eg:
tags.each do |tag|
tags_relation.command(:upsert).call(tag)
end
I'm cool with either of these options.
Let's go ahead and merge option 1, with a note about it in the changelog
@ianks sounds good, would you be able to re-open this though with additional specs etc.?