rom-sql icon indicating copy to clipboard operation
rom-sql copied to clipboard

Optimize PostgreSQL upsert by calling multi_upsert when possible

Open maicher opened this issue 6 years ago • 17 comments

Current PostgreSQL upsert implementation loops through array of tuples and executes upsert sql command one by one. Are you interested in supporting multi upsert (the same way as you support multi insert) for PostgreSQL?

I'm submitting a PR with implementation proposal. Im open to improve this PR (like adding specs, docs, etc).

This is how it can be tested:

ROM::SQL.migration do
  change do
    create_table(:tags) do
      primary_key :id
      column :title, String, null: false
      index :title, unique: true
    end
  end
end

module Relations
  class Tags < ROM::Relation[:sql]
    schema(:tags, infer: true)
  end
end

module Commands
  class UpsertTags < ROM::SQL::Commands::Postgres::Upsert
    relation :tags
    register_as :upsert

    def self.conflict_target
      :title
    end
  end
end

new_tags = [
  { title: 'a' },
  { title: 'b' }
]

# Before:
tags.command(:upsert).call(tags)
#=> INSERT INTO "tags" ("title") VALUES ('a') ON CONFLICT ("title") DO NOTHING RETURNING "tags"."id", "tags"."title"
#=> INSERT INTO "tags" ("title") VALUES ('b') ON CONFLICT ("title") DO NOTHING RETURNING "tags"."id", "tags"."title"

# After:
tags.command(:upsert).call(tags)
#=> INSERT INTO "tags" ("title") VALUES ('a'), ('b') ON CONFLICT ("title") DO NOTHING RETURNING "tags"."id", "tags"."title"

maicher avatar Jan 22 '18 16:01 maicher

I agree the current approach (looping) is actually counter-intuitive. However, I could see some apps implicitly relying on the old behavior; which could break by using multi upsert. Potentially we could expose a new command multi_upsert which allows users to opt in to the functionality. It would be nice to get this feature in because we have a few commands at AdHawk which insert a large amount of records using upsert; which should happen in a batch.

ianks avatar Sep 28 '18 01:09 ianks

:+1: On finding the current behavior surprising. I'm currently using a custom command type with this implementation to multi_upsert and that seems to be working fine.

If nothing else, some documentation of the current behavior and how to batch it up would be a nice addition. I'd be happy to contribute that documentation if there's a place to do so.

jamesdabbs avatar Mar 02 '19 17:03 jamesdabbs

I'd prefer to tweak current upsert rather than adding multi_upsert. This would be consistent with other commands.

However, I could see some apps implicitly relying on the old behavior; which could break by using multi upsert.

I'm not sure how this would break anything. This seems to be an implementation detail and performance optimization.

solnic avatar Apr 20 '19 10:04 solnic

Is anybody interested in refactoring current Upsert command so that it does multi_insert transparently just like Insert command? This can either go into 3.0.0 or 3.1.0 release. We are wrapping up 3.0.0 this week btw.

solnic avatar Apr 22 '19 16:04 solnic

@solnic I will implement it, but I need some clarifications.

Upsert command so that it does multi_insert

You meant multi_upsert, right?

What's already implemented in this PR is that calling Upsert#execute evaluates either upsert or multi_upsert. This behaviour is consistent with Create command. Wdyt?

maicher avatar Apr 27 '19 07:04 maicher

@solnic It is a breaking change since there are more restrictions to what is can be updated with ON CONFLICT, i.e. if multiple rows match the same constraint it will fail.

ianks avatar May 23 '19 13:05 ianks

@ianks how so? Using ON CONFLICT requires having a unique constraint anyway. If you had it before it means you weren't able to insert conflicting records even if you were using two separate insert statements. If you add a constraint to make ON CONFLICT possible your current conflicting inserts will fail as well, but this is issue with the constraint not with rom.

flash-gordon avatar May 25 '19 11:05 flash-gordon

Here's an example of the breaking behavior:

  1. Given only a unique constraint on slug

  2. An ON CONFLICT which attempts to update the title column

  3. With these two records, inserted in listed order:

#<Post title="1st Post" slug="first-post">
#<Post title="First Post" slug="first-post">

The current behavior would work, since records are independently inserted, this means the title would become First Post . Using multi_insert fails since multiple records would be affected by the same constraint.

ianks avatar May 25 '19 20:05 ianks

@ianks hah, you're right, thanks

ERROR:  ON CONFLICT DO UPDATE command cannot affect row a second time
HINT:  Ensure that no rows proposed for insertion within the same command have duplicate constrained values.

flash-gordon avatar May 25 '19 20:05 flash-gordon

So two options:

  1. Release multi upsert with breaking change in new release
  2. Make the functionality opt-in

Personally I favor breaking. I refactored our code to work with multi and it's not too bad (we ran into the problematical "duplicate affected rows" bug). Although an escape hatch to bring back single insert would be helpful for some imagine.

Thoughts?

ianks avatar Sep 04 '19 03:09 ianks

@ianks I'm for breaking it in 4.0

@flash-gordon wdyt?

solnic avatar Sep 28 '19 10:09 solnic

If so, this PR can be modified to add a deprecation message, and add the new opt-in multi_upsert method?

ianks avatar Sep 28 '19 23:09 ianks

If so, this PR can be modified to add a deprecation message, and add the new opt-in multi_upsert method?

@ianks yes although I'm not sure how exactly the opt-in part would work, any ideas?

solnic avatar Oct 01 '19 07:10 solnic

So we have two options:

  1. Change in the upsert method - would introduce breaking behaviour.
  2. Deprecate the upsert method and introduce the multi_upsert method (@ianks, that's what You meant, right?).

If I may suggest smth., I would go with the 1st option. It breaks the certain case, but if it does, we're still able to go with the old behavior by loopig, eg:

tags.each do |tag|
  tags_relation.command(:upsert).call(tag)
end

maicher avatar Oct 05 '19 17:10 maicher

I'm cool with either of these options.

ianks avatar Jul 24 '20 18:07 ianks

Let's go ahead and merge option 1, with a note about it in the changelog

ianks avatar Nov 05 '20 17:11 ianks

@ianks sounds good, would you be able to re-open this though with additional specs etc.?

solnic avatar Nov 11 '20 12:11 solnic