piccolo icon indicating copy to clipboard operation
piccolo copied to clipboard

Add an API for upserting

Open dantownsend opened this issue 3 years ago • 19 comments

Based on this discussion: https://github.com/piccolo-orm/piccolo/discussions/248

We don't currently have an API for upserting.

This would be based around the ON CONFLICT clause you can add to INSERT queries.

https://www.postgresql.org/docs/current/sql-insert.html#SQL-ON-CONFLICT

There will probably be two APIs needed. Something for insert queries, and a convenience method for objects queries.

For example:

# For insert
Band.insert(Band(name='Pythonistas')).on_conflict(...).run_sync()

# For objects
Band.objects().upsert(Band.name == 'Pythonistas', {Band.popularity: 5000}).run_sync()

dantownsend avatar Sep 19 '21 09:09 dantownsend

I would like to work on this if It is open and unassigned.

AbhijithGanesh avatar Sep 22 '21 17:09 AbhijithGanesh

@AbhijithGanesh Great, thanks. This might be a tricky issue though, just so you're aware!

dantownsend avatar Sep 22 '21 18:09 dantownsend

Thanks for the heads up , I'll read upon docs and get this reviewed before it gets merged

AbhijithGanesh avatar Sep 22 '21 18:09 AbhijithGanesh

Clarification: The upserting method/process is like HTTP PUT right?

AbhijithGanesh avatar Sep 23 '21 06:09 AbhijithGanesh

This requires more interaction with the maintainer than I thought, where can I reach out to you? @dantownsend

AbhijithGanesh avatar Sep 23 '21 06:09 AbhijithGanesh

Based on this discussion: #248

We don't currently have an API for upserting.

This would be based around the ON CONFLICT clause you can add to INSERT queries.

https://www.postgresql.org/docs/current/sql-insert.html#SQL-ON-CONFLICT

There will probably be two APIs needed. Something for insert queries, and a convenience method for objects queries.

For example:

# For insert
Band.insert(Band(name='Pythonistas')).on_conflict(...).run_sync()

# For objects
Band.objects().upsert(Band.name == 'Pythonistas', {Band.popularity: 5000}).run_sync()

So I am implementing the insert part at query>methods>insert.py and upsert as a new file. Would that be okay?

AbhijithGanesh avatar Sep 23 '21 06:09 AbhijithGanesh

You're best reaching me here - I don't use much else besides email.. You can start a separate discussion thread if you prefer.

That's right - an UPSERT is a bit like a PUT. When an insert fails due to a unique constraint, it will update the existing row instead of raising an error.

Here are the docs:

https://www.postgresql.org/docs/current/sql-insert.html#SQL-ON-CONFLICT https://sqlite.org/lang_conflict.html

It's a tricky one - I need to think through the API a bit, as the SQLite and Postgres implementations are a bit different.

Feel free to experiment with it while I give it some more thought, or feel free to pick up another issue in the meantime.

dantownsend avatar Sep 23 '21 06:09 dantownsend

On a high level, the insert method and upsert method would ideally be sharing similar database querystrings right(Exclude the query, just the structure)

AbhijithGanesh avatar Sep 23 '21 07:09 AbhijithGanesh

That's right, it will most likely be an on_conflict argument to Insert. If you look at the codebase, there are many 'delegates'. For example:

https://github.com/piccolo-orm/piccolo/blob/8ec9d10313c9ac44916f01e27c9ef58eb22ae0d9/piccolo/query/mixins.py#L38-L53

We would have a new one called OnConflict.

You can set the on conflict value using the constructor:

Band.insert(Band(name='Pythonistas'), on_conflict=Conflict.do_nothing).run_sync()

Or you can use a method, which sets the same underlying value on the OnConflict delegate:

Band.insert(Band(name='Pythonistas')).on_conflict(Conflict.do_nothing).run_sync()

And Conflict would be an Enum.

dantownsend avatar Sep 23 '21 07:09 dantownsend

Thanks for this vital information, Can you please assign me to this issue

AbhijithGanesh avatar Sep 23 '21 08:09 AbhijithGanesh

Can you quickly tell me about delegates quickly? I know they're a part of the mixins, can you please give me a description(I searched the code base for onconflict and couldn't find relevant references)

AbhijithGanesh avatar Sep 23 '21 08:09 AbhijithGanesh

There is no code related to on conflict yet. A delegate is just a class which stores data - if you look at mixins.py there's a bunch of them, and they're all pretty similar. The reason we use them instead of inheritance is because it's better for type annotations.

dantownsend avatar Sep 23 '21 08:09 dantownsend

got it, I shall use them

AbhijithGanesh avatar Sep 23 '21 08:09 AbhijithGanesh

Heyy, quick question, can I use exists.py to validate? I have currently based my function design on update and insert as of now,

AbhijithGanesh avatar Sep 23 '21 14:09 AbhijithGanesh

You can use exists.py in unit tests, but it should be possible to implement the on conflict changed in a single query, by extending the existing Insert class.

dantownsend avatar Sep 23 '21 14:09 dantownsend

@dantownsend I am super sorry for leaving this on open! I shall finish this ASAP!

AbhijithGanesh avatar Sep 27 '21 04:09 AbhijithGanesh

That's right, it will most likely be an on_conflict argument to Insert. If you look at the codebase, there are many 'delegates'. For example:

https://github.com/piccolo-orm/piccolo/blob/8ec9d10313c9ac44916f01e27c9ef58eb22ae0d9/piccolo/query/mixins.py#L38-L53

We would have a new one called OnConflict.

You can set the on conflict value using the constructor:

Band.insert(Band(name='Pythonistas'), on_conflict=Conflict.do_nothing).run_sync()

Or you can use a method, which sets the same underlying value on the OnConflict delegate:

Band.insert(Band(name='Pythonistas')).on_conflict(Conflict.do_nothing).run_sync()

And Conflict would be an Enum.

What slots would on_conflict take? Enum?

AbhijithGanesh avatar Sep 27 '21 04:09 AbhijithGanesh

@dantownsend can you please explain which slot on_conflict should take

AbhijithGanesh avatar Sep 29 '21 05:09 AbhijithGanesh

@AbhijithGanesh You just create a new one:

@dataclass
class Insert(Query):
    __slots__ = ("add_delegate", "on_conflict_delegate")

Slots are a minor performance enhancement which tells Python that only those attributes are allowed, which makes it slightly more memory efficient.

dantownsend avatar Sep 29 '21 06:09 dantownsend

Is anyone working on this ?

deserve-shubham avatar Feb 19 '23 13:02 deserve-shubham

@deserve-shubham Not at the moment, would be nice to add it.

dantownsend avatar Feb 19 '23 16:02 dantownsend

@deserve-shubham Please work on this if you can, I'm unable to commit to this!

AbhijithGanesh avatar Feb 25 '23 05:02 AbhijithGanesh

@dantownsend I don't know if you meant something like this, but here is my attempt for upserting api. This will also solve duplicates for manually entering rows for M2M (#707) because we will be able to set a conflict in this

async def _run(self):
    rows = self.rows
    unsaved = [i for i in rows if not i._exists_in_db]

    if unsaved:
        await rows[0].__class__.insert(
            *unsaved, on_conflict=Conflict.do_nothing <- set do_nothing in conflict as suggested in the issue
        ).run()

Another question. Should Sqlite versions less than 3.24.0 also be supported because the conflict syntax is different.

sinisaos avatar Mar 31 '23 05:03 sinisaos

@sinisaos we can detect the sqlite version and potentially change the syntax accordingly. It's accessible via SQLiteEngine. It would be good to get the on conflict clause done.

dantownsend avatar Mar 31 '23 06:03 dantownsend

@sinisaos we can detect the sqlite version and potentially change the syntax accordingly. It's accessible via SQLiteEngine. It would be good to get the on conflict clause done.

It should be something like this (but I can't try now and need to test it with some older version of Sqlite)

if (
    engine_type == "sqlite"
    and self.table._meta.db.get_version_sync() < 3.24
):
    if self.on_conflict_delegate._on_conflict is not None:
        if self.on_conflict_delegate._on_conflict.value == "DO NOTHING":
            base = f'INSERT OR IGNORE "{self.table._meta.tablename}"'
        else:
            base = f'INSERT OR REPLACE "{self.table._meta.tablename}"' 
else:
    base = f'INSERT INTO "{self.table._meta.tablename}"'

I don't know if you had time to look at the changes I made in here. If it's not good enough for PR (or it's not how you imagine conflict implementation), I wouldn't bother with it anymore and leave it to someone else.

sinisaos avatar Mar 31 '23 08:03 sinisaos