sqlite-utils icon indicating copy to clipboard operation
sqlite-utils copied to clipboard

`IndexError` when doing `.insert(..., pk='id')` after `insert_all`

Open xavdid opened this issue 1 year ago • 1 comments

I believe this is related to https://github.com/simonw/sqlite-utils/issues/98.

When pk is specified by table A's insert call, it throws an index error if a different table has written a row with a higher rowid than exists in the first table. Here's a basic example:

from sqlite_utils import Database


def test_pk_for_insert(fresh_db):
    user = {"id": "abc", "name": "david"}

    fresh_db["users"].insert(user, pk="id")

    fresh_db["comments"].insert_all(
        [
            {"id": "def", "text": "ok"},
            {"id": "ghi", "text": "great"},
        ],
    )

    fresh_db["users"].insert(
        user,
        ignore=True,
        # BUG: when specifying pk on the second insert call 
        # db.py goes into a block it doesn't expect and we get the error
        pk="id",
    )


if __name__ == "__main__":
    db = Database("bug.db")
    if db["users"].exists():
        raise ValueError(
            "bug only shows on a new database - remove bug.db before running the script"
        )
    test_pk_for_insert(db)

The error is:

  File "/Users/david/projects/reddit-to-sqlite/.venv/lib/python3.11/site-packages/sqlite_utils/db.py", line 2960, in insert_chunk
    row = list(self.rows_where("rowid = ?", [self.last_rowid]))[0]
          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range

The issue is in this block:

https://github.com/simonw/sqlite-utils/blob/2747257a3334d55e890b40ec58fada57ae8cfbfd/sqlite_utils/db.py#L2954-L2958

relevant locals are:

  • pk: 'id'
  • result.lastrowid: 2

What's most interesting is the comment # self.last_rowid will be 0 if a "INSERT OR IGNORE" happened, which doesn't seem to be the case here.

xavdid avatar May 22 '23 17:05 xavdid

Oh and for context - this goes away if I use .upsert instead of insert(..., ignore=True), but I don't want to update the value if it's written, just do an insert if it's new. The code is basically:

def save_items(table, items):
    db["users"].insert(build_user(items[0]), pk="id",ignore=True)
    db[table].insert_all(items)

if comments := fetch_comments():
    save_items('comments', comments)

if posts := fetch_posts():
    save_items('posts', posts)

So either comments or post could create the relevant user if those items exist. In cases where they both exist, I get this error. I need the pk because either call could create the table.

xavdid avatar May 22 '23 17:05 xavdid