Exception on bulk insert with EmbeddedDocument as primary key

Open cheshirex opened this issue 3 years ago • 5 comments

I get an exception when trying to do a bulk insert on a collection where I have defined the primary key as an embedded document.

Repro:

from mongoengine import EmbeddedDocument, StringField, Document, EmbeddedDocumentField, connect

class Key(EmbeddedDocument):
    property1 = StringField()
    property2 = StringField()

class MainDoc(Document):
    key = EmbeddedDocumentField(Key, primary_key=True)

def main():
    connect(host="mongodb://root:rootpassword@localhost:27017/test_db?authSource=admin&authMechanism=SCRAM-SHA-1&tls=false")

    docs = [MainDoc(key=Key(property1='aaa', property2='bbb')),
            MainDoc(key=Key(property1='ccc', property2='ddd')),
            MainDoc(key=Key(property1='eee', property2='fff'))]

    MainDoc.objects.insert(docs)

main()

This results in the following output:

Traceback (most recent call last):
  File "C:\Users\danielber\AppData\Roaming\JetBrains\PyCharm2022.1\scratches\scratch_6.py", line 26, in <module>
    main()
  File "C:\Users\danielber\AppData\Roaming\JetBrains\PyCharm2022.1\scratches\scratch_6.py", line 23, in main
    MainDoc.objects.insert(docs)
  File "C:\Users\danielber\venvs\styrax\lib\site-packages\mongoengine\queryset\base.py", line 385, in insert
    documents = self.in_bulk(ids)
  File "C:\Users\danielber\venvs\styrax\lib\site-packages\mongoengine\queryset\base.py", line 748, in in_bulk
    doc_map[doc["_id"]] = self._document._from_son(
TypeError: unhashable type: 'dict'

Note that the data does get inserted into the collection; the failure appears to happen in the verification code that runs after the insert.
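The error itself can be reproduced without MongoDB. Here is a minimal sketch, assuming (as the last traceback frame suggests) that in_bulk keys its result dict by the raw _id value, which is a plain dict when the primary key is an embedded document. The names raw_docs and build_doc_map are made up for illustration and are not part of mongoengine.

```python
# Stand-in for the raw documents a query returns when _id is an embedded
# document: each _id arrives as a plain dict. (Hypothetical sample data.)
raw_docs = [
    {"_id": {"property1": "aaa", "property2": "bbb"}},
    {"_id": {"property1": "ccc", "property2": "ddd"}},
]


def build_doc_map(docs):
    """Mimic the failing pattern: key a dict by each document's _id."""
    doc_map = {}
    for doc in docs:
        # A dict is mutable and therefore unhashable, so it cannot be a key.
        doc_map[doc["_id"]] = doc
    return doc_map


try:
    build_doc_map(raw_docs)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The same lookup works for ordinary ObjectId or string primary keys because those are hashable, which would explain why only compound keys hit this.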

I encountered this using Python 3.9.9, mongoengine 0.24.1.

Thanks!

cheshirex, May 12 '22 06:05

@cheshirex We've just hit the same problem. Have you got any solution or workaround?

adampl, Dec 14 '22 10:12

No -- we've changed our schema to avoid having to do this with an embedded document. There hasn't been any follow-up on this, and I haven't had time to dig into the underlying code to try to fix it myself.

cheshirex, Dec 15 '22 07:12

Thanks. This issue might be related to #2260

adampl, Dec 15 '22 11:12

From a quick glance, it's not the same thing, but the underlying cause is indeed related.

cheshirex, Dec 15 '22 13:12

We've found a quick workaround. The problem lies with the load_bulk parameter, which defaults to True and triggers an additional query after the insert. That query doesn't properly handle compound IDs. To disable it, pass load_bulk=False in each insert call, or monkey-patch the default with something like this:

import functools
from mongoengine.queryset.base import BaseQuerySet

BaseQuerySet.insert = functools.partialmethod(BaseQuerySet.insert, load_bulk=False)

But be careful: this may break code that relies on that default, for example code that expects insert() to return document instances rather than just their IDs.
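To make that caveat concrete, here is a small sketch of how functools.partialmethod changes the default. It uses a hypothetical stand-in class (FakeQuerySet is not part of mongoengine, and its return values are placeholders) so it runs without a database. It only assumes that the patched method takes load_bulk as its second parameter.

```python
import functools


class FakeQuerySet:
    """Hypothetical stand-in for BaseQuerySet; not part of mongoengine."""

    def insert(self, docs, load_bulk=True):
        # Report which path would run instead of talking to a database.
        return "reload" if load_bulk else "ids-only"


# Same patching pattern as the workaround, applied to the stand-in.
FakeQuerySet.insert = functools.partialmethod(FakeQuerySet.insert, load_bulk=False)

qs = FakeQuerySet()
default_result = qs.insert([])                   # new default: load_bulk=False
explicit_result = qs.insert([], load_bulk=True)  # a keyword argument still overrides

try:
    # Passing load_bulk positionally now collides with the pre-bound keyword.
    qs.insert([], True)
    positional_error = None
except TypeError as exc:
    positional_error = str(exc)

print(default_result, explicit_result, positional_error)
```

So callers that pass load_bulk by keyword keep working, while callers that pass it positionally or rely on the old default are the ones that would need checking.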

IMHO this parameter should be False by default anyway, because this additional query isn't necessary and may harm performance.

adampl, Jan 04 '23 12:01