Exception on bulk insert with EmbeddedDocument as primary key
I get an exception when trying to do a bulk insert on a collection where I have defined the primary key as an embedded document.
Repro:
from mongoengine import EmbeddedDocument, StringField, Document, EmbeddedDocumentField, connect

class Key(EmbeddedDocument):
    property1 = StringField()
    property2 = StringField()

class MainDoc(Document):
    key = EmbeddedDocumentField(Key, primary_key=True)

def main():
    connect(host="mongodb://root:rootpassword@localhost:27017/test_db?authSource=admin&authMechanism=SCRAM-SHA-1&tls=false")
    docs = [MainDoc(key=Key(property1='aaa', property2='bbb')),
            MainDoc(key=Key(property1='ccc', property2='ddd')),
            MainDoc(key=Key(property1='eee', property2='fff'))]
    MainDoc.objects.insert(docs)

main()
This results in the following output:
Traceback (most recent call last):
  File "C:\Users\danielber\AppData\Roaming\JetBrains\PyCharm2022.1\scratches\scratch_6.py", line 26, in <module>
    main()
  File "C:\Users\danielber\AppData\Roaming\JetBrains\PyCharm2022.1\scratches\scratch_6.py", line 23, in main
    MainDoc.objects.insert(docs)
  File "C:\Users\danielber\venvs\styrax\lib\site-packages\mongoengine\queryset\base.py", line 385, in insert
    documents = self.in_bulk(ids)
  File "C:\Users\danielber\venvs\styrax\lib\site-packages\mongoengine\queryset\base.py", line 748, in in_bulk
    doc_map[doc["_id"]] = self._document._from_son(
TypeError: unhashable type: 'dict'
Note that the data does get inserted into the collection; the failure appears to happen in verification code that runs after the insert.
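For context, the traceback shows in_bulk building a Python dict keyed by each document's _id. With an embedded-document primary key, _id comes back from the driver as a plain dict, and dicts aren't hashable, hence the TypeError. A minimal illustration of the underlying Python behaviour (values here are just placeholders):

doc_map = {}
doc_map[{'property1': 'aaa', 'property2': 'bbb'}] = None
# TypeError: unhashable type: 'dict'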
I encountered this with Python 3.9.9 and mongoengine 0.24.1.
Thanks!
@cheshirex We've just hit the same problem. Have you got any solution or workaround?
No -- we've changed our schema to avoid having to do this with an embedded document. There hasn't been any follow-up on this, and I haven't really had time to dig into the underlying code to try to fix it myself.
Thanks. This issue might be related to #2260.
From a quick glance, it's not the same thing, but the underlying cause is indeed related.
We've found a quick workaround. The problem lies with the load_bulk parameter, which defaults to True and triggers an additional query after the insert; that query doesn't handle compound IDs properly. To disable it, pass load_bulk=False to each insert call, or monkey-patch the default with something like this:
import functools
from mongoengine.queryset.base import BaseQuerySet
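
# Make load_bulk=False the default for every QuerySet.insert call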
BaseQuerySet.insert = functools.partialmethod(BaseQuerySet.insert, load_bulk=False)
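
For the per-call form, using the repro above, something like this should avoid the failing post-insert query (if I recall correctly, with load_bulk=False insert returns the inserted primary keys rather than reloaded documents):

MainDoc.objects.insert(docs, load_bulk=False)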
But be careful: this may break code that relies on the default return value of insert.
IMHO this parameter should default to False anyway; the additional query isn't necessary and may hurt performance.