django-elasticsearch-dsl
Index produces collisions in _id when using several models in same index
I have a "unified" index for a website that contains both "pages" and "products".
Basically, I'm doing:
    from django_elasticsearch_dsl import Document, Index, fields

    # Page and Product are the site's Django models.
    my_index = Index('bla')

    @my_index.doc_type
    class PageSearchIndex(Document):
        class Django:
            model = Page

        text = fields.TextField()

        def prepare_text(self, instance):
            return 'whatever, extracted from page'

    @my_index.doc_type
    class ProductSearchIndex(Document):
        class Django:
            model = Product

        text = fields.TextField()

        def prepare_text(self, instance):
            return 'whatever, extracted from product'
This works totally fine, with the one caveat that when a product and a page have the same pk, one of them will not appear in the index.
The reason is that _prepare_action() generates the field _id as object_instance.pk. This will generate duplicate _id values for different models that by chance have the same pk.
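To see the effect in isolation, here is a minimal sketch that does not talk to Elasticsearch at all. It assumes an index behaves like a mapping from _id to document, so a second write with the same _id replaces the first. The names fake_index and index_action are hypothetical and not part of django-elasticsearch-dsl.

```python
# Stand-in for an Elasticsearch index: it holds one document per _id.
fake_index = {}


def index_action(action):
    """Store the action's source under its _id, replacing any earlier document."""
    fake_index[action["_id"]] = action["_source"]


# A page and a product that happen to share pk=1, with _id set to the pk.
page_action = {"_id": 1, "_source": {"text": "whatever, extracted from page"}}
product_action = {"_id": 1, "_source": {"text": "whatever, extracted from product"}}

index_action(page_action)
index_action(product_action)

# Only one document is left: the product silently replaced the page.
print(len(fake_index))
print(fake_index[1]["text"])
```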
A quick fix is to add
    def _prepare_action(self, object_instance, action):
        u = super()._prepare_action(object_instance, action)
        # label_lower is 'app_label.model_name', so the prefix differs per model, not just per app.
        u['_id'] = '%s:%s' % (object_instance._meta.label_lower, object_instance.pk)
        return u
to the classes above, so the _id is unique by model and object pk.
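The prefix only helps if it differs between the two models. As a hedged sketch, assuming Page and Product live in the same Django app (the app name shop is made up), the snippet below compares an app-level prefix with a model-level one. SimpleNamespace objects stand in for model instances. _meta.app_label and _meta.label_lower are real Django attributes, while fake_instance and make_id are hypothetical helpers.

```python
from types import SimpleNamespace


def fake_instance(app_label, model_name, pk):
    """Build a stand-in for a Django model instance with the _meta attributes used here."""
    meta = SimpleNamespace(
        app_label=app_label,
        # Django builds label_lower as 'app_label.model_name'.
        label_lower="%s.%s" % (app_label, model_name),
    )
    return SimpleNamespace(_meta=meta, pk=pk)


def make_id(instance, prefix_attr):
    """Compose an _id from the chosen _meta attribute and the primary key."""
    return "%s:%s" % (getattr(instance._meta, prefix_attr), instance.pk)


page = fake_instance("shop", "page", 1)
product = fake_instance("shop", "product", 1)

# App-level prefix: both objects map to the same id, so they still collide.
app_ids = {make_id(page, "app_label"), make_id(product, "app_label")}

# Model-level prefix: the two ids stay distinct.
model_ids = {make_id(page, "label_lower"), make_id(product, "label_lower")}

print(sorted(app_ids))
print(sorted(model_ids))
```

If the two models sit in different apps, either prefix keeps them apart. The model-level one works in both layouts.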
I think that should be the default. What do you think? Would it cause trouble somewhere else?
Elasticsearch recommends having one object type per index. You can tweak it the way you've done, but this should not be the default. The default is to have a different index for each model.
Fair enough. Thanks!