django-elasticsearch-dsl

Index produces collisions in _id when using several models in same index

Open mjl opened this issue 3 years ago • 2 comments

I have a "unified" index for a web site that contains both "pages" and "products".

Basically, I'm doing

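from django_elasticsearch_dsl import Document, Index, fields

from myapp.models import Page, Product  # hypothetical import path

my_index = Index('bla')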

@my_index.doc_type
class PageSearchIndex(Document):
    class Django:
        model = Page

    text = fields.TextField()

    def prepare_text(self, instance):
        return 'whatever, extracted from page'

@my_index.doc_type
class ProductSearchIndex(Document):
    class Django:
        model = Product

    text = fields.TextField()

    def prepare_text(self, instance):
        return 'whatever, extracted from product'

This works totally fine, with the one caveat that when a product and a page have the same pk, one of them will not appear in the index.

The reason is that _prepare_action() sets the _id field to object_instance.pk. Instances of different models that happen to share a pk therefore get the same _id, and the one indexed later overwrites the other.
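The overwrite can be sketched without Elasticsearch or Django. This is only a toy model: the dict stands in for an index, and `bulk_index` and the two action dicts are hypothetical names made up for illustration, not part of django-elasticsearch-dsl.

```python
# Toy stand-in for an index: a mapping from _id to the document source.
# Writing to an _id that already exists replaces the earlier document.
def bulk_index(index, actions):
    for action in actions:
        index[action["_id"]] = action["_source"]
    return index


# Hypothetical actions for a Page and a Product that both have pk=1.
page_action = {"_id": 1, "_source": {"text": "whatever, extracted from page"}}
product_action = {"_id": 1, "_source": {"text": "whatever, extracted from product"}}

shared_index = bulk_index({}, [page_action, product_action])

# Only one document survives, and it is the one indexed last.
print(len(shared_index))
print(shared_index[1]["text"])
```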

A quick fix is to add

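    def _prepare_action(self, object_instance, action):
        u = super()._prepare_action(object_instance, action)
        # label_lower is "app_label.model_name"; app_label alone would still
        # collide for two models that live in the same app.
        u['_id'] = '%s:%s' % (object_instance._meta.label_lower, object_instance.pk)
        return u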

to the classes above, so the _id is unique by model and object pk.
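The effect of a prefixed _id can be illustrated in plain Python. This is a minimal sketch, assuming the prefix identifies the model (for example Django's "app_label.model_name" label); `make_doc_id` and the "shop.*" labels are hypothetical. It also shows that a prefix shared by both models, such as a bare app label for two models in one app, would not remove the collision.

```python
def make_doc_id(model_label, pk):
    """Combine a per-model label with the primary key into one _id string."""
    return "%s:%s" % (model_label, pk)


# Hypothetical labels in "app_label.model_name" form, both with pk=1.
page_id = make_doc_id("shop.page", 1)
product_id = make_doc_id("shop.product", 1)

# Toy index keyed by _id: both documents now coexist.
index = {
    page_id: {"text": "whatever, extracted from page"},
    product_id: {"text": "whatever, extracted from product"},
}

# If both models shared the same prefix (e.g. only the app label),
# the ids would still be identical.
still_colliding = make_doc_id("shop", 1) == make_doc_id("shop", 1)
```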

I think that should be the default. What do you think? Would it cause trouble somewhere else?

Nov 30 '20 14:11 mjl

Elasticsearch recommends having one object type per index. You can tweak it the way you have, but this should not be the default. The default is to have a separate index for each model.

Nov 30 '20 15:11 alexgarel

Fair enough. Thanks!

Nov 30 '20 15:11 mjl