django-haystack

Add Support for "track_total_hits" to Accurately Track Total Hits Above 10K

Open nikolaysm opened this issue 1 year ago • 2 comments

Issue: Django Haystack currently does not support the track_total_hits parameter. Since Elasticsearch 7, totals are tracked exactly only up to 10,000 by default, so any query matching more documents reports a hit count capped at 10,000 instead of the real total.
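To make the symptom concrete, here is a minimal sketch of how the capped count shows up in a raw Elasticsearch 7+ search response. The helper name describe_total and the two response fragments are made up for illustration; only the hits.total layout (value plus relation) comes from Elasticsearch itself.

```python
def describe_total(response):
    """Return (count, is_exact) from an Elasticsearch search response dict."""
    total = response["hits"]["total"]
    # Elasticsearch 7+ returns an object; older versions returned a bare integer.
    if isinstance(total, dict):
        return total["value"], total.get("relation", "eq") == "eq"
    return total, True


# Hand-written response fragments, not captured from a real cluster.
capped = {"hits": {"total": {"value": 10000, "relation": "gte"}, "hits": []}}
exact = {"hits": {"total": {"value": 48213, "relation": "eq"}, "hits": []}}

print(describe_total(capped))  # (10000, False)
print(describe_total(exact))   # (48213, True)
```

A relation of "gte" means the value is only a lower bound, which is what a default query returns once more than 10,000 documents match.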

Suggestion: We can add an option in the Elasticsearch backend to enable track_total_hits, allowing users to get accurate total hit counts above 10,000.

if getattr(settings, "TRACK_TOTAL_HITS", False):
    # Ask Elasticsearch for an exact total instead of stopping at 10,000.
    search_kwargs["track_total_hits"] = True

This could be added within the search method of elasticsearch_backend.py, allowing users to enable accurate hit tracking.

Reference: Elasticsearch: Track Total Hits

nikolaysm, Aug 08 '24 12:08

That would be a good first pull-request. It would also be worth thinking about how it could be made optional per query: track_total_hits is not enabled by default for performance reasons, and some users might want to be selective about when they use it.
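One possible shape for a per-query opt-in, sketched with plain dicts rather than Haystack's real backend code. The function names and the global_default / track_total_hits parameters are hypothetical and do not exist in django-haystack; the only Elasticsearch fact relied on is that track_total_hits accepts either a boolean or an integer threshold in the request body.

```python
def resolve_track_total_hits(global_default=False, per_query=None):
    """A per-query value wins when given; otherwise use the global default."""
    return global_default if per_query is None else per_query


def build_search_body(query, global_default=False, track_total_hits=None):
    """Build a request body, adding track_total_hits only when it is enabled."""
    body = {"query": query}
    value = resolve_track_total_hits(global_default, track_total_hits)
    # Leave the key out when disabled so Elasticsearch keeps its own default.
    if value:
        body["track_total_hits"] = value
    return body


# Opt in for a single query while the global default stays off.
print(build_search_body({"match_all": {}}, track_total_hits=True))
```

Using None as the "not specified" value lets a caller explicitly pass False to switch tracking off for one expensive query even when it is enabled globally.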

acdha, Aug 08 '24 13:08

If I were to add this feature, I would include a setting in settings.py:

HAYSTACK_TRACK_TOTAL_HITS = False

This would default to False for performance reasons. If set to True, track_total_hits would be enabled in the search method. I'll draft the pull request with these changes and update the documentation accordingly as soon as possible.
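A rough sketch of how that setting could be read so that projects which never define it keep today's behaviour. The helper names are hypothetical, and SimpleNamespace objects stand in for django.conf.settings purely so the snippet runs without a Django project; this is not the code from any actual pull request.

```python
from types import SimpleNamespace


def track_total_hits_enabled(settings):
    # getattr with a default keeps projects that never define the setting working.
    return bool(getattr(settings, "HAYSTACK_TRACK_TOTAL_HITS", False))


def add_track_total_hits(search_kwargs, settings):
    """Return a copy of search_kwargs with track_total_hits set when enabled."""
    updated = dict(search_kwargs)
    if track_total_hits_enabled(settings):
        updated["track_total_hits"] = True
    return updated


# Stand-ins for django.conf.settings, used only to keep the sketch runnable.
unset = SimpleNamespace()
enabled = SimpleNamespace(HAYSTACK_TRACK_TOTAL_HITS=True)

print(add_track_total_hits({"query": {"match_all": {}}}, unset))
print(add_track_total_hits({"query": {"match_all": {}}}, enabled))
```

Returning a copy rather than mutating the input is just a choice made for the sketch; the backend could equally update the dict in place.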

Thanks again for the suggestion!

nikolaysm, Aug 08 '24 13:08