django-haystack

Infinite loop/hang when a query returns an index for an object that isn't present

Open techlover10 opened this issue 1 year ago • 2 comments

  • [✓] Tested with the latest Haystack release
  • [✓] Tested with the current Haystack master branch

When a search query returns an index entry for which there is no corresponding model instance, for whatever reason, Haystack can hang indefinitely.

Expected behaviour

The query should gracefully omit the stale index entry.

Actual behaviour

The result cache invariant gets broken and the query gets stuck in an endless loop.

Steps to reproduce the behaviour

  1. Create some number of objects and index them.
  2. Delete or otherwise hide one of the objects from being returned in search results, without updating the search index.
  3. Perform a search query whose results would include the index entry of the hidden or removed object. (A minimal reproduction sketch follows.)
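
A minimal reproduction sketch, under assumptions: a hypothetical indexed model Note (the model name and import path are illustrative) and the default HAYSTACK_SIGNAL_PROCESSOR (BaseSignalProcessor), which does not update the index on delete:

    from haystack.query import SearchQuerySet

    from myapp.models import Note  # hypothetical model registered with a SearchIndex

    # Objects exist and have been indexed, e.g. via `manage.py rebuild_index`.
    # With the default signal processor, deleting an object does NOT remove
    # its entry from the search index, leaving a stale entry behind.
    Note.objects.filter(pk=42).delete()

    # Iterating a query whose results include the stale entry can now hang
    # while Haystack repeatedly tries to fill its result cache.
    for result in SearchQuerySet().models(Note):
        print(result.object)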

I was able to identify and resolve the issue in our own internal fork, but I wanted to report it nonetheless because it appears to be a clear logical error in the code. At line 193 of query.py, the "else" case breaks the _result_cache invariant that no None values appear in the middle of the results.

            else:
                # No objects were returned -- possible due to SQS nesting such as
                # XYZ.objects.filter(id__gt=10) where the amount ignored are
                # exactly equal to the ITERATOR_LOAD_PER_QUERY
                del self._result_cache[: len(results)]
                self._ignored_result_count += len(results)
                break

What this means is that as you page through results, if a single result cannot be matched back to a model, this else block kicks in and deletes the first len(results) items from self._result_cache. That not only removes valid results, it also mutates self._result_cache so that cache_start and cache_end, which were computed before calling self.post_process_results(results), are no longer valid.

I did not submit a pull request because it is unclear to me what the purpose of this block even is, so I don't know what the correct fix would be (the git blame appears to cite a broken test case as the reason this code was introduced many years ago). Our internal solution was to replace the entire block with a single increment of self._ignored_result_count (there is no reason it should increment by len(results) inside an iteration over results, as far as I can tell) and then continue instead of breaking. A sketch of that replacement is below.
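
For reference, a sketch of what our internal replacement looks like; it substitutes for the quoted else block inside the "for result in results:" loop of post_process_results(), and the exact surrounding code varies by Haystack version:

            else:
                # This hit has no backing model instance (e.g. a stale index
                # entry). Skip only this one result; deleting a slice of
                # self._result_cache here corrupts the cache and invalidates
                # the offsets computed by the caller.
                self._ignored_result_count += 1
                continue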

As it stands, the result of this mutation is that self._result_cache can end up looking something like this:

    [<model>, <model>, ..., <model>, None, None, None, None, None, None, None, None, None, None, <model>, <model>, ..., <model>, None, None, ..., None]

This causes an infinite loop while filling the cache: setting current_cache_max to self._result_cache.index(None) yields an index that is not at the end of the cached results, so the corresponding query to the search backend returns a non-empty page of results. Haystack then keeps trying to cache "more" results indefinitely, because in this state the "next page" of results is never empty.
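
A tiny illustration of why interior None values defeat the .index(None) bookkeeping (the values are placeholders):

    # With Nones only at the tail, index(None) equals the number of cached
    # results, which is the correct fill point.
    cache = ["a", "b", "c", None, None]
    cache.index(None)  # 3 -- end of the cached results

    # With Nones in the middle, index(None) points inside already-fetched
    # results, so the backend keeps returning non-empty pages for that offset.
    cache = ["a", "b", None, None, "c", "d", None]
    cache.index(None)  # 2 -- not the end of the cached results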

Configuration

  • Operating system version: AlmaLinux 8.6
  • Search engine version: Solr 6.6.6
  • Python version: 3.9.7
  • Django version: 3.2.7
  • Haystack version: 3.3.0

— techlover10, Aug 30 '24 19:08

Does this also happen on a supported version of Django (v4.2, v5.0, v5.1)?

— cclauss, Aug 30 '24 20:08

I don't have an easy way to test that: internally our project relies on some deprecated functions that were fully removed in Django 4.x, so upgrading just to test is not a quick endeavor. That said, the bug itself is not new; I saw it previously when Django 3.2 was still officially supported (and with older versions of Haystack as well; the versions we use are within the compatibility bounds listed in the main Haystack README). The only reason I didn't report it earlier is that, the last time I encountered it, I found a different workaround that addressed an internal bug in our stack which was creating these stale index entries, so I had never traced down the actual source of the hang.

— techlover10, Aug 30 '24 20:08