capstone IndexError in get_previous

IndexError in get_previous_link

Open bensteinberg opened this issue 2 years ago • 2 comments

We have been seeing tracebacks that look like the one below, for /citations/?cursor=eyJwIjogWzAuMCwgIjE5NzQtMDctMTIiLCAzMjg3MDk5XX0%3D&q=15+U.S.C.+%C2%A716. They are mostly on /citations/ but occasionally on the cases API. The error emails do not contain a referer. My guess is that a person is reloading an old tab, or a bot is using a previously stashed URL, and the cursor in question has expired. If that's right, what should the correct behavior be? a 404? a "page has expired" view? A redirect to the base view for the query without any cursor?

File "/home/capstone/capstone-prod/lib/python3.7/site-packages/django/core/handlers/exception.py" in inner
  47.                 response = get_response(request)

File "/home/capstone/capstone-prod/lib/python3.7/site-packages/django/core/handlers/base.py" in _get_response
  181.                 response = wrapped_callback(request, *callback_args, **callback_kwargs)

File "/usr/local/share/capstone-prod/capstone/cite/views.py" in citations
  530.     data = api_request(request, CaseDocumentViewSet, 'list', get_params=params).data

File "/usr/local/share/capstone-prod/capstone/capapi/resources.py" in api_request
  263.     return viewset.as_view({'get': method})(api_request, **url_kwargs)

File "/home/capstone/capstone-prod/lib/python3.7/site-packages/django/views/decorators/csrf.py" in wrapped_view
  54.         return view_func(*args, **kwargs)

File "/home/capstone/capstone-prod/lib/python3.7/site-packages/rest_framework/viewsets.py" in view
  125.             return self.dispatch(request, *args, **kwargs)

File "/home/capstone/capstone-prod/lib/python3.7/site-packages/rest_framework/views.py" in dispatch
  509.             response = self.handle_exception(exc)

File "/home/capstone/capstone-prod/lib/python3.7/site-packages/rest_framework/views.py" in handle_exception
  469.             self.raise_uncaught_exception(exc)

File "/home/capstone/capstone-prod/lib/python3.7/site-packages/rest_framework/views.py" in raise_uncaught_exception
  480.         raise exc

File "/home/capstone/capstone-prod/lib/python3.7/site-packages/rest_framework/views.py" in dispatch
  506.             response = handler(request, *args, **kwargs)

File "/usr/local/share/capstone-prod/capstone/capapi/views/api_views.py" in list
  320.         return super(CaseDocumentViewSet, self).list(request, *args, **kwargs)

File "/home/capstone/capstone-prod/lib/python3.7/site-packages/rest_framework/mixins.py" in list
  43.             return self.get_paginated_response(serializer.data)

File "/home/capstone/capstone-prod/lib/python3.7/site-packages/rest_framework/generics.py" in get_paginated_response
  178.         return self.paginator.get_paginated_response(data)

File "/usr/local/share/capstone-prod/capstone/capapi/pagination.py" in get_paginated_response
  222.         return Response(OrderedDict(self.get_paginated_response_context(data)))

File "/usr/local/share/capstone-prod/capstone/capapi/pagination.py" in get_paginated_response_context
  204.             ('previous', self.get_previous_link()),

File "/usr/local/share/capstone-prod/capstone/capapi/pagination.py" in get_previous_link
  317.         return self.encode_cursor({'p': self.resp['hits']['hits'][0]['sort'], 'r':1})

Exception Type: IndexError at /citations/
Exception Value: list index out of range
Request information:
GET:
cursor = 'eyJwIjogWzAuMCwgIjE5NzQtMDctMTIiLCAzMjg3MDk5XX0='
q = '15 U.S.C. §16'

Oct 18 '21 13:10 bensteinberg

My guess is that a person is reloading an old tab, or a bot is using a previously stashed URL, and the cursor in question has expired.

Our cursors don't expire -- they're just instructions for additional filters to apply to an elasticsearch query so it returns the next page of items. For example if a query was sorted by date and then ID, the forward cursor would say "next page should be filtered to date >= x and id >= y" and the reverse cursor would say "previous page should be sorted by -date, -id and filtered to date <= x and id <= y".

(Behind the scenes note: this kind of cursor is useful because b-trees are fast to do "give me all results > x", but slow to do "give me results 10000-10010". We have enough items that limit-offset pagination doesn't work.)

I don't know why we get queries with invalid cursors, but we still get them a few times a day so we should fix the 500 error. Could be some script messing with cursors and creating invalid queries, since it's not valid in this scheme to change the other query parameters and not delete the cursor, and we can't detect that. Or could be some way I haven't found yet that our code can generate invalid cursors.

Either way I think we should poke at this and see if we can see a way to make the cursors more robust, or if the reason for the error jumps out with fresh eyes. And whether we find an improvement or not, we should add a try-catch for invalid cursors and return a 400 bad cursor response, since people can send invalid cursors under any design. One soft design goal if we change how the cursors work would be not to make them too much longer or uglier.

It might be worth moving all the query parameters into the cursor blob so people aren't tempted to edit them, but not sure if worth it.

May 24 '22 13:05 jcushman

From the traceback itself, I'd guess the problem is somewhere around an expectation in the code that self.resp['hits']['hits'] is always non-zero length, but isn't. I've definitely seen bugs in other search products where a result can be returned from the index itself, but the hit highlighting code has slightly different business rules and so there's no direct match in the returned context and therefore no hits to highlight.

I couldn't reproduce this exact case since I'm not sure what composed the original query. The value of cursor decodes to {"p": [0.0, "1974-07-12", 3287099]}' but I dunno what fields those represent, and a full-text match on '15 U.S.C. §16' returns a zillion results. But if I were to defend against this, I'd probably start with the assumption that self.resp['hits']['hits'] is always non-empty and see why that might not be true.

May 25 '22 13:05 lizadaly

Detect invalid cursor and return that error instead of this error.

Jan 12 '23 20:01 kilbergr

capstone capstone copied to clipboard

IndexError in get_previous_link

capstone
capstone copied to clipboard