capstone
capstone copied to clipboard
IndexError in get_previous_link
We have been seeing tracebacks that look like the one below, for /citations/?cursor=eyJwIjogWzAuMCwgIjE5NzQtMDctMTIiLCAzMjg3MDk5XX0%3D&q=15+U.S.C.+%C2%A716
. They are mostly on /citations/
but occasionally on the cases API. The error emails do not contain a referer. My guess is that a person is reloading an old tab, or a bot is using a previously stashed URL, and the cursor in question has expired. If that's right, what should the correct behavior be? a 404? a "page has expired" view? A redirect to the base view for the query without any cursor?
File "/home/capstone/capstone-prod/lib/python3.7/site-packages/django/core/handlers/exception.py" in inner
47. response = get_response(request)
File "/home/capstone/capstone-prod/lib/python3.7/site-packages/django/core/handlers/base.py" in _get_response
181. response = wrapped_callback(request, *callback_args, **callback_kwargs)
File "/usr/local/share/capstone-prod/capstone/cite/views.py" in citations
530. data = api_request(request, CaseDocumentViewSet, 'list', get_params=params).data
File "/usr/local/share/capstone-prod/capstone/capapi/resources.py" in api_request
263. return viewset.as_view({'get': method})(api_request, **url_kwargs)
File "/home/capstone/capstone-prod/lib/python3.7/site-packages/django/views/decorators/csrf.py" in wrapped_view
54. return view_func(*args, **kwargs)
File "/home/capstone/capstone-prod/lib/python3.7/site-packages/rest_framework/viewsets.py" in view
125. return self.dispatch(request, *args, **kwargs)
File "/home/capstone/capstone-prod/lib/python3.7/site-packages/rest_framework/views.py" in dispatch
509. response = self.handle_exception(exc)
File "/home/capstone/capstone-prod/lib/python3.7/site-packages/rest_framework/views.py" in handle_exception
469. self.raise_uncaught_exception(exc)
File "/home/capstone/capstone-prod/lib/python3.7/site-packages/rest_framework/views.py" in raise_uncaught_exception
480. raise exc
File "/home/capstone/capstone-prod/lib/python3.7/site-packages/rest_framework/views.py" in dispatch
506. response = handler(request, *args, **kwargs)
File "/usr/local/share/capstone-prod/capstone/capapi/views/api_views.py" in list
320. return super(CaseDocumentViewSet, self).list(request, *args, **kwargs)
File "/home/capstone/capstone-prod/lib/python3.7/site-packages/rest_framework/mixins.py" in list
43. return self.get_paginated_response(serializer.data)
File "/home/capstone/capstone-prod/lib/python3.7/site-packages/rest_framework/generics.py" in get_paginated_response
178. return self.paginator.get_paginated_response(data)
File "/usr/local/share/capstone-prod/capstone/capapi/pagination.py" in get_paginated_response
222. return Response(OrderedDict(self.get_paginated_response_context(data)))
File "/usr/local/share/capstone-prod/capstone/capapi/pagination.py" in get_paginated_response_context
204. ('previous', self.get_previous_link()),
File "/usr/local/share/capstone-prod/capstone/capapi/pagination.py" in get_previous_link
317. return self.encode_cursor({'p': self.resp['hits']['hits'][0]['sort'], 'r':1})
Exception Type: IndexError at /citations/
Exception Value: list index out of range
Request information:
GET:
cursor = 'eyJwIjogWzAuMCwgIjE5NzQtMDctMTIiLCAzMjg3MDk5XX0='
q = '15 U.S.C. §16'
My guess is that a person is reloading an old tab, or a bot is using a previously stashed URL, and the cursor in question has expired.
Our cursors don't expire -- they're just instructions for additional filters to apply to an elasticsearch query so it returns the next page of items. For example if a query was sorted by date and then ID, the forward cursor would say "next page should be filtered to date >= x and id >= y" and the reverse cursor would say "previous page should be sorted by -date, -id and filtered to date <= x and id <= y".
(Behind the scenes note: this kind of cursor is useful because b-trees are fast to do "give me all results > x", but slow to do "give me results 10000-10010". We have enough items that limit-offset pagination doesn't work.)
I don't know why we get queries with invalid cursors, but we still get them a few times a day so we should fix the 500 error. Could be some script messing with cursors and creating invalid queries, since it's not valid in this scheme to change the other query parameters and not delete the cursor, and we can't detect that. Or could be some way I haven't found yet that our code can generate invalid cursors.
Either way I think we should poke at this and see if we can see a way to make the cursors more robust, or if the reason for the error jumps out with fresh eyes. And whether we find an improvement or not, we should add a try-catch for invalid cursors and return a 400 bad cursor response, since people can send invalid cursors under any design. One soft design goal if we change how the cursors work would be not to make them too much longer or uglier.
It might be worth moving all the query parameters into the cursor blob so people aren't tempted to edit them, but not sure if worth it.
From the traceback itself, I'd guess the problem is somewhere around an expectation in the code that self.resp['hits']['hits']
is always non-zero length, but isn't. I've definitely seen bugs in other search products where a result can be returned from the index itself, but the hit highlighting code has slightly different business rules and so there's no direct match in the returned context and therefore no hits to highlight.
I couldn't reproduce this exact case since I'm not sure what composed the original query. The value of cursor
decodes to {"p": [0.0, "1974-07-12", 3287099]}'
but I dunno what fields those represent, and a full-text match on '15 U.S.C. §16' returns a zillion results. But if I were to defend against this, I'd probably start with the assumption that self.resp['hits']['hits']
is always non-empty and see why that might not be true.
Detect invalid cursor and return that error instead of this error.