
Update requests are not always sent to leader in solr cloud

Open anguslee opened this issue 5 years ago • 2 comments

The _update method in class SolrCloud invokes Solr._update, which in turn invokes SolrCloud._send_request at line 564 or 568:

https://github.com/django-haystack/pysolr/blob/da1522f5bcbfd209ea0b13ae355389cd327258bb/pysolr.py#L564

https://github.com/django-haystack/pysolr/blob/da1522f5bcbfd209ea0b13ae355389cd327258bb/pysolr.py#L568

This invocation sends the request to a randomly chosen node in the Solr cluster, which is likely to bypass the leader node. A patch to class SolrCloud like the following works around the problem:

     def _send_request(self, method, path='', body=None, headers=None, files=None):
         # FIXME: this needs to have a maximum retry counter rather than waiting endlessly
         try:
-            return self._randomized_request(method, path, body, headers, files)
+            if 'update/' in urlparse(path).path:
+                return Solr._send_request(self, method, path, body, headers, files)
+            else:
+                return self._randomized_request(method, path, body, headers, files)
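The routing decision in the patch can be tried out on its own, without pysolr or a running cluster. The sketch below is only an illustration: is_update_request and pick_route are hypothetical helper names, not part of pysolr, and it deliberately differs from the patch in one respect by matching a whole path segment named "update" rather than the substring 'update/'.

```python
from urllib.parse import urlparse


def is_update_request(path):
    """Return True when the request path targets an update handler.

    Hypothetical helper, not a pysolr function. It compares whole path
    segments, so "update" and "update/" match but "updates/" does not.
    """
    # urlparse separates the query string, so "?commit=true" is ignored here.
    segments = urlparse(path).path.split("/")
    return "update" in segments


def pick_route(path):
    """Return "leader" for update requests and "random" for everything else."""
    return "leader" if is_update_request(path) else "random"


if __name__ == "__main__":
    for example in ("update/?commit=true", "select/?q=solr"):
        print(example, "->", pick_route(example))
```

In the patch itself, the "leader" branch corresponds to calling Solr._send_request directly and the "random" branch to _randomized_request.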

anguslee avatar Oct 30 '19 08:10 anguslee

If you want to send a pull request, that seems reasonable

acdha avatar Oct 30 '19 17:10 acdha

I mean it's not so reasonable for update requests. I'm running SolrCloud version 6.6.3, and I found that in this environment, when update data is sent to a replica node, that node forwards it to the leader node for indexing. So it clearly wastes network resources if the client fails to find the correct leader node to handle the update request.

anguslee avatar Oct 31 '19 04:10 anguslee