pysolr
Update requests are not always sent to leader in solr cloud
The _update method in class SolrCloud invokes Solr._update, which in turn invokes SolrCloud._send_request at line 564 or 568:
https://github.com/django-haystack/pysolr/blob/da1522f5bcbfd209ea0b13ae355389cd327258bb/pysolr.py#L564
https://github.com/django-haystack/pysolr/blob/da1522f5bcbfd209ea0b13ae355389cd327258bb/pysolr.py#L568
This invocation generates a randomized request to one of the nodes in the Solr cluster, so it is likely to bypass the leader node. A patch to class SolrCloud like this works around the problem:
     def _send_request(self, method, path='', body=None, headers=None, files=None):
         # FIXME: this needs to have a maximum retry counter rather than waiting endlessly
         try:
    -        return self._randomized_request(method, path, body, headers, files)
    +        if 'update/' in urlparse(path).path:
    +            return Solr._send_request(self, method, path, body, headers, files)
    +        else:
    +            return self._randomized_request(method, path, body, headers, files)
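To illustrate the routing idea outside of pysolr, here is a minimal self-contained sketch. The classes `FakeSolr` and `FakeSolrCloud`, the helper `is_update_path`, and the example URLs are all hypothetical stand-ins, not pysolr's actual code. The sketch assumes update paths look like `update/?commit=true`, and it matches on a whole path segment rather than the substring test used in the patch above.

```python
import random
from urllib.parse import urlparse


def is_update_path(path):
    """Return True if the request path targets the update handler.

    Hypothetical helper: checks for an 'update' path segment, ignoring
    the query string.
    """
    segments = urlparse(path).path.strip("/").split("/")
    return "update" in segments


class FakeSolr:
    """Stand-in for a single-node client; it 'sends' to whatever self.url is."""

    def __init__(self, url):
        self.url = url

    def _send_request(self, method, path="", body=None, headers=None, files=None):
        # A real client would issue an HTTP request here; this just
        # reports which URL would have been hit.
        return "%s/%s" % (self.url.rstrip("/"), path.lstrip("/"))


class FakeSolrCloud(FakeSolr):
    """Stand-in for a cluster-aware client with one leader and some replicas."""

    def __init__(self, leader_url, replica_urls):
        super().__init__(leader_url)
        self.leader_url = leader_url
        self.all_urls = [leader_url] + list(replica_urls)

    def _randomized_request(self, method, path, body, headers, files):
        # Picks any node, which overwrites a leader URL set earlier.
        self.url = random.choice(self.all_urls)
        return FakeSolr._send_request(self, method, path, body, headers, files)

    def _send_request(self, method, path="", body=None, headers=None, files=None):
        if is_update_path(path):
            # Keep the URL chosen by _update (the leader) for update requests.
            return FakeSolr._send_request(self, method, path, body, headers, files)
        return self._randomized_request(method, path, body, headers, files)

    def _update(self, path="update/?commit=true"):
        self.url = self.leader_url
        return self._send_request("post", path)
```

With the path check in place, `FakeSolrCloud._update()` always reports the leader URL, while other requests are still spread across the nodes at random.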
If you want to send a pull request, that seems reasonable.
I mean the randomized request is not so reasonable for update requests. I'm running SolrCloud 6.6.3, and in this environment, when update data is sent to a replica node, that node forwards it to the leader node for indexing. So it clearly wastes network resources when the client fails to send the update request to the correct leader node.