richie icon indicating copy to clipboard operation
richie copied to clipboard

Timeout during index creation

Open igobranco opened this issue 3 years ago • 3 comments
trafficstars

Bug Report

Sometimes when we reindex the elastic search on our Richie installation it timeouts.

Logs

Auto-created primary key used when not defining a primary key type, by default 'django.db.models.AutoField'.", 
HINT: Configure the DEFAULT_AUTO_FIELD setting or the MenusConfig.default_auto_field attribute to point to a subclass of AutoField, e.g. 'django.db.models.BigAutoField'.\u001b[0m", "\u001b[33;1msearch.SearchAccess: (models.W042) Auto-created primary key used when not defining a primary key type, by default 'django.db.models.AutoField'.", 
HINT: Configure the DEFAULT_AUTO_FIELD setting or the SearchConfig.default_auto_field attribute to point to a subclass of AutoField, e.g. 'django.db.models.BigAutoField'.\u001b[0m", "\u001b[33;1msocial_django.Association: (models.W042) Auto-created primary key used when not defining a primary key type, by default 'django.db.models.AutoField'.", 
HINT: Configure the DEFAULT_AUTO_FIELD setting or the PythonSocialAuthConfig.default_auto_field attribute to point to a subclass of AutoField, e.g. 'django.db.models.BigAutoField'.\u001b[0m", "\u001b[33;1msocial_django.Code: (models.W042) Auto-created primary key used when not defining a primary key type, by default 'django.db.models.AutoField'.", 
HINT: Configure the DEFAULT_AUTO_FIELD setting or the PythonSocialAuthConfig.default_auto_field attribute to point to a subclass of AutoField, e.g. 'django.db.models.BigAutoField'.\u001b[0m", "\u001b[33;1msocial_django.Nonce: (models.W042) Auto-created primary key used when not defining a primary key type, by default 'django.db.models.AutoField'.", 
HINT: Configure the DEFAULT_AUTO_FIELD setting or the PythonSocialAuthConfig.default_auto_field attribute to point to a subclass of AutoField, e.g. 'django.db.models.BigAutoField'.\u001b[0m", "\u001b[33;1msocial_django.Partial: (models.W042) Auto-created primary key used when not defining a primary key type, by default 'django.db.models.AutoField'.", 
HINT: Configure the DEFAULT_AUTO_FIELD setting or the PythonSocialAuthConfig.default_auto_field attribute to point to a subclass of AutoField, e.g. 'django.db.models.BigAutoField'.\u001b[0m", "\u001b[33;1msocial_django.UserSocialAuth: (models.W042) Auto-created primary key used when not defining a primary key type, by default 'django.db.models.AutoField'.", 
HINT: Configure the DEFAULT_AUTO_FIELD setting or the PythonSocialAuthConfig.default_auto_field attribute to point to a subclass of AutoField, e.g. 'django.db.models.BigAutoField'.\u001b[0m", "Traceback (most recent call last):", "  File \"/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py\", line 449, in _make_request", "    six.raise_from(e, None)", "  File \"<string>\", line 3, in raise_from", "  File \"/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py\", line 444, in _make_request", "    httplib_response = conn.getresponse()", "  File \"/usr/local/lib/python3.10/http/client.py\", line 1374, in getresponse", "    response.begin()", "  File \"/usr/local/lib/python3.10/http/client.py\", line 318, in begin", "    version, status, reason = self._read_status()", "  File \"/usr/local/lib/python3.10/http/client.py\", line 279, in _read_status", "    line = str(self.fp.readline(_MAXLINE + 1), \"iso-8859-1\")", "  File \"/usr/local/lib/python3.10/socket.py\", line 705, in readinto", "    return self._sock.recv_into(b)", "TimeoutError: timed out", "", "During handling of the above exception, another exception occurred:", "", "Traceback (most recent call last):", "  File \"/usr/local/lib/python3.10/site-packages/elasticsearch/connection/http_urllib3.py\", line 232, in perform_request", "    response = self.pool.urlopen(", "  File \"/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py\", line 785, in urlopen", "    retries = retries.increment(", "  File \"/usr/local/lib/python3.10/site-packages/urllib3/util/retry.py\", line 525, in increment", "    raise six.reraise(type(error), error, _stacktrace)", "  File \"/usr/local/lib/python3.10/site-packages/urllib3/packages/six.py\", line 770, in reraise", "    raise value", "  File \"/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py\", line 703, in urlopen", "    httplib_response = self._make_request(", "  File \"/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py\", line 451, in _make_request", "    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)", "  File \"/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py\", line 340, in _raise_timeout", "    raise ReadTimeoutError(", "urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='elasticsearch', port=9200): Read timed out. (read timeout=10)", "", "During handling of the above exception, another exception occurred:", "", "Traceback (most recent call last):", "  File \"/app/manage.py\", line 14, in <module>", "    execute_from_command_line(sys.argv)", "  File \"/usr/local/lib/python3.10/site-packages/django/core/management/__init__.py\", line 419, in execute_from_command_line", "    utility.execute()", "  File \"/usr/local/lib/python3.10/site-packages/django/core/management/__init__.py\", line 413, in execute", "    self.fetch_command(subcommand).run_from_argv(self.argv)", "  File \"/usr/local/lib/python3.10/site-packages/django/core/management/base.py\", line 354, in run_from_argv", "    self.execute(*args, **cmd_options)", "  File \"/usr/local/lib/python3.10/site-packages/django/core/management/base.py\", line 398, in execute", "    output = self.handle(*args, **options)", "  File \"/usr/local/lib/python3.10/site-packages/richie/apps/search/management/commands/bootstrap_elasticsearch.py\", line 29, in handle", "    regenerate_indices(logger)", "  File \"/usr/local/lib/python3.10/site-packages/richie/apps/search/index_manager.py\", line 80, in regenerate_indices", "    list(map(lambda ix: perform_create_index(ix, logger), ES_INDICES)), ES_INDICES", "  File \"/usr/local/lib/python3.10/site-packages/richie/apps/search/index_manager.py\", line 80, in <lambda>", "    list(map(lambda ix: perform_create_index(ix, logger), ES_INDICES)), ES_INDICES", "  File \"/usr/local/lib/python3.10/site-packages/richie/apps/search/index_manager.py\", line 49, in perform_create_index", "    ES_INDICES_CLIENT.create(index=new_index)", "  File \"/usr/local/lib/python3.10/site-packages/elasticsearch/client/utils.py\", line 101, in _wrapped", "    return func(*args, params=params, **kwargs)", "  File \"/usr/local/lib/python3.10/site-packages/elasticsearch/client/indices.py\", line 123, in create", "    return self.transport.perform_request(", "  File \"/usr/local/lib/python3.10/site-packages/elasticsearch/transport.py\", line 402, in perform_request", "    status, headers_response, data = connection.perform_request(", "  File \"/usr/local/lib/python3.10/site-packages/elasticsearch/connection/http_urllib3.py\", line 244, in perform_request", "    raise ConnectionTimeout(\"TIMEOUT\", str(e), e)", "elasticsearch.exceptions.ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host='elasticsearch', port=9200): Read timed out. (read timeout=10))", "Makefile:98: recipe for target 'search-index' failed", "make: Leaving directory '/nau/ops/richie'"]}

I think changing the following code on the file src/richie/apps/search/index_manager.py From:

ES_INDICES_CLIENT.create(index=new_index)

To:

ES_INDICES_CLIENT.create(
      index=new_index, timeout=getattr(settings, "RICHIE_ES_CREATE_INDEX_TIMEOUT", 10)
)

And on our Richie project nau-richie-site-factory explicitly increase its value to 30 seconds, would fix the problem.

This optional timeout change would only change the index creation and not the normal queries.

References:

  • https://elasticsearch-py.readthedocs.io/en/v8.0.0/api.html#module-elasticsearch
  • https://stackoverflow.com/questions/28287261/connection-timeout-with-elasticsearch

@sampaccoud I could send a PR with this change if you are ok with this change.

igobranco avatar Feb 24 '22 13:02 igobranco

I'm ok with this but it never happened to us so I'm thinking it may be a symptom that your Elasticsearch is not well dimensioned? Would it make sense that this setting is used for all ES queries?

sampaccoud avatar Feb 24 '22 13:02 sampaccoud

I'm ok with this but it never happened to us so I'm thinking it may be a symptom that your Elasticsearch is not well dimensioned?

We have and active-passive installation. Where each node has everything, nginx, app, elasticsearch and db. So every day we synchronize the servers, we backup from primary node and restore the data and the index on replica node. During the restore we run search-index that reindex the elastic search. If we stop every container, then it run fine, but if I don't stop the app and nginx containers, then it timeout's. Probably, this occurs because the server is busy doing other things, so it takes longer time to reindex and it probably creates the timeout.

Would it make sense that this setting is used for all ES queries?

I think we shouldn't increase timeout on every interaction with elasticsearch. The normal search query should need to be rapid. If elasticsearch takes too long to respond, we could saturate all the gunicorn workers.

igobranco avatar Feb 24 '22 14:02 igobranco

@sampaccoud nevertheless I could include also a different setting that allows to change the overall timeout. Probably on the file src/richie/apps/search/__init__.py line 12.

igobranco avatar Feb 24 '22 15:02 igobranco