
Give informative errors if geoparser resources (spacy model, ES + data) are not available

Open • andybega opened this issue 7 months ago • 0 comments

I set up a new venv and installed mordecai3 from the spacy-3-8 branch. That part works.

However, instantiating the geoparser fails with an uncaught error if the spacy model has not been downloaded, yet it does not fail if Elasticsearch (ES) is not running. In either case the geoparser cannot actually be used.

import mordecai3
geo = mordecai3.Geoparser()

No spacy

Fails with an uncaught error.

Steps:

  1. Install mordecai3 from git, spacy-3-8 branch.
  2. Start the ES service.
  3. Try to import and instantiate the geoparser.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/andy/Downloads/test/.venv/lib/python3.10/site-packages/mordecai3/geoparse.py", line 184, in __init__
    self.nlp = load_nlp()
  File "/Users/andy/Downloads/test/.venv/lib/python3.10/site-packages/mordecai3/geoparse.py", line 40, in load_nlp
    nlp = spacy.load("en_core_web_trf")
  File "/Users/andy/Downloads/test/.venv/lib/python3.10/site-packages/spacy/__init__.py", line 52, in load
    return util.load_model(
  File "/Users/andy/Downloads/test/.venv/lib/python3.10/site-packages/spacy/util.py", line 484, in load_model
    raise IOError(Errors.E050.format(name=name))
OSError: [E050] Can't find model 'en_core_web_trf'. It doesn't seem to be a Python package or a valid path to a data directory.
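One way to turn this into an informative error is to check for the model before `spacy.load` is called and raise an error that contains the fix. The E050 message itself says spaCy expected a Python package, and models installed with `python -m spacy download` are ordinary importable packages, so the standard-library `importlib.util.find_spec` can detect a missing one without loading it. This is a minimal sketch under that assumption: `check_spacy_model`, `load_nlp_checked` and `ResourceNotAvailableError` are hypothetical names, not existing mordecai3 code, and the check only covers models referenced by package name (as `load_nlp` does with `"en_core_web_trf"`), not models loaded from a directory path. An alternative with the same effect would be to catch the `OSError` from `spacy.load` and re-raise it with the download instruction.

```python
import importlib.util


class ResourceNotAvailableError(RuntimeError):
    """Hypothetical exception for a geoparser resource that is not available."""


def check_spacy_model(model_name: str = "en_core_web_trf") -> None:
    """Raise ResourceNotAvailableError if the spaCy model package is not installed.

    Models installed with `python -m spacy download <name>` are ordinary Python
    packages, so find_spec() returns None when the package is missing.
    """
    if importlib.util.find_spec(model_name) is None:
        raise ResourceNotAvailableError(
            f"spaCy model '{model_name}' is not installed. "
            f"Install it with: python -m spacy download {model_name}"
        )


def load_nlp_checked(model_name: str = "en_core_web_trf"):
    """Hypothetical replacement for load_nlp(): check first, then load."""
    check_spacy_model(model_name)
    import spacy  # imported here so the presence check itself does not need spaCy

    return spacy.load(model_name)
```

With something like this, the `self.nlp = load_nlp()` call in `Geoparser.__init__` would fail with a message naming the exact install command instead of the bare E050 traceback.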

No ES

Instantiation prints a series of log messages indicating that ES is not available, but the highest level is WARNING. The elasticsearch client's log messages do include tracebacks of the caught Python errors, but no Python-side error is raised and the geo object is created anyway. It obviously can't be used for geocoding.

  1. Install mordecai3 from git, spacy-3-8 branch.
  2. Download the spacy model (en_core_web_trf), but do not start ES.
  3. Try to import and instantiate the geoparser.

2025-09-11 10:07:00,546 root         INFO     Checking Elasticsearch connection...
2025-09-11 10:07:00,547 urllib3.connectionpool DEBUG    Starting new HTTP connection (1): localhost:9200
2025-09-11 10:07:00,548 elasticsearch WARNING  GET http://localhost:9200/ [status:N/A request:0.001s]
Traceback (most recent call last):
  File "/Users/andy/Downloads/test/.venv/lib/python3.10/site-packages/urllib3/connection.py", line 174, in _new_conn
    conn = connection.create_connection(
  File "/Users/andy/Downloads/test/.venv/lib/python3.10/site-packages/urllib3/util/connection.py", line 95, in create_connection
    raise err
  File "/Users/andy/Downloads/test/.venv/lib/python3.10/site-packages/urllib3/util/connection.py", line 85, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 61] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/andy/Downloads/test/.venv/lib/python3.10/site-packages/elasticsearch/connection/http_urllib3.py", line 255, in perform_request
    response = self.pool.urlopen(
  File "/Users/andy/Downloads/test/.venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 802, in urlopen
    retries = retries.increment(
  File "/Users/andy/Downloads/test/.venv/lib/python3.10/site-packages/urllib3/util/retry.py", line 527, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/Users/andy/Downloads/test/.venv/lib/python3.10/site-packages/urllib3/packages/six.py", line 770, in reraise
    raise value
  File "/Users/andy/Downloads/test/.venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 716, in urlopen
    httplib_response = self._make_request(
  File "/Users/andy/Downloads/test/.venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 416, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/Users/andy/Downloads/test/.venv/lib/python3.10/site-packages/urllib3/connection.py", line 244, in request
    super(HTTPConnection, self).request(method, url, body=body, headers=headers)
  File "/Users/andy/.local/share/uv/python/cpython-3.10.15-macos-aarch64-none/lib/python3.10/http/client.py", line 1283, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/Users/andy/.local/share/uv/python/cpython-3.10.15-macos-aarch64-none/lib/python3.10/http/client.py", line 1329, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/Users/andy/.local/share/uv/python/cpython-3.10.15-macos-aarch64-none/lib/python3.10/http/client.py", line 1278, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/Users/andy/.local/share/uv/python/cpython-3.10.15-macos-aarch64-none/lib/python3.10/http/client.py", line 1038, in _send_output
    self.send(msg)
  File "/Users/andy/.local/share/uv/python/cpython-3.10.15-macos-aarch64-none/lib/python3.10/http/client.py", line 976, in send
    self.connect()
  File "/Users/andy/Downloads/test/.venv/lib/python3.10/site-packages/urllib3/connection.py", line 205, in connect
    conn = self._new_conn()
  File "/Users/andy/Downloads/test/.venv/lib/python3.10/site-packages/urllib3/connection.py", line 186, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x1672dbfa0>: Failed to establish a new connection: [Errno 61] Connection refused
2025-09-11 10:07:00,550 elasticsearch DEBUG    > None
2025-09-11 10:07:00,550 root         WARNING  Could not connect to Elasticsearch, but the logic of this code path may be wrong...

Instantiation still succeeds, and we get the object in Python:

>>> geo
<mordecai3.geoparse.Geoparser object at 0x104b83550>
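A fail-fast alternative is to make the connection check raise instead of logging a WARNING. The sketch below is a minimal, hypothetical version (`check_elasticsearch` and `ResourceNotAvailableError` are not existing mordecai3 names); it uses only the standard library so it runs without the elasticsearch package, and it assumes the default address `http://localhost:9200/` seen in the log output. Inside mordecai3 the existing elasticsearch client's `ping()` method, which returns a boolean, could serve the same purpose. Note that this only checks that something answers at the ES address, not that the geoparser data is loaded.

```python
import http.client
import urllib.request


class ResourceNotAvailableError(RuntimeError):
    """Hypothetical exception for a geoparser resource that is not available."""


def check_elasticsearch(url: str = "http://localhost:9200/", timeout: float = 2.0) -> None:
    """Raise ResourceNotAvailableError unless an HTTP GET on `url` succeeds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            pass  # a successful (2xx) response is all we need
    except (OSError, http.client.HTTPException) as exc:
        # URLError (e.g. connection refused), HTTPError (non-2xx status) and
        # socket timeouts are all OSError subclasses; HTTPException covers
        # malformed responses from something that is not an HTTP server.
        raise ResourceNotAvailableError(
            f"Could not connect to Elasticsearch at {url}: {exc}. "
            "The Geoparser needs a running ES service with the geoparser data "
            "loaded; start it before creating a Geoparser()."
        ) from exc
```

Called at the start of `Geoparser.__init__`, a check like this would turn the scenario above into a single Python error instead of a successfully created but unusable object.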

Next steps

  1. Should it even be possible to create a Geoparser() instance if the required additional resources are not available? I lean towards "no", because what would the instance be used for?
    • If "yes" -> catch errors and issue log messages, but still allow the instance to be created
    • If "no" -> check for resource presence and raise corresponding errors with fix instructions
  2. Check whether the spacy model is present and, if not, raise a corresponding Python error or log an ERROR message with fix instructions.
  3. Adjust the log message in Geoparser so that it's at the ERROR level and more clearly describes the issue and its implication, or convert it into a Python-side error.
    • Specifically this message: 2025-09-11 10:07:00,550 root WARNING Could not connect to Elasticsearch, but the logic of this code path may be wrong...
    • Maybe also add a Geoparser option that, by default, raises the log levels of the urllib3 and ES loggers, so that we don't get a wall of log messages from them and the Geoparser WARNING or ERROR message is easier to spot?
    • See also #35 for general logging cleanup.
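Items 1 and 3 could be combined roughly as follows. This is a hypothetical sketch, not existing Geoparser behavior: `quiet_dependency_loggers`, `report_missing_resources` and the `strict` flag are illustrative names, and the `"mordecai3"` logger name is an assumption. The `"urllib3"` and `"elasticsearch"` logger names are the ones that appear in the log output above; setting the level on the parent `"urllib3"` logger also applies to `"urllib3.connectionpool"` because standard-library child loggers inherit their parent's level unless they set their own.

```python
import logging

# Hypothetical logger for the geoparser's own messages.
logger = logging.getLogger("mordecai3")


def quiet_dependency_loggers(level: int = logging.ERROR) -> None:
    """Raise the threshold of the noisy third-party loggers seen in the ES output."""
    for name in ("urllib3", "elasticsearch"):
        logging.getLogger(name).setLevel(level)


def report_missing_resources(problems: list[str], strict: bool = True) -> None:
    """Fail fast (strict=True, the "no" option) or log one clear ERROR (the "yes" option)."""
    if not problems:
        return
    message = "Geoparser resources are not available:\n" + "\n".join(
        f"  - {problem}" for problem in problems
    )
    if strict:
        raise RuntimeError(message)
    logger.error("%s\nThe Geoparser instance was created but cannot geocode.", message)
```

The design choice here is that a single aggregated message lists every missing resource at once, so a user missing both the spacy model and ES sees both problems in one run rather than fixing them one traceback at a time.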

@ahalterman what's your opinion on how to handle this?

andybega • Sep 11 '25 07:09