mordecai3
Give informative errors if geoparser resources (spacy model, ES + data) are not available
I set up a new venv and installed mordecai3 from the spacy-3-8 branch. That part works.
However, instantiating the geoparser fails with an uncaught error if the spaCy model has not been downloaded, and it does not fail at all if ES is not running. In either case the geoparser cannot actually be used.
import mordecai3
geo = mordecai3.Geoparser()
No spaCy model
Fails with an uncaught error.
Steps:
- Install mordecai3 from git, spacy-3-8 branch.
- Start ES service
- Try to import and instantiate the geoparser.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/andy/Downloads/test/.venv/lib/python3.10/site-packages/mordecai3/geoparse.py", line 184, in __init__
self.nlp = load_nlp()
File "/Users/andy/Downloads/test/.venv/lib/python3.10/site-packages/mordecai3/geoparse.py", line 40, in load_nlp
nlp = spacy.load("en_core_web_trf")
File "/Users/andy/Downloads/test/.venv/lib/python3.10/site-packages/spacy/__init__.py", line 52, in load
return util.load_model(
File "/Users/andy/Downloads/test/.venv/lib/python3.10/site-packages/spacy/util.py", line 484, in load_model
raise IOError(Errors.E050.format(name=name))
OSError: [E050] Can't find model 'en_core_web_trf'. It doesn't seem to be a Python package or a valid path to a data directory.
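One way to turn this into an informative error is to check for the model before `load_nlp()` calls `spacy.load()`. This sketch assumes the model is installed as a regular Python package, which is how `python -m spacy download` installs it. `MissingResourceError` and `check_spacy_model` are hypothetical names, not part of mordecai3:

```python
import importlib.util


class MissingResourceError(RuntimeError):
    """Hypothetical error for a geoparser resource that is not available."""


def check_spacy_model(name: str = "en_core_web_trf") -> None:
    """Raise an informative error if the spaCy model package is not installed.

    Assumes the model is installed as a Python package; a model loaded
    from a filesystem path would need a different check.
    """
    if importlib.util.find_spec(name) is None:
        raise MissingResourceError(
            f"spaCy model '{name}' is not installed. "
            f"Download it with: python -m spacy download {name}"
        )
```

spaCy's own `spacy.util.is_package()` could serve the same purpose. Either way, the user would see the fix instruction instead of the bare E050 traceback.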
No Elasticsearch
Prints a bunch of log messages indicating that ES is not available, but the highest level is WARNING. The elasticsearch client's log messages do include tracebacks of the Python exceptions it caught internally, but no error is raised on the Python side, and the geo object is created. It obviously can't be used for geocoding.
Steps:
- Install mordecai3 from git, spacy-3-8 branch.
- Download the spacy model
- Try to import and instantiate the geoparser.
2025-09-11 10:07:00,546 root INFO Checking Elasticsearch connection...
2025-09-11 10:07:00,547 urllib3.connectionpool DEBUG Starting new HTTP connection (1): localhost:9200
2025-09-11 10:07:00,548 elasticsearch WARNING GET http://localhost:9200/ [status:N/A request:0.001s]
Traceback (most recent call last):
File "/Users/andy/Downloads/test/.venv/lib/python3.10/site-packages/urllib3/connection.py", line 174, in _new_conn
conn = connection.create_connection(
File "/Users/andy/Downloads/test/.venv/lib/python3.10/site-packages/urllib3/util/connection.py", line 95, in create_connection
raise err
File "/Users/andy/Downloads/test/.venv/lib/python3.10/site-packages/urllib3/util/connection.py", line 85, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 61] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/andy/Downloads/test/.venv/lib/python3.10/site-packages/elasticsearch/connection/http_urllib3.py", line 255, in perform_request
response = self.pool.urlopen(
File "/Users/andy/Downloads/test/.venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 802, in urlopen
retries = retries.increment(
File "/Users/andy/Downloads/test/.venv/lib/python3.10/site-packages/urllib3/util/retry.py", line 527, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/Users/andy/Downloads/test/.venv/lib/python3.10/site-packages/urllib3/packages/six.py", line 770, in reraise
raise value
File "/Users/andy/Downloads/test/.venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 716, in urlopen
httplib_response = self._make_request(
File "/Users/andy/Downloads/test/.venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 416, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/Users/andy/Downloads/test/.venv/lib/python3.10/site-packages/urllib3/connection.py", line 244, in request
super(HTTPConnection, self).request(method, url, body=body, headers=headers)
File "/Users/andy/.local/share/uv/python/cpython-3.10.15-macos-aarch64-none/lib/python3.10/http/client.py", line 1283, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/Users/andy/.local/share/uv/python/cpython-3.10.15-macos-aarch64-none/lib/python3.10/http/client.py", line 1329, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/Users/andy/.local/share/uv/python/cpython-3.10.15-macos-aarch64-none/lib/python3.10/http/client.py", line 1278, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/Users/andy/.local/share/uv/python/cpython-3.10.15-macos-aarch64-none/lib/python3.10/http/client.py", line 1038, in _send_output
self.send(msg)
File "/Users/andy/.local/share/uv/python/cpython-3.10.15-macos-aarch64-none/lib/python3.10/http/client.py", line 976, in send
self.connect()
File "/Users/andy/Downloads/test/.venv/lib/python3.10/site-packages/urllib3/connection.py", line 205, in connect
conn = self._new_conn()
File "/Users/andy/Downloads/test/.venv/lib/python3.10/site-packages/urllib3/connection.py", line 186, in _new_conn
raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x1672dbfa0>: Failed to establish a new connection: [Errno 61] Connection refused
2025-09-11 10:07:00,550 elasticsearch DEBUG > None
2025-09-11 10:07:00,550 root WARNING Could not connect to Elasticsearch, but the logic of this code path may be wrong...
Instantiation succeeds, and we get the object in Python:
>>> geo
<mordecai3.geoparse.Geoparser object at 0x104b83550>
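A fail-fast alternative is sketched below. It assumes the Geoparser holds an elasticsearch-py client, whose `ping()` returns `False` instead of raising when the cluster is unreachable. `require_elasticsearch` and `MissingResourceError` are hypothetical names. The client is passed in so the check can be exercised without a running cluster:

```python
from typing import Protocol


class MissingResourceError(RuntimeError):
    """Hypothetical error for a geoparser resource that is not available."""


class Pingable(Protocol):
    """Anything with an elasticsearch-py style ping() -> bool method."""

    def ping(self) -> bool: ...


def require_elasticsearch(client: Pingable, url: str = "http://localhost:9200") -> None:
    """Raise an informative error if Elasticsearch does not answer a ping."""
    if not client.ping():
        raise MissingResourceError(
            f"Could not connect to Elasticsearch at {url}. "
            "The Geoparser needs a running Elasticsearch instance with the "
            "geoparser data loaded; start it before creating a Geoparser."
        )
```

Calling this from `Geoparser.__init__` would mean `geo = mordecai3.Geoparser()` raises instead of returning an unusable object.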
Next steps
- Should it even be possible to create a Geoparser() instance if the required additional resources are not available? I lean towards "no", because what would the instance be used for?
  - If "yes" -> catch errors and issue log messages, but the instance can still be created.
  - If "no" -> check that the resources are present and raise corresponding errors with fix instructions.
- Check whether the spaCy model is present and, if it is missing, raise a corresponding Python error or log an ERROR message with fix instructions.
- Adjust the log message in Geoparser so that it is logged at ERROR level and describes more clearly what the problem is and what it implies, or convert it into a Python-side error.
- Regarding the "root WARNING Could not connect to Elasticsearch, but the logic of this code path may be wrong..." message: maybe also add an option to Geoparser that, by default, raises the urllib3 and ES log levels, so we don't get the wall of log messages from those libraries and the Geoparser WARNING or ERROR message is easier to spot?
- See also #35 for general logging cleanup.
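The log-level option could be as small as raising the threshold on the two third-party loggers seen in the output above (`urllib3.connectionpool` and `elasticsearch`). This is a sketch only. `quiet_dependency_loggers` and its `level` parameter are hypothetical, not an existing Geoparser option:

```python
import logging


def quiet_dependency_loggers(level: int = logging.ERROR) -> None:
    """Raise the log level of noisy third-party loggers.

    Setting the level on the parent "urllib3" logger also covers
    "urllib3.connectionpool", as long as that child logger has no level
    of its own and therefore defers to its parent.
    """
    for name in ("urllib3", "elasticsearch"):
        logging.getLogger(name).setLevel(level)
```

A Geoparser constructor could call this by default and let users opt out. The connection WARNINGs and their tracebacks would then be suppressed, leaving the Geoparser's own message as the one that stands out.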
@ahalterman what's your opinion on how to handle this?