Pipeline for opentapioca not working
Discussed in https://github.com/explosion/spaCy/discussions/13678
Originally posted by piaschwarz October 25, 2024
I am using opentapioca for entity linking. The code worked before, but now calling the opentapioca endpoint throws an error (this happens with and without the endpoint URL in the config variable).
Is it a problem on my side or is the endpoint not working?
Versions: Python 3.10.6, spacy 3.3.1, spacy-transformers 1.1.7, spacyopentapioca 0.1.7
My code:
import spacy
nlp_spacy_trf = spacy.load('de_dep_news_trf')
dummy_text = "Christian Drosten arbeitet an der Charité in Berlin."
nlp_spacy_trf.add_pipe('opentapioca', config={"url": "https://opentapioca.wordlift.io/api/annotate?lc=de"})
doc = nlp_spacy_trf(dummy_text)
for span in doc.ents:
    print((span.text, span.kb_id_, span.label_, span._.description, span._.score))
It keeps throwing this error:
```
Traceback (most recent call last):
  File "/home/xxxxxxxx/.local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/home/xxxxxxxx/.local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 386, in _make_request
    self._validate_conn(conn)
  File "/home/xxxxxxxx/.local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1042, in _validate_conn
    conn.connect()
  File "/home/xxxxxxxx/.local/lib/python3.10/site-packages/urllib3/connection.py", line 414, in connect
    self.sock = ssl_wrap_socket(
  File "/home/xxxxxxxx/.local/lib/python3.10/site-packages/urllib3/util/ssl_.py", line 449, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(
  File "/home/xxxxxxxx/.local/lib/python3.10/site-packages/urllib3/util/ssl_.py", line 493, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "/usr/lib/python3.10/ssl.py", line 513, in wrap_socket
    return self.sslsocket_class._create(
  File "/usr/lib/python3.10/ssl.py", line 1071, in _create
    self.do_handshake()
  File "/usr/lib/python3.10/ssl.py", line 1342, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate (_ssl.c:997)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/xxxxxxxx/.local/lib/python3.10/site-packages/requests/adapters.py", line 489, in send
    resp = conn.urlopen(
  File "/home/xxxxxxxx/.local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 787, in urlopen
    retries = retries.increment(
  File "/home/xxxxxxxx/.local/lib/python3.10/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='opentapioca.wordlift.io', port=443): Max retries exceeded with url: /api/annotate?lc=de (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate (_ssl.c:997)')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/mnt/c/Users/XXXXX/PycharmProjects/EntityLinking/experiments.py", line 10, in <module>
    doc = nlp_spacy_trf(dummy_text)
  File "/home/xxxxxxxx/.local/lib/python3.10/site-packages/spacy/language.py", line 1025, in __call__
    error_handler(name, proc, [doc], e)
  File "/home/xxxxxxxx/.local/lib/python3.10/site-packages/spacy/util.py", line 1630, in raise_error
    raise e
  File "/home/xxxxxxxx/.local/lib/python3.10/site-packages/spacy/language.py", line 1020, in __call__
    doc = proc(doc, **component_cfg.get(name, {})) # type: ignore[call-arg]
  File "/home/xxxxxxxx/.local/lib/python3.10/site-packages/spacyopentapioca/entity_linker.py", line 101, in __call__
    r = self.make_request(doc)
  File "/home/xxxxxxxx/.local/lib/python3.10/site-packages/spacyopentapioca/entity_linker.py", line 93, in make_request
    return requests.post(url=self.url,
  File "/home/xxxxxxxx/.local/lib/python3.10/site-packages/requests/api.py", line 115, in post
    return request("post", url, data=data, json=json, **kwargs)
  File "/home/xxxxxxxx/.local/lib/python3.10/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/xxxxxxxx/.local/lib/python3.10/site-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/xxxxxxxx/.local/lib/python3.10/site-packages/requests/sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
  File "/home/xxxxxxxx/.local/lib/python3.10/site-packages/requests/adapters.py", line 563, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='opentapioca.wordlift.io', port=443): Max retries exceeded with url: /api/annotate?lc=de (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate (_ssl.c:997)')))
```
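The traceback bottoms out in `ssl.SSLCertVerificationError` during the TLS handshake, before spaCy or spacyopentapioca ever see a response, which suggests the problem is with the server's certificate rather than the pipeline code. One way to confirm this independently of spaCy is to probe the URL with the standard library alone. The sketch below is only a rough diagnostic: the helper names `classify_failure` and `probe` and the label strings are made up for illustration and are not part of any library.

```python
import ssl
import urllib.error
import urllib.request


def classify_failure(exc):
    """Map an exception raised while opening a URL to a rough label.

    Hypothetical helper for illustration; the labels are arbitrary.
    """
    # HTTPError subclasses URLError, so it has to be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        return "http"  # TLS worked; the server answered with an error status
    if isinstance(exc, ssl.SSLCertVerificationError):
        return "certificate"
    if isinstance(exc, urllib.error.URLError):
        # urlopen wraps low-level errors; the original one is in .reason.
        if isinstance(exc.reason, ssl.SSLCertVerificationError):
            return "certificate"  # e.g. a self-signed certificate
        return "connection"  # DNS failure, refused connection, ...
    if isinstance(exc, OSError):
        return "connection"  # e.g. a socket timeout
    return "unknown"


def probe(url, timeout=10):
    """Try to open `url`; return 'ok' or a label from classify_failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return "ok"
    except Exception as exc:
        return classify_failure(exc)


# Usage (performs a real network request, so no output is shown here):
# print(probe("https://opentapioca.wordlift.io/api/annotate?lc=de"))
```

If the probe reports `certificate` from more than one machine or network, the failure is very likely on the endpoint's side and not something local.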
Has anyone found a fix to this issue?
I'm interested in this as well. The fix is likely running the web app (details here) locally and using the url https://opentapioca.wordlift.io/. However, I have yet to try this and am unsure about the dependencies.
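If a self-hosted instance turns out to be workable, the pipeline would presumably need its `url` config value pointed at that instance. A minimal, untested sketch of building that value follows. The address `http://localhost:8457` is purely hypothetical, `annotate_url` is a made-up helper, and it is an assumption that a local instance exposes the same `/api/annotate` path and `lc` parameter as the hosted URL in the original post.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def annotate_url(base_url, lang="de"):
    """Build an annotate URL for an OpenTapioca instance at `base_url`.

    Hypothetical helper; the path and the `lc` parameter are copied from
    the hosted URL in the original post and may differ on other instances.
    """
    parts = urlsplit(base_url)
    query = urlencode({"lc": lang})
    return urlunsplit((parts.scheme, parts.netloc, "/api/annotate", query, ""))


# Hypothetical local address; replace with wherever the web app is served.
config = {"url": annotate_url("http://localhost:8457")}

# The config would then be passed as in the original post:
# nlp_spacy_trf.add_pipe('opentapioca', config=config)
```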
I asked the developer of opentapioca here about the Internal Server Error, but there is no hosted instance (or endpoint) available right now, which explains the error stated in this issue. I think the following issue on the main repo is a good discussion of extended use.
P.S. We should probably move any further discussion to the discussion page, since this is external to spaCy.