
Pipeline for opentapioca not working

Open piaschwarz opened this issue 1 year ago • 2 comments

Discussed in https://github.com/explosion/spaCy/discussions/13678

Originally posted by piaschwarz on October 25, 2024:

I am using opentapioca for entity linking. The code worked before, but calling the opentapioca endpoint now throws an error (with or without the endpoint URL in the config).

Is it a problem on my side or is the endpoint not working?

Versions: Python 3.10.6, spacy 3.3.1, spacy-transformers 1.1.7, spacyopentapioca 0.1.7

My code:

import spacy

nlp_spacy_trf = spacy.load('de_dep_news_trf')
dummy_text = "Christian Drosten arbeitet an der Charité in Berlin."

nlp_spacy_trf.add_pipe('opentapioca', config={"url": "https://opentapioca.wordlift.io/api/annotate?lc=de"})
doc = nlp_spacy_trf(dummy_text)
for span in doc.ents:
    print((span.text, span.kb_id_, span.label_, span._.description, span._.score))

It keeps throwing this error:

Traceback (most recent call last):
  File "/home/xxxxxxxx/.local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/home/xxxxxxxx/.local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 386, in _make_request
    self._validate_conn(conn)
  File "/home/xxxxxxxx/.local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1042, in _validate_conn
    conn.connect()
  File "/home/xxxxxxxx/.local/lib/python3.10/site-packages/urllib3/connection.py", line 414, in connect
    self.sock = ssl_wrap_socket(
  File "/home/xxxxxxxx/.local/lib/python3.10/site-packages/urllib3/util/ssl_.py", line 449, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(
  File "/home/xxxxxxxx/.local/lib/python3.10/site-packages/urllib3/util/ssl_.py", line 493, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "/usr/lib/python3.10/ssl.py", line 513, in wrap_socket
    return self.sslsocket_class._create(
  File "/usr/lib/python3.10/ssl.py", line 1071, in _create
    self.do_handshake()
  File "/usr/lib/python3.10/ssl.py", line 1342, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate (_ssl.c:997)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/xxxxxxxx/.local/lib/python3.10/site-packages/requests/adapters.py", line 489, in send
    resp = conn.urlopen(
  File "/home/xxxxxxxx/.local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 787, in urlopen
    retries = retries.increment(
  File "/home/xxxxxxxx/.local/lib/python3.10/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='opentapioca.wordlift.io', port=443): Max retries exceeded with url: /api/annotate?lc=de (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate (_ssl.c:997)')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/mnt/c/Users/XXXXX/PycharmProjects/EntityLinking/experiments.py", line 10, in <module>
    doc = nlp_spacy_trf(dummy_text)
  File "/home/xxxxxxxx/.local/lib/python3.10/site-packages/spacy/language.py", line 1025, in __call__
    error_handler(name, proc, [doc], e)
  File "/home/xxxxxxxx/.local/lib/python3.10/site-packages/spacy/util.py", line 1630, in raise_error
    raise e
  File "/home/xxxxxxxx/.local/lib/python3.10/site-packages/spacy/language.py", line 1020, in __call__
    doc = proc(doc, **component_cfg.get(name, {}))  # type: ignore[call-arg]
  File "/home/xxxxxxxx/.local/lib/python3.10/site-packages/spacyopentapioca/entity_linker.py", line 101, in __call__
    r = self.make_request(doc)
  File "/home/xxxxxxxx/.local/lib/python3.10/site-packages/spacyopentapioca/entity_linker.py", line 93, in make_request
    return requests.post(url=self.url,
  File "/home/xxxxxxxx/.local/lib/python3.10/site-packages/requests/api.py", line 115, in post
    return request("post", url, data=data, json=json, **kwargs)
  File "/home/xxxxxxxx/.local/lib/python3.10/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/xxxxxxxx/.local/lib/python3.10/site-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/xxxxxxxx/.local/lib/python3.10/site-packages/requests/sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
  File "/home/xxxxxxxx/.local/lib/python3.10/site-packages/requests/adapters.py", line 563, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='opentapioca.wordlift.io', port=443): Max retries exceeded with url: /api/annotate?lc=de (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate (_ssl.c:997)')))

piaschwarz, Oct 28 '24 07:10

Has anyone found a fix for this issue?

sieu-tran, Dec 01 '25 21:12

I'm interested in this as well. The fix is likely to run the web app locally (details here) and use the url https://opentapioca.wordlift.io/. However, I have yet to try this and am unsure about the dependencies.
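
If a self-hosted instance ends up listening on a different address than the hosted one, the component can be pointed at it through the same `url` config key used in the original report. A minimal sketch, assuming the local app exposes the same `/api/annotate` path; `retarget` is a hypothetical helper and `localhost:8000` is a placeholder address, not a documented default:

```python
from urllib.parse import urlsplit, urlunsplit

# Endpoint taken from the original report.
HOSTED_URL = "https://opentapioca.wordlift.io/api/annotate?lc=de"


def retarget(url: str, scheme: str, netloc: str) -> str:
    """Return `url` with scheme and host replaced, keeping path and query."""
    parts = urlsplit(url)
    return urlunsplit((scheme, netloc, parts.path, parts.query, parts.fragment))


# Placeholder address (assumption): adjust to wherever your instance listens.
LOCAL_NETLOC = "localhost:8000"

config = {"url": retarget(HOSTED_URL, "http", LOCAL_NETLOC)}
print(config["url"])  # http://localhost:8000/api/annotate?lc=de

# The dict would then be passed as in the original report:
# nlp_spacy_trf.add_pipe("opentapioca", config=config)
```

Keeping the path and the `lc=de` query unchanged means only the host differs from the setup that worked before.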

I asked the developer of opentapioca here about the Internal Server Error, but there is no hosted instance (or endpoint) available right now, which explains the error reported in this issue. I think the following issue on the main repo is a good discussion of extended use.
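
To tell apart the two failure modes mentioned in this thread (the certificate error in the traceback and an Internal Server Error from the service), a small standard-library probe can help. This is only a sketch: `classify_failure` and `probe` are hypothetical helper names, and the `query` form field is an assumption about what the endpoint expects.

```python
import ssl
import urllib.error
import urllib.request


def classify_failure(exc: BaseException) -> str:
    """Map an exception raised while contacting the endpoint to a short label."""
    # urllib wraps low-level errors in URLError; the underlying cause is in .reason.
    reason = getattr(exc, "reason", exc)
    if isinstance(reason, ssl.SSLCertVerificationError):
        return "certificate"  # e.g. a self-signed certificate, as in the traceback
    if isinstance(exc, urllib.error.HTTPError):
        return "http"  # the server answered, but with an error status such as 500
    return "network"  # DNS failure, refused connection, timeout, ...


def probe(url: str, timeout: float = 10.0) -> str:
    """Send one small POST request and report how it went."""
    # The form field name is an assumption, not taken from the API documentation.
    request = urllib.request.Request(url, data=b"query=test", method="POST")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return "ok"
    except OSError as exc:  # URLError, HTTPError and ssl.SSLError all derive from OSError
        return classify_failure(exc)
```

Calling `probe("https://opentapioca.wordlift.io/api/annotate?lc=de")` returns one of the four labels; `certificate` would correspond to the traceback in the original post.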

P.S. We should probably move any further discussion to the discussion page, since this is external to spaCy.

weezymatt, Dec 02 '25 20:12