
ontobio is not HTTPS-safe

Open kltm opened this issue 1 year ago • 1 comments

While working on Cloudflare for the GO public access points (https://github.com/geneontology/operations/issues/70), we recently discovered that we were getting errors in ontobio.

[2024-08-20T06:50:13.986Z] Traceback (most recent call last):
[2024-08-20T06:50:13.986Z]   File "/usr/local/bin/validate.py", line 999, in <module>
[2024-08-20T06:50:13.986Z]     cli(obj={})
[2024-08-20T06:50:13.986Z]   File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
[2024-08-20T06:50:13.986Z]     return self.main(*args, **kwargs)
[2024-08-20T06:50:13.986Z]   File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
[2024-08-20T06:50:13.986Z]     rv = self.invoke(ctx)
[2024-08-20T06:50:13.986Z]   File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1688, in invoke
[2024-08-20T06:50:13.986Z]     return _process_result(sub_ctx.command.invoke(sub_ctx))
[2024-08-20T06:50:13.986Z]   File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
[2024-08-20T06:50:13.986Z]     return ctx.invoke(self.callback, **ctx.params)
[2024-08-20T06:50:13.986Z]   File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
[2024-08-20T06:50:13.986Z]     return __callback(*args, **kwargs)
[2024-08-20T06:50:13.986Z]   File "/usr/local/lib/python3.10/dist-packages/click/decorators.py", line 33, in new_func
[2024-08-20T06:50:13.986Z]     return f(get_current_context(), *args, **kwargs)
[2024-08-20T06:50:13.986Z]   File "/usr/local/bin/validate.py", line 722, in produce
[2024-08-20T06:50:13.986Z]     matching_gpi_path = download_a_dataset_source(group, ds, absolute_target, ds["source"],
[2024-08-20T06:50:13.986Z]   File "/usr/local/bin/validate.py", line 106, in download_a_dataset_source
[2024-08-20T06:50:13.986Z]     response = requests.get(reconstructed_url, stream=True)
[2024-08-20T06:50:13.986Z]   File "/usr/local/lib/python3.10/dist-packages/requests/api.py", line 73, in get
[2024-08-20T06:50:13.986Z]     return request("get", url, params=params, **kwargs)
[2024-08-20T06:50:13.986Z]   File "/usr/local/lib/python3.10/dist-packages/requests/api.py", line 59, in request
[2024-08-20T06:50:13.986Z]     return session.request(method=method, url=url, **kwargs)
[2024-08-20T06:50:13.986Z]   File "/usr/local/lib/python3.10/dist-packages/requests/sessions.py", line 589, in request
[2024-08-20T06:50:13.986Z]     resp = self.send(prep, **send_kwargs)
[2024-08-20T06:50:13.986Z]   File "/usr/local/lib/python3.10/dist-packages/requests/sessions.py", line 703, in send
[2024-08-20T06:50:13.986Z]     r = adapter.send(request, **kwargs)
[2024-08-20T06:50:13.986Z]   File "/usr/local/lib/python3.10/dist-packages/requests/adapters.py", line 501, in send
[2024-08-20T06:50:13.986Z]     raise ConnectionError(err, request=request)
[2024-08-20T06:50:13.986Z] requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
[2024-08-20T06:50:15.848Z] make: *** [Makefile:86: target/groups/dictybase/dictybase.group] Error 1

Poking around, it may come from a hard-coded section like

    def create_from_remote_file(self, group, snapshot=True, **args):
        """
        Creates from remote GAF
        """
        import requests
        url = "http://snapshot.geneontology.org/annotations/{}.gaf.gz".format(group)
        r = requests.get(url, stream=True, headers={'User-Agent': get_user_agent(modules=[requests], caller_name=__name__)})
        p = GafParser()
        results = p.skim(r.raw)
        return self.create_from_tuples(results, **args)

in ./ontobio/assoc_factory.py (or somewhere else), called from validate.py.
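
Since the exact location is uncertain, one way to find other candidates is to scan the Python sources for hard-coded plain-HTTP URLs. The following is a minimal sketch, not part of ontobio: `find_http_urls` and `scan_tree` are hypothetical helper names, and the regular expression is a rough heuristic that will also pick up URLs in comments and docstrings.

```python
import re
from pathlib import Path

# Rough heuristic: "http://" followed by anything that is not whitespace,
# a quote, an angle bracket, or a closing parenthesis.
HTTP_URL = re.compile(r"""http://[^\s"'<>)]+""")


def find_http_urls(text):
    """Return (line_number, url) pairs for each plain-HTTP URL in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in HTTP_URL.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


def scan_tree(root):
    """Yield (path, line_number, url) for every .py file under root."""
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, url in find_http_urls(text):
            yield path, lineno, url
```

Running `scan_tree("ontobio")` from a checkout and printing the results would give a list of places to review by hand, including anything reachable from validate.py.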

We can mitigate this by allowing both HTTP and HTTPS connections to snapshot.geneontology.org, as we did last week.

A proper fix would be to audit every use of the requests library and make sure that our external calls either use HTTPS directly or can tolerate a 301 upgrade from HTTP to HTTPS.
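
As one possible shape for that fix, a URL could be upgraded to HTTPS before it is handed to requests, so the call no longer depends on the HTTP endpoint being reachable. This is a sketch under assumptions, not existing ontobio code: `upgrade_to_https` is a hypothetical helper, and it assumes the target host serves the same paths over HTTPS on the default port.

```python
from urllib.parse import urlsplit


def upgrade_to_https(url):
    """Return url with an http scheme rewritten to https.

    URLs with any other scheme (including https) are returned unchanged.
    An explicit ":80" port is dropped, since it would be wrong for HTTPS.
    """
    parts = urlsplit(url)
    if parts.scheme != "http":
        return url
    netloc = parts.netloc
    if netloc.endswith(":80"):
        netloc = netloc[: -len(":80")]
    return parts._replace(scheme="https", netloc=netloc).geturl()


# Example with the group name from the failing make target above.
group = "dictybase"
url = "http://snapshot.geneontology.org/annotations/{}.gaf.gz".format(group)
print(upgrade_to_https(url))
```

The result could then be passed to `requests.get(..., stream=True)` in place of the hard-coded HTTP URL. Simply changing the URL literals to `https://` would be the more direct change where the URL is fixed in code; a helper like this is mainly useful where the URL arrives from dataset metadata.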

kltm avatar Aug 20 '24 23:08 kltm

This does not currently have a priority.

kltm avatar Aug 20 '24 23:08 kltm