Try downloading compressed versions of remote ontologies

Open gouttegd opened this issue 1 month ago • 1 comments

Whenever the user declares they want to import a foreign ontology (say, ro):

import_group:
  products:
    - id: ro

the generated mirroring code should first try to download http://purl.obolibrary.org/obo/ro.owl.gz, and then fall back to http://purl.obolibrary.org/obo/ro.owl if we get a 404.
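
To make the proposed behaviour concrete, here is a rough sketch in Python. It is an illustration only, not the ODK's actual generated code: the names `NotFound`, `http_fetch`, and `mirror_with_fallback` are hypothetical, and it assumes the `.gz` file is a plain gzip of the `.owl` file.

```python
import gzip
import urllib.error
import urllib.request
from typing import Callable


class NotFound(Exception):
    """Raised by a fetcher when the server answers 404 (hypothetical helper)."""


def http_fetch(url: str) -> bytes:
    """Fetch a URL, translating an HTTP 404 into NotFound."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            raise NotFound(url) from err
        raise  # any other HTTP error is not a reason to fall back


def mirror_with_fallback(
    base_url: str, fetch: Callable[[str], bytes] = http_fetch
) -> bytes:
    """Try base_url + '.gz' first; on a 404, fall back to base_url itself."""
    try:
        return gzip.decompress(fetch(base_url + ".gz"))
    except NotFound:
        return fetch(base_url)
```

The fetcher is passed in as a parameter so the fallback logic can be exercised without network access. Only a 404 triggers the fallback; other errors are propagated as-is.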

This would in effect make the use_gzipped option redundant (it would be as if this option was always on by default – except for the fact that currently this option does not include the fallback behaviour).

Open to discussion about what we should do when the user explicitly provides a mirror_from URL:

import_group:
  products:
    - id: ro
      mirror_from: https://example.org/my/custom/mirroring/site/ro.owl

For now I am inclined to say that we should not try to mess with any explicitly provided URL (so, we do not try to append .gz).
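
Put as a sketch, the URL selection rule described so far might look like the following. The function name `candidate_urls` and the constant `OBO_PURL_BASE` are made up for illustration; the rule itself (leave an explicit `mirror_from` untouched, otherwise try `.gz` first) is the one proposed above and is still open to discussion.

```python
from typing import List, Optional

# Default OBO PURL prefix, as used in the example URLs in this issue.
OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"


def candidate_urls(product_id: str, mirror_from: Optional[str] = None) -> List[str]:
    """Return the URLs to try, in order, for one import product."""
    if mirror_from is not None:
        # An explicitly provided URL is used as-is: no '.gz' is appended.
        return [mirror_from]
    default = f"{OBO_PURL_BASE}{product_id}.owl"
    # Compressed version first, uncompressed version as the fallback.
    return [default + ".gz", default]
```

With this rule, `candidate_urls("ro")` yields the `.gz` PURL followed by the plain PURL, while passing the custom `mirror_from` from the example above yields that single URL only.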

gouttegd avatar Dec 14 '25 21:12 gouttegd

We should check with @jamesaoverton, but I am not entirely sure if http://purl.obolibrary.org/obo/ro.owl.gz is allowed by the OBO PURL system or if it has to be http://purl.obolibrary.org/obo/ro/ro.owl.gz. It would be good if appending .gz was the only thing that is needed.

I like what you are proposing!

matentzn avatar Dec 15 '25 21:12 matentzn

RO (and any other project) would have to opt-in by adding an ro.owl.gz entry to the products list: https://github.com/OBOFoundry/purl.obolibrary.org/blob/master/config/ro.yml#L8

The PURL code might warn about not liking the "owl.gz" extension, but we can tweak that.

And we can have a wider discussion about policy for providing compressed artifacts. A decade ago we figured that clients and servers would transparently compress data during transfer, and we weren't worried about storage, but our ontology files continue to get larger.

jamesaoverton avatar Dec 16 '25 14:12 jamesaoverton