linkml-runtime icon indicating copy to clipboard operation
linkml-runtime copied to clipboard

Proposal for CurieNamespace with embedded catalog.

Open hsolbrig opened this issue 1 year ago • 10 comments

It is my understanding that we are supposed to use the curies package, but it isn't clear how one would add maps incrementally (see: tests/test_utils/test_curienamespace.py#113 as an example). Any solution that passes test_curienamespace.py should do.

hsolbrig avatar Feb 02 '23 20:02 hsolbrig

i guess the test fails because the curienamespace catalog was already populated from test_enum which ran before this test...

kervel avatar Feb 04 '23 09:02 kervel

Wouild it be an option to make the catalog an instance variable instead of a class variable ? if we want to scope namespaces somehow that will be easier ? And it doesn't require passing a curieNamespaces object to each and every function call because we can still have a module-level defined singleton in generated code ?

Other way to fix it is some pre-test code that cleans up the catalog

kervel avatar Feb 08 '23 17:02 kervel

comments from @kervel on slack:

fails tests because of global prefix catalog in the curienamespaces class. If we would change the order of the tests it would probably pass, but i think moving the catalog out of a class variable so that it becomes scoped sounds like a more robust solution. I cannot estimate the impact this would have on other uses of the prefix catalog

I think I agree but I also think I am lacking a bit of context here. It would be helpful to state here or in the initial comment what the motivation for this PR is, and a high level description of what it seeks to achieve.

cmungall avatar Feb 23 '23 01:02 cmungall

i'm trying to state the motivation here:

the idea is that the python data classes, both when used to generate json and to parse json, should be able to interpret the type designator URI/CURIES in the most broad/flexible way possible. This is especially true if the range of a type designator field is "uriorcurie".

There are several uri's to designate a type: one has both the native and the assigned (via class_uri uri's for a class), which should yield the same behaviour when used as a type designator. And then both of them can be shortened to curies using any of the prefixes in the prefix map. If a user defines multiple prefixes to the same uri namespaces (for instance because he includes yaml written by someboyd else), he would expect all known prefixes to just work.

so

The idea is for the python data classes to use the metadata already available to do this translation, rather than (like we do in pydantic) just having an exhaustive list of accepted uri's / curies as string values (which makes sense for pydantic because there we don't even depend on linkml-runtime)

so this PR adds functions that translate uries to curies and back using metadata that is in linkml runtime curie namespaces, an object which is already available in the python dataclasses right now.

kervel avatar Feb 23 '23 09:02 kervel

hello, the reason this PR doesn't use the urieconverter from https://github.com/cthoyt/curies is laid out in the first comment here. So i created an issue there https://github.com/cthoyt/curies/issues/33 to see if that can be fixed (author replied in the mean time)

@hsolbrig https://github.com/cthoyt/curies/pull/34 . is the following okay ?

  • factor out a separate class CurieNamespaceCatalog that manages the catalog as an instance variable ?
  • make this class a thin wrapper around the curies package ? Do we even need a wrapper ?

then we could do

catalog = CurieNamespaceCatalog()
LINKML = CurieNamespace('linkml', 'https://w3id.org/linkml/', catalog=catalog)
SHEX = CurieNamespace('shex', 'http://www.w3.org/ns/shex#', catalog=catalog)
XSD = CurieNamespace('xsd', 'http://www.w3.org/2001/XMLSchema#', catalog=catalog)
DEFAULT_ = CurieNamespace('', 'http://example.org/', catalog=catalog)

or alternatively

catalog = CurieNamespaceCatalog()
LINKML = catalog.namespace('linkml', 'https://w3id.org/linkml/')
SHEX = catalog.namespace('shex', 'http://www.w3.org/ns/shex#')
XSD = catalog.namespace('xsd', 'http://www.w3.org/2001/XMLSchema#')
DEFAULT_ = catalog.namespace('', 'http://example.org/')

kervel avatar Feb 27 '23 11:02 kervel

@kervel @hsolbrig the new curies v0.4.3 with incremental construction has been released to PyPI. You can upgrade and use something like the following:

import curies

converter = curies.Converter(records=[])
converter.add_prefix("hgnc", "https://bioregistry.io/hgnc:")

cthoyt avatar Feb 27 '23 13:02 cthoyt

hello, the test done by @hsolbrig still fails when trying to convert to the new curies v0.4.3:

        # Test incremental add
        CurieNamespace('tester', URIRef('http://fester.bester/tester#')).addTo(converter)
        self.assertEqual('tester:hip_boots', namespaceCatalog.to_curie('http://fester.bester/tester#hip_boots'))

        # Test multiple prefixes for same suffix
        CurieNamespace('ns17', URIRef('http://fester.bester/tester#')).addTo(converter)

here, ns17 is actually a synonym. so curies refuses to take it as a new prefix. off course this could be fixed so that curies automatically converts this to a synonym for an existing Record, but that would complicate things.

this would also fail (a bit further down the same test)

        # Test multiple uris for same prefix
        # The following should be benign
        CurieNamespace('tester', URIRef('http://fester.bester/tester#')).addTo(converter)

        # Issue warnings for now on this
        # TODO: test that we log the following
        #       'Prefix: tester already references http://fester.bester/tester# -
        #       not updated to http://fester.notsogood/tester#'
        CurieNamespace('tester', URIRef('http://fester.notsogood/tester#')).addTo(converter)

kervel avatar Mar 01 '23 15:03 kervel

It's possible I could implement something like add_prefix_synonym and add_uri_prefix_synonym to curies.Converter if that would be a good stop-gap before making some more fundamental improvements to the way you're doing bookkeeping

cthoyt avatar Mar 01 '23 15:03 cthoyt

@cthoyt i'd rather first get some feedback from the people that know this code base better than i do. the originally proposed api in linkml-runtime doesn't even make a distinction between a synonym and a new prefix (as can be seen from the tests).

kervel avatar Mar 01 '23 15:03 kervel

i created a working proposal here https://github.com/linkml/linkml-runtime/pull/251 ... i (for now) now bypass the incremental building. code is way too complicated: i need to detect synonyms and i can have transitive equality if you both have prefix synonyms and uri synonyms

not feeling very confident on this

kervel avatar Mar 01 '23 16:03 kervel