linkml-runtime
linkml-runtime copied to clipboard
Proposal for CurieNamespace with embedded catalog.
It is my understanding that we are supposed to use the curies package, but it isn't clear how one would add maps incrementally (see: tests/test_utils/test_curienamespace.py#113 as an example). Any solution that passes test_curienamespace.py should do.
i guess the test fails because the curienamespace catalog was already populated from test_enum
which ran before this test...
Wouild it be an option to make the catalog an instance variable instead of a class variable ? if we want to scope namespaces somehow that will be easier ? And it doesn't require passing a curieNamespaces object to each and every function call because we can still have a module-level defined singleton in generated code ?
Other way to fix it is some pre-test code that cleans up the catalog
comments from @kervel on slack:
fails tests because of global prefix catalog in the curienamespaces class. If we would change the order of the tests it would probably pass, but i think moving the catalog out of a class variable so that it becomes scoped sounds like a more robust solution. I cannot estimate the impact this would have on other uses of the prefix catalog
I think I agree but I also think I am lacking a bit of context here. It would be helpful to state here or in the initial comment what the motivation for this PR is, and a high level description of what it seeks to achieve.
i'm trying to state the motivation here:
the idea is that the python data classes, both when used to generate json and to parse json, should be able to interpret the type designator URI/CURIES in the most broad/flexible way possible. This is especially true if the range of a type designator field is "uriorcurie".
There are several uri's to designate a type: one has both the native and the assigned (via class_uri
uri's for a class), which should yield the same behaviour when used as a type designator. And then both of them can be shortened to curies using any of the prefixes in the prefix map. If a user defines multiple prefixes to the same uri namespaces (for instance because he includes yaml written by someboyd else), he would expect all known prefixes to just work.
so
The idea is for the python data classes to use the metadata already available to do this translation, rather than (like we do in pydantic) just having an exhaustive list of accepted uri's / curies as string values (which makes sense for pydantic because there we don't even depend on linkml-runtime)
so this PR adds functions that translate uries to curies and back using metadata that is in linkml runtime curie namespaces, an object which is already available in the python dataclasses right now.
hello, the reason this PR doesn't use the urieconverter from https://github.com/cthoyt/curies is laid out in the first comment here. So i created an issue there https://github.com/cthoyt/curies/issues/33 to see if that can be fixed (author replied in the mean time)
@hsolbrig https://github.com/cthoyt/curies/pull/34 . is the following okay ?
- factor out a separate class CurieNamespaceCatalog that manages the catalog as an instance variable ?
- make this class a thin wrapper around the curies package ? Do we even need a wrapper ?
then we could do
catalog = CurieNamespaceCatalog()
LINKML = CurieNamespace('linkml', 'https://w3id.org/linkml/', catalog=catalog)
SHEX = CurieNamespace('shex', 'http://www.w3.org/ns/shex#', catalog=catalog)
XSD = CurieNamespace('xsd', 'http://www.w3.org/2001/XMLSchema#', catalog=catalog)
DEFAULT_ = CurieNamespace('', 'http://example.org/', catalog=catalog)
or alternatively
catalog = CurieNamespaceCatalog()
LINKML = catalog.namespace('linkml', 'https://w3id.org/linkml/')
SHEX = catalog.namespace('shex', 'http://www.w3.org/ns/shex#')
XSD = catalog.namespace('xsd', 'http://www.w3.org/2001/XMLSchema#')
DEFAULT_ = catalog.namespace('', 'http://example.org/')
@kervel @hsolbrig the new curies
v0.4.3 with incremental construction has been released to PyPI. You can upgrade and use something like the following:
import curies
converter = curies.Converter(records=[])
converter.add_prefix("hgnc", "https://bioregistry.io/hgnc:")
hello, the test done by @hsolbrig still fails when trying to convert to the new curies v0.4.3:
# Test incremental add
CurieNamespace('tester', URIRef('http://fester.bester/tester#')).addTo(converter)
self.assertEqual('tester:hip_boots', namespaceCatalog.to_curie('http://fester.bester/tester#hip_boots'))
# Test multiple prefixes for same suffix
CurieNamespace('ns17', URIRef('http://fester.bester/tester#')).addTo(converter)
here, ns17 is actually a synonym. so curies refuses to take it as a new prefix. off course this could be fixed so that curies automatically converts this to a synonym for an existing Record, but that would complicate things.
this would also fail (a bit further down the same test)
# Test multiple uris for same prefix
# The following should be benign
CurieNamespace('tester', URIRef('http://fester.bester/tester#')).addTo(converter)
# Issue warnings for now on this
# TODO: test that we log the following
# 'Prefix: tester already references http://fester.bester/tester# -
# not updated to http://fester.notsogood/tester#'
CurieNamespace('tester', URIRef('http://fester.notsogood/tester#')).addTo(converter)
It's possible I could implement something like add_prefix_synonym
and add_uri_prefix_synonym
to curies.Converter
if that would be a good stop-gap before making some more fundamental improvements to the way you're doing bookkeeping
@cthoyt i'd rather first get some feedback from the people that know this code base better than i do. the originally proposed api in linkml-runtime doesn't even make a distinction between a synonym and a new prefix (as can be seen from the tests).
i created a working proposal here https://github.com/linkml/linkml-runtime/pull/251 ... i (for now) now bypass the incremental building. code is way too complicated: i need to detect synonyms and i can have transitive equality if you both have prefix synonyms and uri synonyms
not feeling very confident on this