Update HACCP_term regex to require FOODON, add multivalue example
Address syntax match to examples. Update regexes for MIxS.
Based on the description for HACCP, this requires the FOODON ontology. The description reads: "Hazard Analysis Critical Control Points (HACCP) food safety terms; This field accepts terms listed under HACCP guide food safety term (http://purl.obolibrary.org/obo/FOODON_03530221)"
While this doesn't perform any validation to check that what's been entered is really in FOODON, it does perform some string checking.
I didn't include an example. I am not at all familiar with the FoodAnimalAndAnimalFeed extension. Before committing time to getting familiar with it and making an example, I wanted to check that this was a good change.
Thanks @mslarae13. This is good progress. We can refine it a little:
First of all, how long are the numeric portions of FOODON URIs?
I used ChatGPT 4 to help me with that SPARQL query.
7 or 8 digits, after subtracting the 38 characters of the base portion of the URIs, "http://purl.obolibrary.org/obo/FOODON_".
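As a quick check of that arithmetic, here is a minimal Python sketch that strips the base IRI from a FOODON URI and measures what is left. It assumes you already have the term URIs in hand (the SPARQL query itself isn't reproduced here); the helper names `numeric_part`, `to_curie` and `id_lengths` are hypothetical.

```python
from typing import Iterable, List

# Base IRI shared by FOODON term URIs; it is 38 characters long.
FOODON_BASE = "http://purl.obolibrary.org/obo/FOODON_"


def numeric_part(uri: str) -> str:
    """Strip the FOODON base IRI and return the numeric local ID."""
    if not uri.startswith(FOODON_BASE):
        raise ValueError(f"not a FOODON URI: {uri}")
    return uri[len(FOODON_BASE):]


def to_curie(uri: str) -> str:
    """Rewrite a FOODON URI as a FOODON:<digits> CURIE."""
    return "FOODON:" + numeric_part(uri)


def id_lengths(uris: Iterable[str]) -> List[int]:
    """Distinct lengths of the numeric portions across a collection of URIs."""
    return sorted({len(numeric_part(u)) for u in uris})


if __name__ == "__main__":
    uris = [
        "http://purl.obolibrary.org/obo/FOODON_03530221",
        "http://purl.obolibrary.org/obo/FOODON_03530244",
    ]
    print(len(FOODON_BASE))   # 38
    print(id_lengths(uris))   # [8]
    print(to_curie(uris[0]))  # FOODON:03530221
```

Running `id_lengths` over the full list of term URIs returned by the query is one way to arrive at the 7-to-8-digit bound used below.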
Next, I asked ChatGPT 4:
I want to write a regular expression for a FOODON label followed by one white-space character and then a FOODON CURIE. The CURIE should be enclosed in square brackets. It starts with "FOODON:" and is followed by 7 or 8 digits. The label must start with a non-white-space character but can have any number of any characters after that, as long as they aren't carriage returns, line feeds, etc.
After a little testing with regexr, we came up with
^(\S[^\r\n]*) \[FOODON:\d{7,8}\]$
If we want to use pattern-only validation, I suggest we go with that.
That doesn't check that the label and ID portions match, etc., and it doesn't limit the choices to subclasses of "haccp guide food safety term".
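A quick way to sanity-check the pattern is Python's `re` module. This sketch writes the square brackets escaped (`\[` and `\]`) so they match literally; left unescaped, `[FOODON:\d{7,8}]` is a character class that matches a single character. The sample strings are assembled from labels in this thread, and the last one shows the limitation just mentioned: a label paired with the wrong ID still passes.

```python
import re

# Square brackets escaped so they match a literal "[" and "]".
HACCP_TERM = re.compile(r"^(\S[^\r\n]*) \[FOODON:\d{7,8}\]$")

# Unescaped brackets form a character class: this matches a label, a space,
# and then exactly ONE character such as "F", ":" or a digit.
UNESCAPED = re.compile(r"^(\S[^\r\n]*) [FOODON:\d{7,8}]$")

samples = {
    "sodium tripolyphosphate [FOODON:03530244]": True,
    "FOODON:03530244": False,              # bare CURIE, no label
    " hazard 9 [FOODON:03530237]": False,  # label starts with whitespace
    "hazard 9 [FOODON:035302]": False,     # only 6 digits
    "hazard 9 [FOODON:03530244]": True,    # wrong ID for this label, still passes
}

if __name__ == "__main__":
    for text, expected in samples.items():
        matched = HACCP_TERM.fullmatch(text) is not None
        print(f"{matched!s:5} {text!r}")
        assert matched == expected, text
    # The unescaped version rejects well-formed input but accepts "hazard 9".
    print(UNESCAPED.fullmatch("sodium tripolyphosphate [FOODON:03530244]"))  # None
    print(UNESCAPED.fullmatch("hazard 9") is not None)                       # True
```

The pattern only constrains the shape of the string; whether the ID exists, and whether it belongs to that label, has to be checked some other way.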
A better LinkML validation strategy for this might be a dynamic enumeration. Dynamic enumerations are expressed with logic, but can be expanded into an enumeration with explicit permissible values. A current limitation is that the permissible values won't include the label, and the ID won't be enclosed in square brackets. But I would like to use this case to motivate improvements to LinkML dynamic enumerations in support of MIxS.
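Conceptually, a `reachable_from` query is a graph traversal: start at the source nodes, walk `rdfs:subClassOf` edges downward, and list what you reach (only immediate children when `is_direct` is true, all descendants when it is false). The sketch below illustrates that idea on a hypothetical toy hierarchy with invented `EX:` IDs and labels; it is not FOODON data and not how OAK implements the expansion.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical toy hierarchy (child -> rdfs:subClassOf parents).
# IDs and labels are invented placeholders, NOT real FOODON terms or axioms.
SUBCLASS_OF: Dict[str, List[str]] = {
    "EX:0002": ["EX:0001"],
    "EX:0003": ["EX:0001"],
    "EX:0004": ["EX:0002"],
    "EX:0005": ["EX:0099"],  # a branch outside the source node
}
LABELS: Dict[str, str] = {
    "EX:0002": "child a",
    "EX:0003": "child b",
    "EX:0004": "grandchild",
    "EX:0005": "unrelated",
}


def reachable_from(source_nodes: List[str], is_direct: bool = False) -> Set[str]:
    """Return descendants of source_nodes along subClassOf edges.

    is_direct=True keeps only immediate children; is_direct=False follows
    the edges transitively. Source nodes are only included if they descend
    from another source node.
    """
    # Invert the child -> parents map so we can walk downward.
    children: Dict[str, List[str]] = {}
    for child, parents in SUBCLASS_OF.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)

    found: Set[str] = set()
    queue = deque(source_nodes)
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in found:
                found.add(child)
                if not is_direct:
                    queue.append(child)  # keep descending only when transitive
    return found


def expand(source_nodes: List[str], is_direct: bool = False) -> Dict[str, Dict[str, str]]:
    """Materialize the query as permissible values (text, meaning, title)."""
    return {
        curie: {"text": curie, "meaning": curie, "title": LABELS.get(curie, "")}
        for curie in sorted(reachable_from(source_nodes, is_direct))
    }


if __name__ == "__main__":
    for curie, pv in expand(["EX:0001"]).items():
        print(curie, pv["title"])
```

Against the real ontology, that expansion is the job `vskit expand` does; the toy graph only stands in for FOODON here.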
The vskit command from the Ontology Access Kit (OAK) can be used like this
vskit expand -s schema.yaml -o schema_expanded.yaml
to expand this
enums:
  HaccpTerm:
    reachable_from:
      source_ontology: bioregistry:foodon
      source_nodes:
        - FOODON:03530221 ## haccp guide food safety term
      is_direct: false
      relationship_types:
        - rdfs:subClassOf
into this
enums:
  HaccpTerm:
    reachable_from:
      source_ontology: bioregistry:foodon
      source_nodes:
        - FOODON:03530221 ## haccp guide food safety term
      is_direct: false
      relationship_types:
        - rdfs:subClassOf
    permissible_values:
      FOODON:03530231:
        text: FOODON:03530231
        meaning: FOODON:03530231
        title: hazard 3
      FOODON:03530244:
        text: FOODON:03530244
        meaning: FOODON:03530244
        title: sodium tripolyphosphate
      FOODON:03530237:
        text: FOODON:03530237
        meaning: FOODON:03530237
        title: hazard 9
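Until the expansion can emit labels itself, a post-processing step could build the "label [CURIE]" strings from the expanded permissible values and check submitted values against them. This is a minimal plain-Python sketch, not OAK or LinkML API; the function names are hypothetical and the three values are copied by hand from the expanded schema above. The pattern is the thread's regex with escaped brackets, plus a second capture group for the CURIE (which doesn't change what it accepts). Unlike the regex alone, the lookup also rejects a label paired with the wrong ID.

```python
import re
from typing import Dict, List

# Escaped brackets; group 1 is the label, group 2 the CURIE.
HACCP_TERM = re.compile(r"^(\S[^\r\n]*) \[(FOODON:\d{7,8})\]$")

# The three permissible values from the expanded schema, keyed by CURIE.
PERMISSIBLE_VALUES: Dict[str, Dict[str, str]] = {
    "FOODON:03530231": {"text": "FOODON:03530231", "meaning": "FOODON:03530231", "title": "hazard 3"},
    "FOODON:03530244": {"text": "FOODON:03530244", "meaning": "FOODON:03530244", "title": "sodium tripolyphosphate"},
    "FOODON:03530237": {"text": "FOODON:03530237", "meaning": "FOODON:03530237", "title": "hazard 9"},
}


def as_label_and_curie(pvs: Dict[str, Dict[str, str]]) -> List[str]:
    """Render each permissible value as 'title [CURIE]'."""
    return [f"{pv['title']} [{pv['meaning']}]" for pv in pvs.values()]


def is_valid(value: str, pvs: Dict[str, Dict[str, str]]) -> bool:
    """True only if the syntax matches AND the label/CURIE pair is listed."""
    m = HACCP_TERM.fullmatch(value)
    if m is None:
        return False
    label, curie = m.group(1), m.group(2)
    pv = pvs.get(curie)
    return pv is not None and pv["title"] == label


if __name__ == "__main__":
    for s in as_label_and_curie(PERMISSIBLE_VALUES):
        print(s, is_valid(s, PERMISSIBLE_VALUES))
    print(is_valid("hazard 9 [FOODON:03530244]", PERMISSIBLE_VALUES))  # False
```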
If using this mechanism sounds promising to you, and you want the OAK code to be modified to emit "sodium tripolyphosphate [FOODON:03530244]" instead of "FOODON:03530244", please up-vote this issue:
- https://github.com/INCATools/ontology-access-kit/issues/622
I agree that the change is suitable. As for the actual pattern being used, I bow to @turbomam's greater expertise on that! The additional idea of using some sort of automated expansion thingy sounds like a good idea to me, so I have thumbs-up'd that ticket in the Ontology Access Kit repo.