Add hasBioPolymerSequence type to BioChemEntity
We already had a discussion (starting here) about adding hasBioPolymerSequence as a sub-property of hasRepresentation and moving the inChI, inChIKey, and smiles properties to being sub-properties as well.
The outcomes of the discussion did not make it into the schema.org submission (I suspect confusion in the generation of the ttl files submitted, but it was a long time ago now).
The appropriate changes still need to be made and can be done as part of #542.
Tasks:
- [x] Add hasBioPolymerSequence property to BioChemEntity
- [ ] Define property hierarchy under hasRepresentation:
  - [ ] hasBioPolymerSequence
  - [ ] inChI
  - [ ] inChIKey
  - [ ] smiles
- [x] Fix colour coding of hasBioPolymerSequence in displays for draft profiles (should be Bioschemas, not pending)
  - [x] Gene 1.1-DRAFT
  - [x] Protein 0.12-DRAFT
Working on this within the DDE. Have generated the updated BioChemEntity type (v0.8-DRAFT) but it is not currently loading back into the DDE.
Updated in the DDE; still needs to be updated on the website.
It's not clear to me that properties can just be nested like that. I have not seen examples of it in schema.org. If it's desirable to nest properties, it may be necessary to create a new type, 'BioChemEntityRepresentation' or something along those lines, which includes the properties hasBioPolymerSequence, inChI, inChIKey, and smiles. Then, assign this new type as the expected type for hasRepresentation. If anyone knows of an example of such nesting in schema.org, please share.
@gtsueng properties can be nested; see for instance https://schema.org/masthead (sub-property of https://schema.org/publishingPrinciples), https://schema.org/accountId (sub-property of https://schema.org/identifier), or https://schema.org/tocEntry (sub-property of https://schema.org/hasPart). A Property is indeed a child of the Intangible type in schema.org. This is the definition of tocEntry:
```json
{
  "@id": "schema:tocEntry",
  "@type": "rdf:Property",
  "rdfs:comment": "Indicates a [[HyperTocEntry]] in a [[HyperToc]].",
  "rdfs:label": "tocEntry",
  "rdfs:subPropertyOf": {
    "@id": "schema:hasPart"
  },
  "schema:domainIncludes": {
    "@id": "schema:HyperToc"
  },
  "schema:isPartOf": {
    "@id": "https://pending.schema.org"
  },
  "schema:rangeIncludes": {
    "@id": "schema:HyperTocEntry"
  },
  "schema:source": {
    "@id": "https://github.com/schemaorg/schemaorg/issues/2766"
  }
}
```
We have not proposed any property with sub-properties, but this is indeed what @AlasdairGray suggested. Could you please have a second look at it? Thanks
@ljgarcia To be clear, we want to define these properties as nested just for the sake of organizing them, correct? The property hierarchies in schema.org do not appear to have any effect on how properties are used in a class; they appear to be defined in a hierarchy just to organize the properties. For example, https://schema.org/masthead is a subproperty of publishingPrinciples and is used in NewsMediaOrganization. This does not mean that NewsMediaOrganization has a property called publishingPrinciples for which a subproperty called masthead is used to store a CreativeWork object. Instead, NewsMediaOrganization just has a property called masthead for which a CreativeWork is expected, and that's it.
So from a Bioschemas perspective, this would mean that the property hasBioPolymerSequence would be used in BioChemEntity without hasRepresentation, inChI (inChIKey and smiles) would be used in MolecularEntity without hasRepresentation, and hasRepresentation would just be used in ChemicalSubstance. Am I understanding this correctly? There would be no attempt to use inChI under hasRepresentation for ChemicalSubstance or anything like that.
Yes, the idea is to organize the properties. The property hierarchy would have an impact on validation. As for the implications discussed in the last paragraph, I am not sure; better to double-check with @egonw as to what makes sense for MolecularEntity and ChemicalSubstance. It might be that we do not need/want the property hierarchy (not if it complicates things too much and has little or no effect).
I guess it's not completely clear to me how the property hierarchy would affect validation since each property is used in an unnested fashion in the corresponding class/type.
For example, AudioBook has a property readBy, which is a subproperty of actor. It's not like you can just use the property actor in AudioBook in lieu of readBy; that would give an error. Similarly, Movie uses the property actor, for which you cannot substitute readBy and still have it validate properly.
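The asymmetry described above matches the direction of the RDFS sub-property rule (rule rdfs7 in the RDF 1.1 Semantics specification): a triple using readBy entails the same triple with actor, but a triple using actor entails nothing about readBy. A minimal pure-Python sketch of that rule, assuming triples are plain tuples; the instance IRIs (ex:audiobook1 and so on) are hypothetical:

```python
# Minimal sketch of RDFS rule rdfs7:
#   if (p rdfs:subPropertyOf q) and (s p o), then (s q o).
# Property names follow the schema.org example above; instance IRIs are hypothetical.

SUB_PROPERTY_OF = {"readBy": "actor"}  # readBy is declared a sub-property of actor


def entail(triples, sub_property_of):
    """Return the input triples plus every triple entailed by rdfs7, applied repeatedly."""
    result = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(result):
            q = sub_property_of.get(p)
            if q is not None and (s, q, o) not in result:
                result.add((s, q, o))
                changed = True
    return result


audiobook = {("ex:audiobook1", "readBy", "ex:narrator1")}
movie = {("ex:movie1", "actor", "ex:actor1")}

# readBy implies actor ...
print(sorted(entail(audiobook, SUB_PROPERTY_OF)))
# ... but actor implies nothing about readBy.
print(sorted(entail(movie, SUB_PROPERTY_OF)))
```

Whether a given validator applies this entailment at all is tool-specific; the sketch only shows what RDFS itself licenses, not what any particular validator does.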
Update on the hasBioPolymerSequence property: the BioChemEntity v0.8-DRAFT type has hasBioPolymerSequence as a new property, not yet integrated in schema.org. This implies that all profiles and types inheriting from the BioChemEntity class will need to be updated.
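To make the inheritance argument concrete: a property added to BioChemEntity is picked up by every type whose parent chain reaches BioChemEntity. The sketch below walks a hypothetical, flattened parent map built from a subset of the types listed in this issue (with Taxon placed under Thing, per the correction further down). The real Bioschemas hierarchy may have intermediate levels, which the walk follows anyway. Whether each inheriting type actually needs an edit is a separate judgement.

```python
# Hypothetical, flattened parent map: each type points to its direct parent.
# Types are a subset of those listed in this issue; the real hierarchy may
# have intermediate levels, which inherits_from() follows transitively.
PARENTS = {
    "BioChemEntity": "Thing",
    "BioChemStructure": "BioChemEntity",
    "ChemicalSubstance": "BioChemEntity",
    "DNA": "BioChemEntity",
    "Gene": "BioChemEntity",
    "MolecularEntity": "BioChemEntity",
    "Protein": "BioChemEntity",
    "RNA": "BioChemEntity",
    "Taxon": "Thing",
}


def inherits_from(type_name, ancestor, parents):
    """True if ancestor appears anywhere in type_name's chain of parents."""
    current = parents.get(type_name)
    while current is not None:
        if current == ancestor:
            return True
        current = parents.get(current)
    return False


# Types that pick up hasBioPolymerSequence once it is added to BioChemEntity.
inheriting = sorted(t for t in PARENTS if inherits_from(t, "BioChemEntity", PARENTS))
print(inheriting)
```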
Profiles inheriting the BioChemEntity class (only latest release and draft):
- BioSample/0.1-DRAFT-2019_11_12
- ChemicalSubstance/0.3-DRAFT-2019_11_11
- ChemicalSubstance/0.4-RELEASE
- Gene/1.0-RELEASE
- Gene/1.2-DRAFT
- MolecularEntity/0.5-RELEASE
- MolecularEntity/0.6-DRAFT
- Protein/0.11-RELEASE
- Protein/0.12-DRAFT
- ProteinAnnotation/0.6-DRAFT
- ProteinStructure/0.6-DRAFT
- RNA/0.2-DRAFT
- Sample/0.1-DRAFT-2018_02_25
- SequenceAnnotation/0.7-DRAFT
- SequenceRange/0.2-DRAFT
- Taxon/0.2-DRAFT-2018_09_26
Types inheriting the BioChemEntity class:
- BioChemStructure/0.1-DRAFT-2019_06_20
- BioSample/0.1-DRAFT-2019_06_14
- BioSample/0.1-RELEASE-2019_06_19
- ChemicalSubstance/0.2-DRAFT-2019_06_14
- ChemicalSubstance/0.3-RELEASE-2019_09_02
- DNA/0.2-DRAFT-2019_06_20
- Enzyme/0.1-DRAFT-2019_06_20
- Gene/0.2-DRAFT-2019_06_14
- Gene/0.3-RELEASE-2019_09_02
- MolecularEntity/0.2-DRAFT-2019_06_14
- MolecularEntity/0.3-RELEASE-2019_09_02
- Protein/0.2-DRAFT-2019_06_14
- Protein/0.3-RELEASE-2019_09_02
- RNA/0.1-DRAFT-2019_06_21
- SequenceAnnotation/0.1-DRAFT-2019_06_21
- SequenceRange/0.1-DRAFT-2019_06_21
Taxon is a child of Thing, not BioChemEntity.
Gene and Protein had this property long before BioChemEntity had it, so it should already be there.
ProteinAnnotation has been deprecated; there is no reason to update it at this point, as it has been superseded by SequenceAnnotation.
Sample is pending deprecation, to be superseded by BioSample.
BioSample is awaiting additional changes from the working group (and potential BioHackEU2023 project)
For MolecularEntity and ChemicalSubstance (and anything else being developed by the Chemical Working Group), it is unclear whether this property will be used directly, or a child or parent property of it instead (see the discussion on nesting of properties above).
Everything else should be updated (edited by LJ).
Types:
- [ ] BioChemStructure (draft)
- [ ] BioSample (release, yes, it will be updated but that update will take time)
- [ ] ChemicalSubstance (release)
- [ ] DNA (draft)
- [ ] Enzyme (draft)
- [ ] MolecularEntity (release)
- [ ] RNA
- [ ] SequenceAnnotation
- [ ] SequenceRange
Profiles:
- [ ] ProteinStructure (draft, but, unless we know for sure that it makes sense in there and would be useful for the profile, we can leave it as it is, not all properties from BioChemEntity will make it to its corresponding profiles)
@bedroesb @AlasdairGray @ivanmicetic Could you please have a look at the pending task about property colors?
- [x] Fix colour coding of hasBioPolymerSequence in displays for draft profiles (should be Bioschemas, not pending)
  - [x] Gene profile (e.g., 1.2-DRAFT)
  - [x] Protein profile (e.g., 0.12-DRAFT)

Thanks.
@egonw @sneumann @gtsueng @nsjuty @oxgiraldo @albangaignard let's discuss the proposal of having nested properties for the three (four?) identification options for MolecularEntity:
- [ ] Define property hierarchy under hasRepresentation (with Text as the only range):
  - [ ] hasBioPolymerSequence
  - [ ] inChI
  - [ ] iupacName
  - [ ] inChIKey
  - [ ] smiles
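If the hierarchy were adopted, the property definitions could follow the same pattern as the schema:tocEntry definition quoted earlier in this issue. A hypothetical sketch, assuming the schema: prefix for all five properties and Text as the only range; domainIncludes is left out because it would differ per property (BioChemEntity for hasBioPolymerSequence, MolecularEntity for the chemical identifiers).

```python
import json

# Hypothetical sketch of the proposed hierarchy: each child property is declared
# rdfs:subPropertyOf schema:hasRepresentation with schema:Text as its only range,
# mirroring the style of schema.org's own JSON-LD property definitions.
CHILDREN = ["hasBioPolymerSequence", "inChI", "iupacName", "inChIKey", "smiles"]


def property_definition(name, parent="schema:hasRepresentation"):
    """Return one JSON-LD property definition (domainIncludes deliberately omitted)."""
    return {
        "@id": f"schema:{name}",
        "@type": "rdf:Property",
        "rdfs:label": name,
        "rdfs:subPropertyOf": {"@id": parent},
        "schema:rangeIncludes": {"@id": "schema:Text"},
    }


graph = [property_definition(name) for name in CHILDREN]
print(json.dumps(graph, indent=2))
```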
The idea here (as far as I understand) would be to have hasRepresentation as the "minimum" property for MolecularEntity in the profile, but to use either hasRepresentation or any of its children for a particular individual of type MolecularEntity.
This is an ontology/validation question. If we only specify hasRepresentation for the type MolecularEntity, can a MolecularEntityIndividual use inChIKey instead and expect that reasoners and validators (e.g., ShEx, SHACL) do not complain about it and give us the expected output? The expected output in this case would be the reasoner not complaining and the validation passing.
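The question boils down to whether sub-property entailment is applied before the Minimum check runs. A hypothetical sketch of such a check, assuming a plain JSON-LD-style dict for the individual and a placeholder InChIKey value; it is not how any specific ShEx or SHACL engine works, and whether a real engine infers hasRepresentation from inChIKey depends on the tool and its configuration.

```python
# Hypothetical "Minimum" check for hasRepresentation on a MolecularEntity individual.
# With entailment off, only literal property names count; with it on, any declared
# sub-property of the required property also satisfies the check.
SUB_PROPERTY_OF = {
    "inChI": "hasRepresentation",
    "inChIKey": "hasRepresentation",
    "iupacName": "hasRepresentation",
    "smiles": "hasRepresentation",
}


def super_properties(prop, sub_property_of):
    """Yield prop itself and every property it is (transitively) a sub-property of."""
    while prop is not None:
        yield prop
        prop = sub_property_of.get(prop)


def has_minimum(individual, required, sub_property_of, apply_entailment):
    """True if the individual uses the required property (or, with entailment, a sub-property)."""
    for prop in individual:
        if prop.startswith("@"):
            continue  # skip JSON-LD keywords such as @type
        if apply_entailment:
            candidates = super_properties(prop, sub_property_of)
        else:
            candidates = [prop]
        if required in candidates:
            return True
    return False


entity = {"@type": "MolecularEntity", "inChIKey": "placeholder-inchikey"}  # hypothetical value
print(has_minimum(entity, "hasRepresentation", SUB_PROPERTY_OF, apply_entailment=False))  # False
print(has_minimum(entity, "hasRepresentation", SUB_PROPERTY_OF, apply_entailment=True))   # True
```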
@gtsueng already said that:

> For example, AudioBook has a property readBy which is a subproperty of actor. It's not like you can just use the property actor in AudioBook in lieu of readBy; that would give an error. Similarly, Movie uses the property actor for which you cannot substitute readBy and still have it validate properly.
If @gtsueng is right, then I do not see any advantage in having nested properties.
Comments?
Regarding:
"Fix colour coding hasBioPolymerSequence in displays for draft profiles (should be Bioschemas not pending)"
Gene and Protein types had the property hasBioPolymerSequence long before this property was included in BioChemEntity. The Gene and Protein types that are currently pending on schema.org have this property, so shouldn't it be coloured as pending there?
Here's what happens when you use actor in the AudioBook example:

and what happens when you use readBy in a Movie type:
As this is a Movie type, the property actor is automatically parsed as a Person type in the validator. The same cannot be said of the property readBy, which is a subproperty of the property actor.
Thanks @gtsueng for the analysis on the property hierarchy. I do not see a clear advantage in having the hierarchy. Unless @egonw @sneumann see an advantage there, I would suggest not implementing that change.
Suggestion: drop the property hierarchy suggestion. @ivanmicetic if we drop it, is there anything else pending in this issue?
Comments from the 2023.06.26 community call:
- If the hierarchy is dropped, we could use identifier with `PropertyValue` as an alternative for the minimum property, plus the named property.
- SMILES is not really an identifier (the actual string depends on the SMILES algorithm: Babel vs. CDK vs. RDKit vs. ...).
- InChIKey can (technically) point to more than one compound (think: hash collision).
I can see the benefit if we could tighten validation for, e.g., MolecularEntity and specify Minimum for hasRepresentation, which in turn could be any of inChI/iupacName/smiles; but according to @gtsueng's comment above, that doesn't work as intended through subProperties.
The new proposal is now to keep MolecularEntity.identifier at Minimum, but to require not just a text value "MIIFHRBUBUHJMC-UHFFFAOYSA-N.1" but a PropertyValue with value=MIIFHRBUBUHJMC-UHFFFAOYSA-N.1 and propertyID=http://semanticscience.org/resource/CHEMINF_000059?
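A sketch of what the proposed markup and a matching Minimum check might look like, reusing the value and propertyID from the proposal. The helper names are hypothetical, and the check is deliberately strict: a bare text identifier fails, and so does a list of identifiers, which a real profile check would probably need to accept.

```python
import json

# propertyID taken verbatim from the proposal above
PROPOSED_PROPERTY_ID = "http://semanticscience.org/resource/CHEMINF_000059"


def inchikey_identifier(value):
    """Wrap an InChIKey string in a schema.org PropertyValue, as proposed above."""
    return {"@type": "PropertyValue", "propertyID": PROPOSED_PROPERTY_ID, "value": value}


def identifier_ok(entity):
    """Hypothetical Minimum check: identifier must be a PropertyValue with the proposed propertyID.

    A bare text identifier is rejected, and so is a list of identifiers in this sketch.
    """
    ident = entity.get("identifier")
    if not isinstance(ident, dict):
        return False
    return (
        ident.get("@type") == "PropertyValue"
        and ident.get("propertyID") == PROPOSED_PROPERTY_ID
        and bool(ident.get("value"))
    )


entity = {
    "@context": "https://schema.org",
    "@type": "MolecularEntity",
    "identifier": inchikey_identifier("MIIFHRBUBUHJMC-UHFFFAOYSA-N.1"),
}
print(json.dumps(entity, indent=2))
print(identifier_ok(entity))  # True
print(identifier_ok({"@type": "MolecularEntity", "identifier": "MIIFHRBUBUHJMC-UHFFFAOYSA-N.1"}))  # False
```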
I am sorry. I have been under a DDOS attack with project deliverables. Let me check.