Add hasBioPolymerSequence type to BioChemEntity

Open • AlasdairGray opened this issue 2 years ago • 19 comments

We already had a discussion (starting here) about adding hasBioPolymerSequence as a sub-property of hasRepresentation and making the inChI, inChIKey, and smiles properties sub-properties as well.

The outcomes of that discussion did not make it into the schema.org submission (I suspect confusion when the submitted TTL files were generated, but it was a long time ago now).

The appropriate changes still need to be made; they can be done as part of #542.

Tasks:

  • [x] Add hasBioPolymerSequence property to BioChemEntity
  • [ ] Define property hierarchy under hasRepresentation:
    • [ ] hasBioPolymerSequence
    • [ ] inChI
    • [ ] inChIKey
    • [ ] smiles
  • [x] Fix colour coding of hasBioPolymerSequence in displays for draft profiles (should be Bioschemas, not pending)
    • [x] Gene 1.1-DRAFT
    • [x] Protein 0.12-DRAFT
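As a rough sketch of the first unchecked task, one member of the hierarchy could be declared in the same JSON-LD style that schema.org uses for its own term definitions. This assumes that hasBioPolymerSequence keeps the schema: namespace it has in the pending Gene and Protein types, that its range is Text, and that hasRepresentation is the intended parent. The rdfs:comment wording is illustrative, not the submitted definition.

```json
{
  "@context": {
    "schema": "https://schema.org/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#"
  },
  "@id": "schema:hasBioPolymerSequence",
  "@type": "rdf:Property",
  "rdfs:label": "hasBioPolymerSequence",
  "rdfs:comment": "Illustrative only: the sequence of a biopolymer, declared as a sub-property of hasRepresentation.",
  "rdfs:subPropertyOf": {
    "@id": "schema:hasRepresentation"
  },
  "schema:domainIncludes": {
    "@id": "schema:BioChemEntity"
  },
  "schema:rangeIncludes": {
    "@id": "schema:Text"
  }
}
```

The same pattern would repeat for inChI, inChIKey and smiles, changing @id, rdfs:label, rdfs:comment and, where needed, schema:domainIncludes.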

AlasdairGray, Jun 14 '22

Working on this within the DDE. I have generated the updated BioChemEntity type (v0.8-DRAFT), but it does not currently load back into the DDE.

AlasdairGray, Jun 14 '22

Updated in the DDE.

gtsueng, Nov 09 '22

Still needs to be updated on the website.

gtsueng, Nov 09 '22

It's not clear to me that properties can just be nested like that. I have not seen examples of this in schema.org. If it's desirable to nest properties, it may be necessary to create a new type, 'BioChemEntityRepresentation' or something along those lines, which includes the properties hasBioPolymerSequence, inChI, inChIKey, and smiles, and then to assign this new type as the expected type for hasRepresentation. If anyone knows of an example of such nesting in schema.org, please share.
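To make that alternative concrete: with a new type as the expected type of hasRepresentation, the representation properties would be nested in the markup itself. In this sketch the type name and the ex: namespace are hypothetical placeholders, not existing Bioschemas or schema.org terms, and ethanol's InChIKey and SMILES serve as example values.

```json
{
  "@context": {
    "@vocab": "https://schema.org/",
    "ex": "https://example.org/"
  },
  "@type": "MolecularEntity",
  "name": "ethanol",
  "hasRepresentation": {
    "@type": "ex:BioChemEntityRepresentation",
    "inChIKey": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N",
    "smiles": "CCO"
  }
}
```

One trade-off of this design: the inChIKey and smiles statements would be attached to the nested node rather than to the MolecularEntity, so consumers that look for them directly on the entity would no longer find them.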

gtsueng, Dec 22 '22

@gtsueng properties can be nested; see for instance https://schema.org/masthead (sub-property of https://schema.org/publishingPrinciples), https://schema.org/accountId (sub-property of https://schema.org/identifier) or https://schema.org/tocEntry (sub-property of https://schema.org/hasPart). A Property is indeed a child of the Intangible type in schema.org:

    {
      "@id": "schema:tocEntry",
      "@type": "rdf:Property",
      "rdfs:comment": "Indicates a [[HyperTocEntry]] in a [[HyperToc]].",
      "rdfs:label": "tocEntry",
      "rdfs:subPropertyOf": {
        "@id": "schema:hasPart"
      },
      "schema:domainIncludes": {
        "@id": "schema:HyperToc"
      },
      "schema:isPartOf": {
        "@id": "https://pending.schema.org"
      },
      "schema:rangeIncludes": {
        "@id": "schema:HyperTocEntry"
      },
      "schema:source": {
        "@id": "https://github.com/schemaorg/schemaorg/issues/2766"
      }
    }

We have not proposed any property with sub-properties, but this is indeed what @AlasdairGray suggested. Could you please have a second look at it? Thanks

ljgarcia, Mar 27 '23

@ljgarcia To be clear, we just want to define these properties as nested for the sake of organizing them, correct? The property hierarchies in Schema.org do not appear to have any effect on how the properties are used in a Class; they seem to be defined in a hierarchy just to organize the properties. For example, https://schema.org/masthead is a subproperty of publishingPrinciples and is used in NewsMediaOrganization. This does not mean that NewsMediaOrganization has a property called publishingPrinciples under which a subproperty called masthead stores a CreativeWork object. Instead, NewsMediaOrganization just has a property called masthead for which a CreativeWork is expected. That's it.

So from a Bioschemas perspective, this would mean that the property hasBioPolymerSequence would be used in BioChemEntity without hasRepresentation, inChI (and likewise inChIKey and smiles) would be used in MolecularEntity without hasRepresentation, and hasRepresentation would just be used in ChemicalSubstance. Am I understanding this correctly? There would be no attempt to use inChI under hasRepresentation for ChemicalSubstance or anything like that.
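Under that reading, each property sits directly on the entity that uses it, with no nesting. A minimal sketch of the three cases described; the protein sequence is a placeholder string, and ethanol is used as the example molecule.

```json
{
  "@context": "https://schema.org/",
  "@graph": [
    {
      "@type": "Protein",
      "name": "example protein",
      "hasBioPolymerSequence": "ACDEFGHIKLMNPQRSTVWY"
    },
    {
      "@type": "MolecularEntity",
      "name": "ethanol",
      "inChI": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
      "inChIKey": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N",
      "smiles": "CCO"
    },
    {
      "@type": "ChemicalSubstance",
      "name": "ethanol",
      "hasRepresentation": "CCO"
    }
  ]
}
```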

gtsueng, Mar 27 '23

Yes, the idea is to organize the properties. The property hierarchy would have an impact on validation. As for the implications discussed in the last paragraph, I am not sure; better to double-check with @egonw what makes sense for MolecularEntity and ChemicalSubstance. It might be that we do not need or want the property hierarchy (not if it complicates things too much and has little or no effect).

ljgarcia, Mar 27 '23

I guess it's not completely clear to me how the property hierarchy would affect validation since each property is used in an unnested fashion in the corresponding class/type.

For example, Audiobook has a property readBy which is a subproperty of actor. It's not like you can just use the property actor in Audiobook in lieu of readBy; that would give an error. Similarly, Movie uses the property actor, for which you cannot substitute readBy and still have it validate properly.
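For reference, markup that validates in both cases uses each property only on the type whose domain includes it (schema.org spells the type Audiobook). The names below are placeholders. Swapping the two properties between the types is what produces the validation problems described here.

```json
{
  "@context": "https://schema.org/",
  "@graph": [
    {
      "@type": "Audiobook",
      "name": "Example audiobook",
      "readBy": {
        "@type": "Person",
        "name": "Example Narrator"
      }
    },
    {
      "@type": "Movie",
      "name": "Example movie",
      "actor": {
        "@type": "Person",
        "name": "Example Actor"
      }
    }
  ]
}
```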

gtsueng, Mar 27 '23

Update on the hasBioPolymerSequence property: the BioChemEntity v0.8-DRAFT type has hasBioPolymerSequence as a new property, not yet integrated into schema.org. This implies that all profiles and types inheriting from the BioChemEntity class will need to be updated.

Profiles inheriting from the BioChemEntity class (latest release and draft only):

  • BioSample/0.1-DRAFT-2019_11_12
  • ChemicalSubstance/0.3-DRAFT-2019_11_11
  • ChemicalSubstance/0.4-RELEASE
  • Gene/1.0-RELEASE
  • Gene/1.2-DRAFT
  • MolecularEntity/0.5-RELEASE
  • MolecularEntity/0.6-DRAFT
  • Protein/0.11-RELEASE
  • Protein/0.12-DRAFT
  • ProteinAnnotation/0.6-DRAFT
  • ProteinStructure/0.6-DRAFT
  • RNA/0.2-DRAFT
  • Sample/0.1-DRAFT-2018_02_25
  • SequenceAnnotation/0.7-DRAFT
  • SequenceRange/0.2-DRAFT
  • Taxon/0.2-DRAFT-2018_09_26

Types inheriting from the BioChemEntity class:

  • BioChemStructure/0.1-DRAFT-2019_06_20
  • BioSample/0.1-DRAFT-2019_06_14
  • BioSample/0.1-RELEASE-2019_06_19
  • ChemicalSubstance/0.2-DRAFT-2019_06_14
  • ChemicalSubstance/0.3-RELEASE-2019_09_02
  • DNA/0.2-DRAFT-2019_06_20
  • Enzyme/0.1-DRAFT-2019_06_20
  • Gene/0.2-DRAFT-2019_06_14
  • Gene/0.3-RELEASE-2019_09_02
  • MolecularEntity/0.2-DRAFT-2019_06_14
  • MolecularEntity/0.3-RELEASE-2019_09_02
  • Protein/0.2-DRAFT-2019_06_14
  • Protein/0.3-RELEASE-2019_09_02
  • RNA/0.1-DRAFT-2019_06_21
  • SequenceAnnotation/0.1-DRAFT-2019_06_21
  • SequenceRange/0.1-DRAFT-2019_06_21

ivanmicetic, Apr 21 '23

Taxon is a child of Thing, not BioChemEntity.

Gene and Protein had this property long before BioChemEntity had it, so it should already be there.

ProteinAnnotation has been deprecated; there is no reason to update it at this point, as it has been superseded by SequenceAnnotation.

Sample is pending deprecation and will be superseded by BioSample.

BioSample is awaiting additional changes from the working group (and potential BioHackEU2023 project)

For MolecularEntity and ChemicalSubstance (and anything else being developed by the Chemical Working Group), it is unclear whether this property will be used directly or whether a child or parent property of it will be used instead (see the discussion on nesting of properties above).

Everything else should be updated (edited by LJ).

Types:

  • [ ] BioChemStructure (draft)
  • [ ] BioSample (release, yes, it will be updated but that update will take time)
  • [ ] ChemicalSubstance (release)
  • [ ] DNA (draft)
  • [ ] Enzyme (draft)
  • [ ] MolecularEntity (release)
  • [ ] RNA
  • [ ] SequenceAnnotation
  • [ ] SequenceRange

Profiles:

  • [ ] ProteinStructure (draft; unless we know for sure that it makes sense there and would be useful for the profile, we can leave it as it is, since not all properties from BioChemEntity will make it into the corresponding profiles)

gtsueng, Apr 21 '23

@bedroesb @AlasdairGray @ivanmicetic Could you please have a look at the pending task about property colors?

  • [x] Fix colour coding of hasBioPolymerSequence in displays for draft profiles (should be Bioschemas, not pending)
    • [x] Gene profile (e.g., 1.2-DRAFT)
    • [x] Protein profile (e.g., 0.12-DRAFT)

Thanks.

ljgarcia, Apr 24 '23

@egonw @sneumann @gtsueng @nsjuty @oxgiraldo @albangaignard let's discuss the proposal of having nested properties for the three (four?) identification options for MolecularEntity:

  • [ ] Define property hierarchy under hasRepresentation (with Text as only range):
    • [ ] hasBioPolymerSequence
    • [ ] inChI
    • [ ] iupacName
    • [ ] inChIKey
    • [ ] smiles

The idea here (as far as I understand) would be to have hasRepresentation as the "minimum" property for MolecularEntity in the profile, while a particular individual of type MolecularEntity could use either hasRepresentation or any of its children.

This is an ontology/validation question. If we only specify hasRepresentation for the type MolecularEntity, can we have a MolecularEntity individual that uses inChIKey instead, and expect reasoners and validators (e.g., ShEx, SHACL) not to complain and to give us the expected output? The expected output in this case would be the reasoner not complaining and the validation passing.
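One possible way to get that behaviour without relying on the hierarchy, sketched as an assumption rather than a tested Bioschemas shape: as far as I understand, SHACL Core matches sh:path against the exact predicate and does not apply rdfs:subPropertyOf inference by itself, so a shape that requires only schema:hasRepresentation would reject an individual that carries only inChIKey. A SHACL alternative path lists the accepted predicates explicitly. The shape IRI in the ex: namespace is hypothetical.

```json
{
  "@context": {
    "schema": "https://schema.org/",
    "sh": "http://www.w3.org/ns/shacl#",
    "ex": "https://example.org/shapes/"
  },
  "@id": "ex:MolecularEntityMinimumShape",
  "@type": "sh:NodeShape",
  "sh:targetClass": {
    "@id": "schema:MolecularEntity"
  },
  "sh:property": {
    "sh:path": {
      "sh:alternativePath": {
        "@list": [
          { "@id": "schema:hasRepresentation" },
          { "@id": "schema:inChI" },
          { "@id": "schema:inChIKey" },
          { "@id": "schema:iupacName" },
          { "@id": "schema:smiles" }
        ]
      }
    },
    "sh:minCount": 1,
    "sh:message": "A MolecularEntity needs hasRepresentation or at least one of inChI, inChIKey, iupacName, smiles."
  }
}
```

With this shape, an individual using only inChIKey would count toward sh:minCount, whether or not inChIKey is declared as a sub-property of hasRepresentation.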

@gtsueng already said that:

"For example, Audiobook has a property readBy which is a subproperty of actor. It's not like you can just use the property actor in Audiobook in lieu of readBy; that would give an error. Similarly, Movie uses the property actor, for which you cannot substitute readBy and still have it validate properly."

If @gtsueng is right, then I do not see any advantage in having nested properties.

Comments?

ljgarcia, Apr 24 '23

Regarding:

"Fix colour coding hasBioPolymerSequence in displays for draft profiles (should be Bioschemas not pending)"

Gene and Protein types had the property hasBioPolymerSequence long before this property was included in BioChemEntity. The Gene and Protein types that are currently pending on Schema.org have these properties. So they should be colored as pending, no?

[screenshot]

gtsueng, Apr 24 '23

Here's what happens when you use actor in the Audiobook example: [screenshot]

And here is what happens when you use readBy in a Movie type: [screenshot]

As this is a Movie type, the validator automatically parses the value of actor as a Person. The same cannot be said for the property readBy, even though it is a subproperty of actor.

gtsueng, Apr 24 '23

Thanks @gtsueng for the analysis of the property hierarchy. I do not see a clear advantage in having the hierarchy. Unless @egonw @sneumann see an advantage there, I would suggest not implementing that change.

ljgarcia, Apr 27 '23

Suggestion: drop the property hierarchy suggestion. @ivanmicetic, if we drop it, is anything else in this issue still pending?

ljgarcia, Jun 24 '23

Comments from the 2023.06.26 community call:

  • If the hierarchy is dropped, we could instead use identifier with a PropertyValue as the Minimum property, plus the specific named property
  • SMILES is not really an identifier (the actual string depends on the SMILES algorithm: Open Babel vs. CDK vs. RDKit vs. ...)
  • An InChIKey can (technically) point to more than one compound (think: hash collision)

ljgarcia, Jun 26 '23

I can see the benefit if we could tighten validation for, e.g., MolecularEntity by specifying hasRepresentation as Minimum, which in turn could be satisfied by any of inChI/iupacName/smiles; but as @gtsueng's comment above shows, that doesn't work as intended through sub-properties.

The new proposal is now to keep MolecularEntity.identifier at Minimum, but to require not just a text value "MIIFHRBUBUHJMC-UHFFFAOYSA-N.1" but a PropertyValue with value=MIIFHRBUBUHJMC-UHFFFAOYSA-N.1 and propertyID=http://semanticscience.org/resource/CHEMINF_000059 ?
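In JSON-LD the proposal could look roughly like this. The propertyID is the CHEMINF IRI from the proposal. The value uses ethanol's InChIKey instead of the string quoted above, and the name field is an optional, assumed addition.

```json
{
  "@context": "https://schema.org/",
  "@type": "MolecularEntity",
  "name": "ethanol",
  "identifier": {
    "@type": "PropertyValue",
    "name": "InChIKey",
    "propertyID": "http://semanticscience.org/resource/CHEMINF_000059",
    "value": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"
  }
}
```

A validator could then check propertyID against a list of accepted identifier kinds instead of guessing the kind of identifier from the shape of a bare text value.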

sneumann, Jun 27 '23

I am sorry. I have been under a DDOS attack with project deliverables. Let me check.

egonw, Sep 25 '23