
Discussion: how to describe the nature of the content for Datasets?

Open ljgarcia opened this issue 1 year ago • 4 comments

Some options that were mentioned on emails and community calls:

Which way would the community want to go? Please add your thoughts, pros, and cons to help us find a community-based approach.

ljgarcia avatar Jan 24 '23 17:01 ljgarcia

I would go for "about": its range, Thing, makes it possible to use Bioschemas types such as Taxon, and also DefinedTerms coming, for instance, from EDAM.

RO-Crate also uses "about" (in use) and "keywords" like that: https://www.researchobject.org/ro-crate/1.1/contextual-entities.html#subjects--keywords
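
To make the idea concrete, here is a minimal sketch of a Dataset whose "about" holds both a Taxon and a DefinedTerm, with "keywords" kept for free-text terms. It is only an illustration: the dataset name is a placeholder, and the EDAM topic IRI and label are assumptions that should be checked against EDAM before reuse.

```python
import json

# Hypothetical dataset description; only the use of "about" and "keywords" matters here.
dataset = {
    "@context": "http://schema.org/",
    "@type": "Dataset",
    "name": "Example barley dataset",  # placeholder, not a real record
    "keywords": "barley, genome sequence assembly",
    "about": [
        {
            # Organism described as a Bioschemas/schema.org Taxon
            "@type": "Taxon",
            "@id": "http://purl.bioontology.org/ontology/NCBITAXON/4513",
            "name": "Hordeum vulgare",
        },
        {
            # Topic described as a DefinedTerm; the IRI and label are assumed
            "@type": "DefinedTerm",
            "@id": "http://edamontology.org/topic_0780",
            "name": "Plant biology",
            "inDefinedTermSet": "http://edamontology.org",
        },
    ],
}

# Serialize to the JSON-LD text that would go inside a <script> tag
jsonld = json.dumps(dataset, indent=2)
print(jsonld)
```

Because the range of "about" is Thing, both entry types can sit in the same list without any extension to the profile.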

ljgarcia avatar Jan 24 '23 17:01 ljgarcia

@ljgarcia Thanks for opening the discussion.

In our case the repository provides heterogeneous datasets focused on plant research data, but without a specific data domain, because the aim was to provide a generic platform for sharing datasets that are too large for, or out of the scope of, existing databases. We have genomic data, phenotypic images, metabolomics datasets, microscopy pictures, software and so on. That is why the general specification is "dataset", but of course all of them are related to plants and can therefore be described with a "taxon".

I think I would prefer the solution of adding the taxon content in the "about" section, because it looks clearer and the "keywords" section is already used for the general dataset description.

Here is an example:

<script type="application/ld+json">{

  "@context":"http://schema.org/",
  "@type":"Dataset",
  "http://purl.org/dc/terms/conformsTo":"https://bioschemas.org/profiles/Dataset/1.0-RELEASE",
  "@id":"https://doi.ipk-gatersleben.de/DOI/b2f47dfb-47ff-4114-89ae-bad8dcc515a1/7eb2707b-d447-425c-be7a-fe3f1fae67cb/2",
  "keywords":"barley, Hordeum vulgare, genome sequence assembly, long read sequencing, gene annotation, transposable elements",
  "about":  {
    "@type":"Taxon",
    "@id":"http://purl.bioontology.org/ontology/NCBITAXON/4513",
    "http://purl.org/dc/terms/conformsTo":"https://bioschemas.org/profiles/Taxon/0.6-RELEASE",
    "url":"https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=info&id=4513",  
    "taxonRank":"species",
    "parentTaxon":"Hordeum",
    "http://rs.tdwg.org/dwc/terms/vernacularName":"barley"
  }
  #The rest of the properties describing this dataset
}
</script>
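
One practical note on the example: the "#The rest of the properties..." line is only a placeholder, and JSON has no comment syntax, so a parser will reject the snippet as posted. A minimal sketch of checking this with Python's standard json module follows; the snippet is a shortened copy of the example, and drop_placeholder_lines is a hypothetical helper written for this illustration, not part of any Bioschemas tooling.

```python
import json

# Shortened copy of the example, keeping the "#..." placeholder line as posted.
snippet = """{
  "@context":"http://schema.org/",
  "@type":"Dataset",
  "about":  {
    "@type":"Taxon",
    "@id":"http://purl.bioontology.org/ontology/NCBITAXON/4513",
    "taxonRank":"species"
  }
  #The rest of the properties describing this dataset
}"""


def drop_placeholder_lines(text):
    """Remove lines whose first non-blank character is '#' (JSON has no comments)."""
    kept = [line for line in text.splitlines() if not line.lstrip().startswith("#")]
    return "\n".join(kept)


# Parsing succeeds once the placeholder line is gone
dataset = json.loads(drop_placeholder_lines(snippet))
print(dataset["about"]["@type"], dataset["about"]["@id"])
```

In a real page the placeholder would be replaced by the remaining properties, each separated by a comma.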

arendd avatar Jan 26 '23 09:01 arendd

Hi, I also totally agree with the "about" option; the example given by @arendd shows convincingly that it is appropriate.

frmichel avatar Jan 26 '23 15:01 frmichel

@gtsueng this discussion is also useful for the "topic" and "organism" elements needed in the synthetic datasets. We could use "about" to describe the topic/subject of the Dataset (including the organism); see also https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#http://purl.org/dc/terms/subject.
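
As a rough sketch of how that could work on the consumer side, the "organism" and "topic" elements can be recovered from a single "about" by looking at each entry's type. The function names and the sample record below are hypothetical and only illustrate the idea; treating every non-Taxon entry as a topic is an assumption.

```python
def as_list(value):
    """Wrap a single JSON-LD value in a list; leave lists unchanged."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def split_about(dataset):
    """Group the entries of "about" into organisms (Taxon) and topics (everything else)."""
    organisms, topics = [], []
    for entry in as_list(dataset.get("about")):
        is_taxon = isinstance(entry, dict) and "Taxon" in as_list(entry.get("@type"))
        (organisms if is_taxon else topics).append(entry)
    return organisms, topics


# Illustrative record, not taken from a real repository
synthetic = {
    "@type": "Dataset",
    "about": [
        {"@type": "Taxon", "@id": "http://purl.bioontology.org/ontology/NCBITAXON/4513"},
        {"@type": "DefinedTerm", "name": "genome sequence assembly"},
    ],
}

organisms, topics = split_about(synthetic)
print(len(organisms), len(topics))
```

The helper accepts "about" as either a single object or a list, since both forms are valid JSON-LD.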

ljgarcia avatar Jan 30 '23 13:01 ljgarcia