
Discussion: How to describe the intended ML task of a training Dataset?

Open ljgarcia opened this issue 2 years ago • 0 comments

The ELIXIR Machine Learning Focus Group (including the task force on synthetic data) and NFDI4DataScience (and possibly the RDA FAIR4ML IG) are interested in using metadata to describe the distribution of a dataset for ML training purposes (including the DOME recommendations for Data).

Please let us know your thoughts on the following properties for describing the ML task a dataset could be used for, probably combined with an EDAM term for Operations:

  • usageInfo, e.g., usageInfo: "http://edamontology.org/operation_3482", or
  • potentialAction (either with an Action or a DefinedTerm), or
  • a link to the software (e.g., ComputationalTool), leaving the matter of the intended ML task to the software used to create the dataset
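
To make the first two options concrete, here is a minimal sketch of what the resulting JSON-LD might look like, built in Python. The dataset name and the helper functions are hypothetical, the EDAM URL is the one quoted above, and carrying the EDAM term on the Action via additionalType is only an assumption for illustration, not an agreed pattern.

```python
import json

# EDAM operation URL quoted in this issue.
EDAM_OPERATION = "http://edamontology.org/operation_3482"


def dataset_with_usage_info(name, task_url=EDAM_OPERATION):
    """Option 1: point usageInfo directly at the EDAM operation URL."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,  # hypothetical dataset name
        "usageInfo": task_url,
    }


def dataset_with_potential_action(name, task_url=EDAM_OPERATION):
    """Option 2: describe the task as a potentialAction of type Action.

    Assumption: the EDAM term is attached through additionalType,
    since the issue does not say which property should hold it.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,  # hypothetical dataset name
        "potentialAction": {
            "@type": "Action",
            "additionalType": task_url,
        },
    }


if __name__ == "__main__":
    print(json.dumps(dataset_with_usage_info("Example training set"), indent=2))
    print(json.dumps(dataset_with_potential_action("Example training set"), indent=2))
```

The third option (linking to the software) would not state the task on the Dataset at all, so it is left out of the sketch.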

A drawback of the properties mentioned is their lack of support for DefinedTerm. A discussion about extending support for DefinedTerm in schema.org is ongoing.

ljgarcia, Jan 30 '23 13:01