Schema and generated objects for biolink data model and upper ontology

Join the chat at https://gitter.im/biolink-model/community

Biolink Model

Quickstart docs:

For a good overview of the biolink-model, watch Chris Mungall's talk at ICBO 2020.

Refer to the following resources for a quick introduction to the Biolink Model:

See also Biolink Model Guidelines for help understanding, curating, and working with the model.

Introduction

The purpose of the Biolink Model is to provide a high-level data model of biological entities (genes, diseases, phenotypes, pathways, individuals, substances, etc.), their properties, and their relationships, and to enumerate the ways in which they can be associated.

The representation is independent of storage technology or metamodel (Solr documents, neo4j/property graphs, RDF/OWL, JSON, CSVs, etc.). Mappings to each of these are provided.

The specification of the Biolink Model is a single YAML file built using LinkML. The basic elements of the YAML are:

  • Class Definitions: definitions of upper-level classes representing both 'named thing' and 'association'
  • Slot Definitions: definitions of slots (aka properties) that can be used to relate members of these classes to other classes or data types. Slots collectively refer to predicates, node properties, and edge properties
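
As a rough illustration of these two kinds of elements, the sketch below mirrors the general shape of a LinkML schema (top-level `classes` and `slots` sections) as a plain Python dictionary. The entries are a simplified, hand-written fragment, not an excerpt of biolink-model.yaml, and the helper `slots_for_domain` is hypothetical.

```python
from typing import Dict, List

# Simplified, hand-written fragment shaped like a LinkML schema.
# Assumption: the real model defines many more elements and fields,
# and its class hierarchy has intermediate classes omitted here.
schema: Dict[str, Dict[str, dict]] = {
    "classes": {
        "named thing": {},
        "gene": {"is_a": "named thing"},
        "disease": {"is_a": "named thing"},
        "association": {},
    },
    "slots": {
        "related to": {"domain": "named thing", "range": "named thing"},
        "name": {"domain": "named thing", "range": "string"},
    },
}


def slots_for_domain(schema: Dict[str, Dict[str, dict]], class_name: str) -> List[str]:
    """Return the sorted names of slots whose declared domain is class_name."""
    return sorted(
        slot_name
        for slot_name, slot in schema["slots"].items()
        if slot.get("domain") == class_name
    )


print(slots_for_domain(schema, "named thing"))  # ['name', 'related to']
```

Here `related to` plays the role of a predicate (it relates one class to another), while `name` plays the role of a node property (it relates a class to a data type).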

The model itself is being used in the following projects:

Organization

The main source of truth is biolink-model.yaml. This is a YAML file that is intended to be relatively simple to view and edit in its native form.

The YAML definition is currently used to derive:

  • JSON Schema
  • Python dataclasses
  • Java code gen
  • ProtoBuf definitions
  • GraphQL
  • RDF
  • OWL
  • RDF Shape Expressions
  • JSON-LD context
  • Graphviz
  • GOlr YAML schemas
    • these can be compiled down to Solr XML schemas
    • these are also intermediate targets used within the BBOP/AmiGO framework
  • Markdown documentation
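
To give a feel for what a derived artifact can look like, here is a hand-written sketch in the spirit of the Python dataclasses target. It is not the generated code: the class names, fields, and the `EXAMPLE:0001` identifier are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NamedThing:
    """Hypothetical stand-in for a class derived from 'named thing'."""
    id: str
    name: Optional[str] = None
    category: List[str] = field(default_factory=list)


@dataclass
class Gene(NamedThing):
    """Hypothetical subclass; an is_a link in the YAML becomes inheritance."""
    symbol: Optional[str] = None


# Made-up identifier and values, used only to show construction.
gene = Gene(id="EXAMPLE:0001", name="example gene", symbol="EXG1")
print(gene)
```

The point of the sketch is the mapping: a class in the YAML corresponds to a dataclass, an `is_a` relationship to inheritance, and slots to typed fields.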

Make and build instructions

Prerequisites: Python 3.7+ and pipenv

To install pipenv,

pip3 install pipenv

To install the project,

make install

To regenerate artifacts from the Biolink Model YAML,

make

Note: the Makefile requires the following dependencies to be installed:

jsonschema

It is typically installed with pip:

pip3 install jsonschema

jsonschema2pojo

If you are on a Mac, it can be installed using brew:

brew install jsonschema2pojo

For other operating systems, download the latest release, extract it, and add its bin directory to your PATH. For example:

wget https://github.com/joelittlejohn/jsonschema2pojo/releases/download/jsonschema2pojo-1.0.2/jsonschema2pojo-1.0.2.tar.gz
tar -xvzf jsonschema2pojo-1.0.2.tar.gz
export PATH="$PATH:$(pwd)/jsonschema2pojo-1.0.2/bin"

GraphViz

See the GraphViz site for installation instructions for your operating system.

How do I use Biolink Model YAML programmatically?

For operations such as CURIE lookup, finding classes by synonym, and getting parents or ancestors, please use biolink-model-toolkit. It provides convenience methods for traversing the Biolink Model.
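
To make "get parents" and "get ancestors" concrete, the following is a minimal conceptual sketch of that kind of traversal over `is_a` links, using only the standard library. It does not use or reproduce the toolkit's API; the `IS_A` table is a toy hierarchy, and `parent_of` and `ancestors_of` are hypothetical helper names.

```python
from typing import Dict, List, Optional

# Toy is_a hierarchy. Assumption: simplified for illustration and not
# guaranteed to match the class hierarchy in the real model.
IS_A: Dict[str, Optional[str]] = {
    "named thing": None,
    "biological entity": "named thing",
    "gene": "biological entity",
    "disease": "biological entity",
}


def parent_of(name: str) -> Optional[str]:
    """Return the direct parent of a class, or None for a root class."""
    return IS_A[name]


def ancestors_of(name: str) -> List[str]:
    """Return all ancestors of a class, nearest first, by following is_a."""
    ancestors: List[str] = []
    parent = IS_A[name]
    while parent is not None:
        ancestors.append(parent)
        parent = IS_A[parent]
    return ancestors


print(parent_of("gene"))     # biological entity
print(ancestors_of("gene"))  # ['biological entity', 'named thing']
```

In practice the toolkit is the better choice, since it works against the full model rather than a hand-maintained table.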

Citing Biolink Model

Unni DR, Moxon SAT, Bada M, Brush M, Bruskiewich R, Caufield JH, Clemons PA, Dancik V, Dumontier M, Fecho K, Glusman G, Hadlock JJ, Harris NL, Joshi A, Putman T, Qin G, Ramsey SA, Shefchek KA, Solbrig H, Soman K, Thessen AE, Haendel MA, Bizon C, Mungall CJ, The Biomedical Data Translator Consortium (2022). Biolink Model: A universal schema for knowledge graphs in clinical, biomedical, and translational science. Clin Transl Sci. Wiley; 2022 Jun 6; https://onlinelibrary.wiley.com/doi/10.1111/cts.13302