biolink-model
Prediction Qualifiers
Question:
We are looking to include predictive associations in a knowledge graph using the BioLink model. Are there currently qualifiers to specify is_predicted with a boolean value and/or predicted_by_model_type with some model (e.g. Enformer, AlphaMissense), or is there a recommended way to do so?
If not, is this within the scope of BioLink and something we can work to add, or would it be recommended to extend it independently?
Hi @riyavsinha - nice to hear from you! Thank you for the question. Yes, we just released Biolink 4.2.0 with some guidance on adding two edge properties, knowledge level and agent type, to help capture the nature of the edge (whether it be a prediction, an assertion, or a statistical calculation).
Details and guidance for assigning ‘At-a-Glance’ provenance properties that allow users to make a first-pass assessment of the strength, relevance, and utility of a given Edge or Result.
Enumerated values for agent type are described in Biolink via the range of the property and include:
- manual_agent
- automated_agent
- data_analysis_pipeline
- computational_model
- text_mining_agent
- image_processing_agent
- manual_validation_of_automated_agent
- not_provided
and for ‘knowledge_level’ (which describes the level or type of statement that is reported in an edge, based on the reasoning or analysis methods used to generate the knowledge it reports, or the type/strength of evidence supporting this knowledge), enumerated values include:
- knowledge_assertion
- logical_entailment
- prediction
- statistical_association
- observation
- not_provided
The main challenge in applying this standard concerns selecting appropriate agent type and knowledge level terms for a given edge. Separation of agent type and knowledge level into separate properties is intended to make it easier to identify and apply the most appropriate terms for each of these provenance characteristics.
https://biolink.github.io/biolink-model/agent_type/
https://biolink.github.io/biolink-model/AgentTypeEnum/
https://biolink.github.io/biolink-model/knowledge_level/
https://biolink.github.io/biolink-model/KnowledgeLevelEnum/
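As a rough illustration of how a predicted association might carry the two properties listed above, here is a minimal Python sketch. It assumes edges are held as plain dictionaries; the `make_edge` helper, the Python class names, and the `EXAMPLE:` identifiers are hypothetical and not part of Biolink. Only the enumerated values come from the lists above.

```python
from enum import Enum


class KnowledgeLevel(str, Enum):
    # Values copied from the knowledge level list above.
    KNOWLEDGE_ASSERTION = "knowledge_assertion"
    LOGICAL_ENTAILMENT = "logical_entailment"
    PREDICTION = "prediction"
    STATISTICAL_ASSOCIATION = "statistical_association"
    OBSERVATION = "observation"
    NOT_PROVIDED = "not_provided"


class AgentType(str, Enum):
    # Values copied from the agent type list above.
    MANUAL_AGENT = "manual_agent"
    AUTOMATED_AGENT = "automated_agent"
    DATA_ANALYSIS_PIPELINE = "data_analysis_pipeline"
    COMPUTATIONAL_MODEL = "computational_model"
    TEXT_MINING_AGENT = "text_mining_agent"
    IMAGE_PROCESSING_AGENT = "image_processing_agent"
    MANUAL_VALIDATION_OF_AUTOMATED_AGENT = "manual_validation_of_automated_agent"
    NOT_PROVIDED = "not_provided"


def make_edge(subject, predicate, obj, knowledge_level, agent_type):
    """Hypothetical helper: build an edge dict, rejecting unknown enum values."""
    return {
        "subject": subject,
        "predicate": predicate,
        "object": obj,
        # Enum lookup by value raises ValueError for anything not listed above.
        "knowledge_level": KnowledgeLevel(knowledge_level).value,
        "agent_type": AgentType(agent_type).value,
    }


# Placeholder identifiers; a model-predicted edge would use these two values.
edge = make_edge(
    "EXAMPLE:variant_1",
    "biolink:related_to",
    "EXAMPLE:gene_1",
    knowledge_level="prediction",
    agent_type="computational_model",
)
print(edge)
```

Validating against the enumerations at construction time means a free-form flag such as is_predicted is rejected early rather than silently written into the graph.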
Some additional guidance:
- If a human participated in the reasoning and interpretation activities that led to creation of the knowledge statement, select ‘manual agent’.
- If a human participated only by vetting/validating a knowledge statement that was generated by an automated agent, select ‘manual validation of automated agent’.
  - It is important to indicate when such manual review has occurred, because it can give a user more confidence in an automated statement.
- If a human was involved only in writing code/algorithms that were executed to process, analyze, or reason with data, but the knowledge statement itself was generated by software without direct human intervention, select ‘automated agent’ (or one of its children).
- If an automated agent generating a knowledge statement executes a set of data processing and analysis tasks, and then reports the direct result of this analysis - but does NOT perform reasoning or inference to draw a broader conclusion based on these results - select ‘data analysis pipeline’.
  - Data Analysis Pipelines summarize features of a dataset, or report statistical associations/enrichments within the data.
  - These agents typically generate Statements that report an ‘association’ or ‘correlation’ between variables in the dataset (e.g. ‘PM2.5 exposure is positively correlated with ER visits for Asthma, in cohort/dataset X’), or a statistical enrichment of concepts in a dataset (e.g. ‘Gene Set X is enriched in Pathway Y’).
- If the automated agent performs any form of reasoning or inference over the data/information it consumes in order to draw a broader conclusion about the domain of discourse, select ‘computational model’.
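The guidance above can be read as a small decision procedure. The sketch below is one possible encoding of it in Python; the function name and its boolean parameters are hypothetical and only paraphrase the bullets, so treat it as an aid for thinking through a given edge rather than an official Biolink rule set.

```python
def choose_agent_type(
    human_reasoned=False,
    human_validated_only=False,
    draws_broader_conclusion=False,
    reports_direct_analysis_result=False,
):
    """Hypothetical helper mapping the guidance bullets to an agent type value."""
    if human_reasoned:
        # A human took part in the reasoning and interpretation.
        return "manual_agent"
    if human_validated_only:
        # A human only vetted a statement produced by an automated agent.
        return "manual_validation_of_automated_agent"
    if draws_broader_conclusion:
        # Software reasons or infers beyond the direct analysis result.
        return "computational_model"
    if reports_direct_analysis_result:
        # Software reports statistics or enrichments without further inference.
        return "data_analysis_pipeline"
    # Software generated the statement, but no more specific child applies.
    return "automated_agent"


# A model that infers variant effects draws a broader conclusion from its input.
print(choose_agent_type(draws_broader_conclusion=True))

# An enrichment analysis reports its direct result.
print(choose_agent_type(reports_direct_analysis_result=True))
```

The human-involvement checks come first because, per the guidance, any human participation in reasoning or validation takes precedence over which kind of software was involved.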
With regard to also specifying the particular kind of model in the edge metadata: if you would like to provide a list of methods, we can better help sort out which additional biolink property best holds those.
Thank you for the detailed response, that is really helpful to know, and great that BioLink supports that!
> if you would like to provide a list of methods, we can better help sort out which additional biolink property best holds those?
For this, we haven't established a set list of methods yet, but in general they could be things like Enformer, AlphaMissense, Activity-by-Contact (ABC) models, ChromBPNet models, etc.
It seems like the Agent entity has a string provided_by that this information could go in, but I'm not clear on where that could be linked to in an Association.