openMINDS_core synthetic specimens - new specimen schemas?

This issue is based on a discussion at the INCF Assembly 2024 with @tgbugs:

In the days of digital twins, the need arises to clearly capture "synthetic subjects". It is not sufficient to "misuse" the existing specimen schemas or simply to add a type (real vs. synthetic) to them.

We discussed that, in principle, we need the full set of specimen schemas as well as synthetic specimen schemas, with potential modifications to the properties. It remains to be discussed whether every specimen type needs a synthetic counterpart, or whether we can define a single SyntheticSpecimen that declares its type (subject/whole organism, single cell, etc.). It also remains to be discussed whether a SyntheticSpecimen has a state, or whether we would simply register a separate SyntheticSpecimen for each alteration anyway.
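To make the "single SyntheticSpecimen that declares its type" option more concrete, here is a rough sketch in Python. Everything in it is hypothetical: the class names, the `SyntheticSpecimenKind` terms, the `parameters` field and the serialized keys are placeholders for discussion, not existing openMINDS schemas or IRIs. It shows the state-based variant; the alternative (one new SyntheticSpecimen per alteration) would simply leave `states` empty.

```python
from dataclasses import dataclass, field
from enum import Enum


class SyntheticSpecimenKind(Enum):
    # Hypothetical controlled terms; in practice these would come from a terminology.
    WHOLE_ORGANISM = "whole organism"
    SINGLE_CELL = "single cell"


@dataclass
class SyntheticSpecimenState:
    # Hypothetical state: one parameterization/alteration of the same synthetic specimen.
    lookup_label: str
    parameters: dict = field(default_factory=dict)


@dataclass
class SyntheticSpecimen:
    # Hypothetical single schema that declares its kind instead of one schema per kind.
    lookup_label: str
    kind: SyntheticSpecimenKind
    # Option A: alterations are recorded as states of one specimen.
    # Option B would keep this list empty and register a new SyntheticSpecimen per alteration.
    states: list[SyntheticSpecimenState] = field(default_factory=list)

    def to_dict(self) -> dict:
        # Plain-dict serialization; "@type" values and keys are placeholders, not openMINDS IRIs.
        return {
            "@type": "SyntheticSpecimen",
            "lookupLabel": self.lookup_label,
            "specimenKind": self.kind.value,
            "states": [
                {
                    "@type": "SyntheticSpecimenState",
                    "lookupLabel": s.lookup_label,
                    "parameters": s.parameters,
                }
                for s in self.states
            ],
        }


twin = SyntheticSpecimen(
    lookup_label="synthetic-sub-01",
    kind=SyntheticSpecimenKind.WHOLE_ORGANISM,
    states=[SyntheticSpecimenState("synthetic-sub-01_baseline", {"coupling": 0.1})],
)
print(twin.to_dict()["specimenKind"])  # whole organism
```

The trade-off this is meant to surface: a declared kind keeps the number of schemas small, but any kind-specific properties (e.g. those only meaningful for a whole organism) would have to be optional or validated per kind.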

@openMetadataInitiative/openminds-developers your thoughts?

Oct 22 '24 09:10 lzehl

A digital twin is a Model. I think the Model and ModelVersion schemas we have are sufficient for this use case, some additional properties may be necessary.

Oct 22 '24 09:10 apdavison

@apdavison thanks for responding so quickly to this. I'm not sure I agree, for the following reason: I could imagine using the same computational model to create multiple digital subjects/twins, because I'm just changing some parameters of the model, not the model itself. Of course, one could argue that each change in parameter settings creates a separate model version. But at least now computational models are not registered that way (in particular not for the TVB model). If we only use Model/ModelVersion to represent digital twins, this also means that each simulation for one digital twin needs to reference that Model/ModelVersion (and not the real subject under "studied specimen"). Depending on what we decide, this might also just lead to a separation between simulated and experimental datasets.

I'd like to have this as one of our TODOs for next year so we can solve this more accurately. And we definitely need to discuss a good solution in the larger round. @openMetadataInitiative/openminds-developers (or others), please share more thoughts here so that we already have a basis for further discussion.

Oct 22 '24 15:10 lzehl

> at least now computational models are not registered that way

For EBRAINS they are supposed to be registered that way, with the "input_data" property used to contain the parameter settings, and they are treated that way in the Model Catalog app. If there are cases that are not registered like this, there has been a mistake or oversight in curation.

For digital twins, my suggestion would be to add a "represents" or "isRepresentationOf" property to ModelVersion, which would point to the real subject.

I agree that this results in a difference of interpretation of Model/ModelVersion with respect to other research products, such as Software/SoftwareVersion.

An alternative, perhaps cleaner, approach would be to add a ModelInstance (or ModelInstantiation) schema, which perhaps would be very similar to SyntheticSpecimen, but I'm not sure this is really needed.
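A rough sketch of how the "isRepresentationOf" suggestion could look in practice, with one ModelVersion per parameter setting (parameters supplied as input data) and a link back to the real subject. These are simplified, JSON-LD-like Python dicts: the `urn:example:` identifiers, the helper functions and the `isRepresentationOf` key itself are hypothetical, since the property does not exist in openMINDS today.

```python
# Simplified, JSON-LD-like records; "@id" values are placeholder URNs and
# "isRepresentationOf" is the proposed property, not part of openMINDS today.


def make_model_version(version_id: str, parameter_file_id: str, subject_id: str) -> dict:
    """One ModelVersion per parameter setting, linked to the real subject it represents."""
    return {
        "@id": f"urn:example:modelversion/{version_id}",
        "@type": "ModelVersion",
        "versionIdentifier": version_id,
        # Parameter settings provided as input data (the "input_data" route described above).
        "inputData": [{"@id": parameter_file_id}],
        # Proposed property: points from the digital twin to the real subject.
        "isRepresentationOf": {"@id": subject_id},
    }


def twins_of(subject_id: str, model_versions: list) -> list:
    """Return the version identifiers of all model versions that represent a given subject."""
    return [
        mv["versionIdentifier"]
        for mv in model_versions
        if mv.get("isRepresentationOf", {}).get("@id") == subject_id
    ]


versions = [
    make_model_version("1.0-sub-01", "urn:example:file/params-sub-01.json", "urn:example:subject/sub-01"),
    make_model_version("1.0-sub-02", "urn:example:file/params-sub-02.json", "urn:example:subject/sub-02"),
]
print(twins_of("urn:example:subject/sub-01", versions))  # ['1.0-sub-01']
```

With this approach a simulation dataset would reference the ModelVersion, and the real subject is only reachable indirectly through the represented-subject link, which is exactly the interpretation difference from Software/SoftwareVersion noted above.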

Oct 22 '24 15:10 apdavison

@tgbugs just pinging you on this issue for further discussion, since you are facing the same problem.

May 27 '25 16:05 lzehl

Thanks. A brief note for now: we have "virtual subjects" (I'm not sure we've done virtual samples, but they are possible), which are, more or less, parameterizations of a core simulation based on data from a specific, known biological subject.

May 27 '25 16:05 tgbugs