
TURTLE+ODIS - initialise digital twin interoperability

Open pbuttigieg opened this issue 3 years ago • 28 comments

In the framework of TURTLE - a project under the UN Ocean Decade DITTO Programme - we'll use this issue to scope out and execute a "hello world" interoperability exercise between at least two (more welcome) digital twins.

We'll attempt to shape / modify some generic ODIS Arch patterns for compute resources, model stacks, etc., and also some specific data exchanges around biodiversity and bathymetry (@pieterprovoost will almost certainly involve OBIS data).

The first objective is to list which twins we'll be testing with and what (meta)data we'll be exchanging ...

pbuttigieg avatar Jan 16 '23 16:01 pbuttigieg

@justinbuck

justinbuck avatar Jan 16 '23 16:01 justinbuck

test!

Naisunev avatar Jan 16 '23 16:01 Naisunev

Thanks @justinbuck and @Naisunev we can start listing particulars on which twins you're bringing in, where their endpoints are, and brainstorming what we'd like to exchange.

We can schedule a meeting with @fils and @jmckenna to set up a workflow to link your catalogues via ODIS, as the first step towards exchanging data, containers, code, or other digital assets

pbuttigieg avatar Jan 16 '23 16:01 pbuttigieg

@justinbuck @Naisunev

Shall we start with defining metadata for the core modules of DTs?

These would likely include:

  • data ingest and ETL / integration
  • observing / sensing data streams
  • model stack and I/O
  • virtualisation / visualisation
  • the "what if" modules (hypothetical twin of a twin)
  • data export

pbuttigieg avatar Jan 28 '23 09:01 pbuttigieg

Great! Shall we work out our working mode? Also: it would be great to use ISO 23247; I think this will greatly help with the breakdown of functionality. If needed, I have some images.

Naisunev avatar Jan 30 '23 13:01 Naisunev

Hmm. Is that the modules themselves or the data from them? I'm in :-) Had to retrieve my password.

UteBroenner avatar Mar 02 '23 17:03 UteBroenner

Hi, OGC here, with Iliad interoperability support. AFAIK 23247 is a good start, but images/schemas would be helpful, at least here, as we're naturally more fluent in the ISO 19xxx suite than the 23xxx one. We're quite focused on these:

  • data ingest and ETL / integration
  • observing / sensing data streams
  • data export, covering both the catalog (meta)data and the data itself.

plus for:

  • the "what if" modules (hypothetical twin of a twin): we use Processing Services and the Application Package in Iliad; the stack is already supported by some DIASes and agencies like ESA and DLR.

not sure about:

  • model stack and I/O
  • virtualisation / visualisation, as this is more of an infrastructure area; we use Docker by default.

Is the plan, @UteBroenner, to propose your pilot as the guinea pig?

pzaborowski avatar Mar 02 '23 21:03 pzaborowski

@pzaborowski @pbuttigieg hi, I have had some more discussions with Rob regarding sync API and how we can work this into OGC (https://github.com/mimiro-io/ocean-open-data-sync-protocol/blob/master/specification.md) and also working with HUB ocean to set up some example integrations. How can we align these things best?

gra-moore avatar Mar 03 '23 06:03 gra-moore

Sure, we can do that! Happy to be the guinea pig.

UteBroenner avatar Mar 04 '23 21:03 UteBroenner

@gra-moore I'm not leading this, but would suggest, if the scenario here includes the synchronisation case, referring to your work. Then, if needed, focus on testing, profiling, and season integration API, and eventually generalise the changes in the original repo if we standardise it alone. WDYT?

pzaborowski avatar Mar 05 '23 21:03 pzaborowski

Thanks all for the input - note the iFDO for DTs issue linked above too.

The path ahead is to see how to splice the specifications noted above into JSON-LD / schema.org exchange packets and get some twins talking to each other and generically visible via ODIS

@UteBroenner the patterns can cover (meta)data, which includes descriptions of software modules

pbuttigieg avatar Mar 14 '23 07:03 pbuttigieg

@pbuttigieg @Naisunev @pzaborowski @gra-moore @justinbuck I would like to revive this towards something demonstrable for the DITTO Summit in November. In order to contribute I would need something concrete to work on. Should we arrange a short meeting or discuss asynchronously here?

UteBroenner avatar May 22 '23 07:05 UteBroenner

@UteBroenner if we could have a quick meeting to sync up that would be great.

gra-moore avatar May 22 '23 08:05 gra-moore

@gra-moore we are meeting tomorrow at 11, alternatively at 9 for half an hour. Would you send me your email so I can invite you to the meeting?

UteBroenner avatar May 25 '23 13:05 UteBroenner

@UteBroenner I have sent you an email with my contact details. I am available at 11.

gra-moore avatar May 26 '23 07:05 gra-moore

Great meeting today: Document for dumping current ideas & developments at https://docs.google.com/document/d/1fKa3A5g82Y6OICBg0lHtAyeyJI-GcWdBV8BVdCd1bSY/edit#

UteBroenner avatar May 26 '23 11:05 UteBroenner

Note that we have a dedicated repo under DITTO now: https://github.com/DITTO-OceanDecade/turtle

UteBroenner avatar Aug 10 '23 09:08 UteBroenner

thanks @UteBroenner I am now following that repo

jmckenna avatar Aug 10 '23 18:08 jmckenna

Following yesterday's meeting, we'll be drafting a meta-pattern that should allow federated digital twins to leverage ODIS (and other ODIS-like systems, using schema.org and JSON-LD) to query and explore each other's asset catalogues.

This work also applies to infrastructures preparing for digital twin interoperability, such as EDITO-Infra

The basic meta-pattern

Event cascades

Digital twins are - by definition - concerned with tracking a real-world entity: an event cascade meshes sensed/observed data streams with modelling outputs to create a representation dense enough to power virtualisation engines that feed user experiences.

When a "what-if" scenario is triggered, this initiates another event cascade that runs counter-current to that described above. As we're not tracking the real-world entity anymore, what is actually happening is the instantiation of a digital twin for a hypothetical/not-really-real entity, determined by user settings/inputs. This user input event triggers the spoofing of (some of) the data from the observation/sensing and/or modelling stack.

As such, the ODIS elements we can use to help the federated twins describe their cascades are:

  • schema:Event - alongside the sub/super-event properties, this should be used to describe sensing events, as well as any computational / analysis / modelling events that result from them.
  • schema:Action - this, together with the potentialAction property, should be used closely with Event to describe the finer-grained actions that happen in a twin's digital ecosystem (e.g. ingest, model runs, visualisation, ...).
  • schema:Dataset - Naturally, datasets will be transmitted between each Event and Action: these are not just data from sensors, they can also include things like parameters, settings, instructions, etc. Software code is a special case, and has its own type. The schema:DataCatalog type may be useful here too.
  • schema:SoftwareApplication - This type will be key to describing the software modules that each twin has. Each application (containerised or otherwise) should be described with this type. This way, digital twins can see what other twins have. An Event or Action can trigger a software application as an agent, and this can generate a Dataset or SoftwareSourceCode as an output, which is the input for the next Event/Action, etc.
  • schema:Service - and its WebService sub-type can be used to describe processes that are running (based on the execution of a SoftwareApplication). This type should be used to catalog what services a given twin is running or capable of running (metadata in the type can articulate that).
  • schema:HowTo - We haven't really used this type much in ODIS, but this may be what we need to provide the skeleton for workflow descriptions, which can be instantiated by the execution of an Action or Event.
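As a rough, hypothetical sketch of how these types could link up in a twin's catalogue (all names, identifiers, and values below are invented for illustration, not from any partner's holdings):

```json
{
  "@context": {"@vocab": "https://schema.org/"},
  "@type": "Event",
  "name": "Glider mission 42: ingest cascade (hypothetical)",
  "description": "A sensing event feeding a digital twin.",
  "subEvent": {
    "@type": "Event",
    "name": "Circulation model run triggered by the ingest"
  },
  "potentialAction": {
    "@type": "Action",
    "name": "Run QC and ETL module",
    "instrument": {
      "@type": "SoftwareApplication",
      "name": "twin-etl-module"
    },
    "result": {
      "@type": "Dataset",
      "name": "QC-ed glider profiles, mission 42"
    }
  }
}
```

The idea is simply that Events nest via subEvent, Actions hang off them via potentialAction, and Datasets and SoftwareApplications appear as inputs, instruments, and results along the cascade.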

There are certainly more, but this collection will get us quite far in creating graphs that describe how any given twin handles the cascades described above. These graphs - linking the elements above - are also likely to map quite closely to the CWL and other workflow representations in use, providing a generic interoperability layer which can leverage ODIS to interface with the holdings of the broader federation.

Next steps

The ODIS team will prep these types in our documentation (some are already there), but TURTLE members can already start experimenting.

Our next meeting is at the end of February, where we can review the specifications and approach.

@justinbuck - this is a general summary of the meeting, I think BODC's ODIS set up could be the place to start with examples, as you already created a flow from Events to Datasets.

pbuttigieg avatar Feb 02 '24 12:02 pbuttigieg

It's fine to use a few concepts from schema.org as high level classifiers, but architecturally you will need to use relevant specific domain models for interoperable content. It seems short-sighted not to plan for this now, and treat schema.org as simply one of many such classification layers that sit over machine actionable descriptions.

rob-metalinkage avatar Feb 16 '24 06:02 rob-metalinkage

Which domain resources do you have in mind? The high-level stuff gets us quite close to component exchange.

It seems short-sighted not to plan for this now, and treat schema.org as simply one of many such classification layers that sit over machine actionable descriptions.

We have planned for this. As discussed in our TURTLE meeting (I believe you were there), nesting other descriptions with greater expressivity and domain-relevance within schema.org is easily done. The generic schema.org shell supports broad discovery, while nested content can be created as each twin wishes, based on their target audiences and internal needs. See: https://doi.org/10.5281/zenodo.7682399

pbuttigieg avatar Feb 27 '24 15:02 pbuttigieg

Following on from https://github.com/iodepo/odis-arch/issues/162#issuecomment-1923674608

All reference patterns for JSON-LD/schema.org noted above are here: https://github.com/iodepo/odis-in/tree/master/dataGraphs/thematics

We'll build documentation around those in the ODIS Book after some more testing.

Direct links:

You'll notice that the value space of many properties is described like:


"audience": {"@type": "https://schema.org/Audience"},

That just means that one should refer to the noted schema.org type specification(s) for guidance on how to build that stanza. For example:


"audience": {
  "@type": "Audience",
  "audienceType": "Technical experts associated with digital twin interoperability efforts",
  "geographicArea": {
    "@type": "AdministrativeArea",
    "name": "global"
  },
  "description": "Members of the 'Interoperability Architecture for a Digital Ocean' (TURTLE) project. TURTLE's goal is to coordinate ongoing international Digital Twins of the Ocean projects and work towards an interoperability architecture. As initiatives around the globe begin to enhance ocean-oriented digital capacity, there are unprecedented opportunities to power digital twinning."
}

One can use more properties, or get more detailed using Place types with geospatial coordinates, of course.
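For instance, a hypothetical sketch of a more precise geographicArea stanza using Place and GeoShape (the region name and bounding box are invented):

```json
"geographicArea": {
  "@type": "Place",
  "name": "North Sea pilot region",
  "geo": {
    "@type": "GeoShape",
    "box": "51.0 1.0 61.0 9.0"
  }
}
```

The GeoShape box holds two lat-lon corner points of a bounding rectangle; GeoCoordinates or polygon values can be used where more precision is needed.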

Examples of most of the types above are available from the ODIS Federation partners (with varying levels of completeness), and discoverable through https://oceaninfohub.org/. For example, this dataset JSON record from naturalscience.be or these time series EventSeries from the METS RCN project.

pbuttigieg avatar Feb 29 '24 14:02 pbuttigieg

Hello @pbuttigieg, sorry for my late input! As promised, below are the links to the EDITO catalogs.

Metadata

Links: Data catalog (STAC), GUI is here. The catalog is variable-oriented. We use STAC collections for variables, and STAC items for any piece of homogeneous data over a given variable. STAC assets point to either actual data files or API endpoints to retrieve the data. STAC catalogs are used to present alternative views for users to discover and browse the data. Currently, we use the Resto engine, but we could implement the STAC Browser mapping to schema.org, for example.

Services

EDITO has two kinds of services: the permanent services that run the platform (a catalog engine, a viewer, a tutorial platform, etc.) and services that users can launch in their own names to build DTOs, explore data with their own tools, etc. We could expose the former as Service or WebService, as you proposed, and the catalog for the latter as SoftwareApplication. Here is the link to all the services a user can launch in their name on the platform: Service catalogs, GUI is here. Please note there are some private catalogs as well (the link above only shows the public ones). However, once a user launches service instances for their own use, the APIs to get the info are authenticated. I am not sure there is a point to having JSON-LD in private endpoints anyway, right?

Processes

EDITO Processes APIs are not publicly available yet. There is a catalog for all available processes and an API for running processes. Like the services, we could implement SoftwareApplication for the catalog, but running processes are authenticated. However, processes will have two kinds of API endpoints: one similar to the services above, and one OGC API Processes. In any case, I am sure we can implement something.

Tutorials

EDITO hosts an open and collaborative tutorial platform. Does it fit the HowTo schema?

Workflow

We aim to implement a CWL API linking metadata and processes, but have nothing to show yet.

Questions on our side: it is not clear to us how you want to distinguish the uses of Event and Action. Given your Actions example, it seems that it could contain our running processes. Related question: is there any point in referencing ephemeral info? (In our case, user service and process runs.)

Looking forward to continue this work!

qgau avatar Feb 29 '24 14:02 qgau

Re: https://github.com/iodepo/odis-arch/issues/162#issuecomment-1971305512

Thanks @qgau

Questions on our side: it is not clear to us how you want to distinguish the uses of Event and Action. Given your Actions example, it seems that it could contain our running processes.

It is a bit of a(n intentionally) fuzzy distinction, but Events tend to be larger scale things (like concerts and festivals in the schema examples), while Actions are smaller-scale "things that were done".

In our world, an Event would be like a research expedition, or a typhoon. Corresponding Actions would be like the deployment of an Argo float, a sensor sensing something, or the building of sandbag walls. In the twin infrastructures, Events would be something like the spinning up or down of a twin, Actions more like triggering some analytical or modelling suite/module.

It's up to the application case to decide what the sensible limits for small- vs large-scale processes would be, I think, but trying to be as mesoscopic and human-level commonsensical as possible is likely to yield the best results for generic search and discovery.
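A minimal, hypothetical sketch of that distinction (expedition name, organisation, and dataset are invented), with the expedition as an Event and the float deployment as one of its Actions:

```json
{
  "@context": {"@vocab": "https://schema.org/"},
  "@type": "Event",
  "name": "Research expedition EX-2024 (hypothetical)",
  "potentialAction": {
    "@type": "Action",
    "name": "Deploy Argo float",
    "agent": {
      "@type": "Organization",
      "name": "Example Oceanographic Institute"
    },
    "result": {
      "@type": "Dataset",
      "name": "Argo float profiles from EX-2024"
    }
  }
}
```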

Related question, is there any point in referencing ephemeral info? (In our case, user service and process runs)

I think that's for the TURTLE group to discuss. Right now, I think we first have to get the twins talking to each other about the big chunks of things they have (software, data, etc) and the major events and actions they (can) perform (i.e. their capabilities). Once we have a few twins talking about such things, we can get more fine-grained. As @rob-metalinkage notes here, that may be the stage to hand over to more specialist semantics and serialisations.

pbuttigieg avatar Feb 29 '24 15:02 pbuttigieg

Comments on https://github.com/iodepo/odis-arch/issues/162#issuecomment-1971305512

Metadata Links: Data catalog (STAC), GUI is here.

Fixing link for STAC catalogue: https://catalog.digitaltwinocean.edito.eu/

The catalog is variable-oriented. We use STAC collections for variables, and STAC items for any piece of homogeneous data over a given variable. STAC assets points to either actual data files or API endpoints to retrieve the data.

I see - so this would likely mean you'll make a lot of good use of the variableMeasured property in the Dataset type. Perhaps your system could generate Dataset JSON-LD on the fly, based on which variables have been selected, in addition to any a priori dataset instances.
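As a hypothetical sketch of such a variable-oriented Dataset record (the variable name, URLs, and file are invented, not EDITO's actual endpoints):

```json
{
  "@context": {"@vocab": "https://schema.org/"},
  "@type": "Dataset",
  "name": "Sea surface temperature subset (hypothetical)",
  "variableMeasured": {
    "@type": "PropertyValue",
    "name": "sea_surface_temperature",
    "unitText": "degree Celsius",
    "url": "https://example.org/edito/variables/sst"
  },
  "distribution": {
    "@type": "DataDownload",
    "contentUrl": "https://example.org/edito/data/sst.parquet",
    "encodingFormat": "application/x-parquet"
  }
}
```

Each STAC collection's variable could map to a PropertyValue in variableMeasured, with the asset link carried in the distribution stanza.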

Currently, we use the Resto engine, but we can implement the https://github.com/radiantearth/stac-spec/issues/378 for example.

Seems like good options - the main objective is to make sure the JSON-LD/schema.org that comes out of such tools is correct and in good shape.

Services EDITO has two kinds of services: the permanent services that run the platform (a catalog engine, a viewer, a tutorial platform, etc.) and services that users can launch in their own names to build DTOs, explore data with their own tools, etc. We could expose the former as Service or WebService, as you proposed, and the catalog for the latter as SoftwareApplication.

The distinction between the Service and SoftwareApplication types is really about how these things are offered. Services are pretty generic, so I would imagine that you'd have a SoftwareApplication file for every component (permanent or user-deployed) in the EDITO space. Some of these would be linked to Services (i.e. the software would be noted as an agent in the [potentialAction](https://schema.org/potentialAction) stanzas of a Service record) if the software can be triggered in a service.
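A hypothetical sketch of that linkage, following the agent convention described above (service name, application name, and version are invented):

```json
{
  "@context": {"@vocab": "https://schema.org/"},
  "@type": "WebService",
  "name": "EDITO viewer service (hypothetical record)",
  "potentialAction": {
    "@type": "Action",
    "name": "Render a map view",
    "agent": {
      "@type": "SoftwareApplication",
      "name": "edito-viewer",
      "softwareVersion": "1.0.0"
    }
  }
}
```

The SoftwareApplication stanza here would typically be a reference (by @id) to the full standalone SoftwareApplication record for that component.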

Here is the link to all the service a user can launch in its name on the platform: Service catalogs, GUI is here. Please note there are some private catalogs as well (the link above only show the public ones). However, once a user launch service instances for its usage, the API to get the info are authenticated. I am not sure there is a point to have JSON-LD in private endpoints anyway, right?

It's up to you - some of the ODIS partners like to advertise that they have software or services that are available on request, after negotiation, or through payment / agreements. In the spirit of Open Science that the Commission keeps referencing, I would think that it would be wise to have metadata records on any publicly funded activity that doesn't have security or sensitive ethical concerns. This is the move to a "Transparent and Accessible Ocean" - especially for public oversight of publicly funded activities.

Processes EDITO Processes API are not publicly available yet. There is catalog for all available processes and API for running processes. Like the services, we could implement SoftwareApplication for the catalog, but running processes are authenticated.

A running process would be better modelled through an Action or Event type - those are processual entities that unfold through time. SoftwareApplication records can be used to describe the software used during these processes (e.g. as agents of Actions).

However, processes will have two kind of API endpoints: one similar to services above, and one OGC API Processes. In any cases, I am sure we can implement something.

Yes, we'll just need to see examples and place them in the right slot.

Tutorials EDITO hosts an open and collaborative tutorial platform. Does it fit the HowTo schema?

Yes, one can use the HowTo types well there. If these are multimedia or document based tutorials, one can also use other types like DigitalDocument, VideoObject, etc

Workflow We aim to implement a CWL API linking metadata and processes, but have nothing to show yet.

I have the feeling that workflows will be captured en passant if there's good linking of Actions, agents, and Datasets or other things. We'll explore more with some examples later.

pbuttigieg avatar Feb 29 '24 15:02 pbuttigieg

Some outputs from today's meeting:

We discussed the types noted above and challenged them against implementations in EDITO and ILIAD, to figure out where their local components would fit into the generic framework that ODIS would pass on to other twins. The mind map below captures some of the flow:

(image: mind map of the flow discussed)

Some specific points discussed:

  • Actions: these will be quite core to the orchestration inside a twinning environment. We noted that what the OGC API world calls a "process" is actually a definition of a function, which we can model in schema.org as an Action whose actionStatus is PotentialActionStatus (and which can also be a value of the potentialAction property of a Service or other Things). https://docs.ogc.org/is/18-062r2/18-062r2.html#toc12
  • We noted some ambiguities with the notion of "dataset", including tension between human convenience bundling vs machine high-granularity discovery and interoperability.
    • There's some discussion here that is instructive, and treating datasets based on set theory conventions (any intentional grouping of data points) is generally a useful way to go. Datasets can be arbitrarily large or small, and all that's needed is an understanding of what grouped the data in a set.
    • The variableMeasured property can be used to gain higher resolution, especially if each variable has an address for that variable's data (e.g. a parquet column, a single vector dataset, a single datum).
    • For derivative datasets or metadata sets that describe part of a dataset, the isBasedOn property can be used to indicate that there's a derivation happening and something may not be complete, for example metadata about a cleaned/QCed dataset based on a raw dataset that has errors, omissions, etc
    • Further metadata-level subsetting of subject data can be performed by using subsets of variableMeasured values, or by restricting temporalCoverage, for subsetting values in, e.g., a data cube. These are quite advanced / finicky issues, and probably better served by bespoke API calls or PIDs to subsetted data in the distribution property, to avoid occult metadata.
  • HowTos seem to have quite a bit of promise. Using HowToItems for tools (e.g. CPUs, servers, S3 buckets) is likely to be very useful, and in need of some convention building from the team. List logic will be needed for step-wise representation of CWL flows, which can be linked to potential Actions and used to trigger them.
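As a rough, hypothetical sketch of such a HowTo skeleton for a two-step workflow (workflow name, bucket, and step text are invented):

```json
{
  "@context": {"@vocab": "https://schema.org/"},
  "@type": "HowTo",
  "name": "Run the twin-a forecast workflow (hypothetical)",
  "tool": {
    "@type": "HowToTool",
    "name": "S3 bucket: twin-a-staging"
  },
  "step": [
    {
      "@type": "HowToStep",
      "position": 1,
      "name": "Stage input data",
      "text": "Copy the forcing dataset into the staging bucket."
    },
    {
      "@type": "HowToStep",
      "position": 2,
      "name": "Trigger the model run",
      "text": "Invoke the forecast module's potential Action."
    }
  ]
}
```

The ordered step list is where the CWL-style sequencing would live; each HowToStep could point at the Action it instantiates.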

Next step: get some examples from EDITO (@qgau), ILIAD (@marcoamarooliveira e.g. here)

Move examples over to https://github.com/DITTO-OceanDecade/turtle once initial QC is done. Link to those from ODIS.

pbuttigieg avatar Feb 29 '24 17:02 pbuttigieg

nesting other descriptions with greater expressivity and domain-relevance within schema.org is easily done

@pbuttigieg can you provide an example on this? We would like to correctly nest the description of the inputs required and the provided outputs for the schema:SoftwareApplication.

marcoamarooliveira avatar May 16 '24 12:05 marcoamarooliveira

@marcoamarooliveira

Section 3.3.3 of this document describes the approach: https://zenodo.org/records/10219933

pbuttigieg avatar May 30 '24 12:05 pbuttigieg