Convene a working group on implementation-neutral Profiles, schemas, RO-Crate editor and documentation tools, and validation
As discussed at the Steering Committee meeting.
This is my contribution, which links to the other current work in this space that I am aware of:
https://github.com/Language-Research-Technology/ro-crate-schema-tools/blob/sossplus/profiles/sossplus/sossplus-profile.md
Dear @ptsefton,
Although this work has a different scope, it presents a similar solution to our 0.2 proposal for expressing schemas in the RO-Crate JSON manifest. In fact, with a few adjustments, we could adapt our tooling to support it.
Therefore, I would like to join the working group alongside @AndreasMeier12.
Additionally, while the current specification allows the inclusion of everything as JSON-LD in the manifest, I believe it should be extended to incorporate enhancements introduced in our 0.3 proposal:
- Supporting the expression of schemas and metadata outside the manifest, as this information can become quite large.
- Allowing the use of formats other than JSON-LD for expressing schemas and metadata.
Even if tooling can initially only support JSON-LD, offering a clear interface would allow parts of the community to extend support to formats like CSVW, LinkML, etc.
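To make the idea of such a clear interface concrete, here is a minimal Python sketch; all names (`SchemaBackend`, `JsonLdBackend`, `validate_entity`) are hypothetical and not part of any existing RO-Crate tool:

```python
from typing import List, Protocol


class SchemaBackend(Protocol):
    """Hypothetical plug-in interface: each schema format (JSON-LD, CSVW,
    LinkML, ...) would be supported by its own backend implementation."""

    format_name: str

    def can_handle(self, schema_ref: str) -> bool: ...

    def validate(self, entity: dict, schema_ref: str) -> List[str]: ...


class JsonLdBackend:
    """Default backend for schemas expressed directly in the JSON-LD manifest."""

    format_name = "json-ld"

    def can_handle(self, schema_ref: str) -> bool:
        return schema_ref.startswith("#") or schema_ref.endswith(".json")

    def validate(self, entity: dict, schema_ref: str) -> List[str]:
        return []  # a real backend would run actual checks here


def validate_entity(entity: dict, schema_ref: str,
                    backends: List[SchemaBackend]) -> List[str]:
    """Dispatch validation to the first backend that understands the format."""
    for backend in backends:
        if backend.can_handle(schema_ref):
            return backend.validate(entity, schema_ref)
    raise ValueError(f"no backend registered for schema: {schema_ref}")
```

Community-provided CSVW or LinkML support would then just be further classes implementing the same protocol.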
I would appreciate it if you could share your thoughts on these proposed extensions, @ptsefton, given your role in authoring the spec.
Unfortunately, the European drop-in session was moved to 12 June, when both @AndreasMeier12 and I will be running an RO-Crate workshop in Bern.
Can we organize an open Zoom call for the rest of the community on a different date?
@juan-fuentes-sis re your comments above about allowing other formats: RO-Crate has always allowed any kind of payload file to be included in a crate, so if you want to use an RO-Crate to package LinkML or OWL in RDF syntax you can do that, and you are also free to describe the semantics of this in a profile. This does not need changes to the spec; it just means that you can document this for your community, build tools, etc.
As one of the authors, I am advocating that we make RO-Crate 2 simpler, not more complex, and move a lot of the current specification into a series of profiles that help with "how do I use RO-Crate for X". My advice to you is still to work in an RO-Crate-compatible way, build the tools you need for your use case, test them in use and refine, document in a profile, and then the broader RO-Crate community will be able to adopt your approaches.
NOTE that you CAN include CSVW in RO-Crate, as it can be flattened easily; we are doing so in our project. There is even an example in this issue.
NOTE also that it is very difficult to keep track of your profile when you keep changing the issue with updates; it makes the discussion very hard to read. I recommend (again) that you put this profile in a public repository somewhere else, where people can comment and raise issues. This repo is not the place for individual profile development.
This is an attempt to write out my thoughts on the matters discussed here and in #399 (and other places), to clarify them for myself and possibly help clarify them for others (or eventually receive corrective responses if I have misunderstood things). Since it is not clear where to continue the discussion from the meeting organised in #399, I decided to add some comments here, as the other issue is due to be closed.
As one of the authors, I am advocating that we make RO-Crate 2 simpler, not more complex, and move a lot of the current specification into a series of profiles that help with "how do I use RO-Crate for X".
First, I just wanted to say, as a new adopter, that I highly support this idea! Simplifying the main specs will make RO-crate significantly easier to adopt, and moving subparts into smaller profiles would help focus relevant discussions in e.g. profile repos. Adopters could then more easily "pick and choose" profiles of relevance to them.
Now, to the discussion: the way I understand the current proposals is that they try to tackle two related, but also inherently different, use cases:
A. Make profiles more machine-operable.
In essence: extending the RO-Crate spec to include descriptors of entities and their relationships as defined in a profile, in a straightforward and human-readable way. This would allow validation of RO-Crates against specific profiles, and also support more harmonised documentation.
- Since machine-operability of RO-Crate profiles is inherently RO-Crate-centric, describing a profile using a custom-made extension of RO-Crate makes sense (given that nothing already exists that perfectly matches the needs in terms of implementation neutrality, readability, expressivity, etc.). The domain of targeted validators would be ones that explicitly support RO-Crates, and thus limited. Tool developers can then transform such a "profile schema" to whatever is needed by their underlying tooling. As they are adopters of RO-Crate, they would have an incentive to implement such "schema transformation" mechanisms (or operate directly with the "profile schema").
B. Describe and persist schemas for use in data interoperability "out there".
In essence: simplify integration of data contained within RO-crates with different services and tools out in the open, possibly within a particular domain etc. Two variants have been mentioned:
- The data is contained as a separate file in an RO-Crate. A simple example is a schema for data validation describing e.g. a CSV file. Such a "data validation schema" could either be included as another file in the crate or inlined within the metadata. Since the types of data to be described really can be anything, any type of schema serialisation should IMO be supported. Here, we should not re-invent the wheel. The main point would be to improve and standardise the descriptions of the schemas (i.e. meta-descriptions pertaining to the schemas as such), whether these are included within the RO-Crate as metadata or as separate files. This would target any type of data validators used in some relevant domain. Due to the broad domain, there would be few incentives for any particular validator maintainer to implement support for a custom schema serialisation scheme defined by the RO-Crate community. Hence, one should IMO make use of existing standards as much as possible, allowing variety, but possibly recommend some of them (e.g. CSVW, which supports inclusion in the RO-Crate metadata). The aim would be to increase the chance for an RO-Crate to contain a generally well-supported schema serialisation format for the user of the crate to understand and validate the data.
- The data is part of the RO-Crate metadata graph. A more complex example is a schema describing entities that are available within the RO-Crate graph. Since any such data by necessity must conform to the RO-Crate specification, technical solutions might overlap with use case A. However, even though validation towards such a "graph schema" might include the use case of validating against an RO-Crate profile, it can in principle (if I understand this correctly) be used to validate anything included in an RO-Crate graph somewhere, independently of the existence of a relevant profile. The way I see this, such a sub-graph could easily be broken off and provided in a separate file. Thus, B2 is really only a specific subset of B1, where the data of concern is part of an RO-Crate graph. Hence, the same arguments still apply (follow standard schema serialisations, etc.).
To conclude, the way I see it:
- Use case A may warrant custom extensions of RO-Crate for describing restrictions and rules of an RO-Crate profile. Solving this seems very useful.
- Use case B only warrants RO-Crate extensions that relate to the meta-description of schema serialisations for validation of data within an RO-Crate (bearing in mind that one man's data could be another man's metadata). The schemas themselves should follow external standards. Such "meta-description" extensions could also be used to describe "profile schemas" (use case A). Solving this also seems very useful.
@sveinugu, your summary matches roughly what I was thinking of writing up as well. The two additions I'd make are:
- For me, this says that we leave the CSV and JSON Schema use cases out of this discussion initially. (I have use cases for both and am interested in continuing those discussions, but I think for now they must sit outwith the Profiles discussion.)
- I think a corollary of B2 is that within the Profiles discussion we should only consider things that are JSON-LD. And so perhaps the primary question for @ptsefton, @stain and co. is: when should a subgraph that can be represented as a Profile not be represented as one? Or are we generally on board with embedding JSON-LD as "RO-Crates all the way down"?
@sveinugu thanks for the summary.
As you already make the link from B2 to A, I would suggest the following cascade of schema implementations:
- I1: Schema for data within the RO-Crate graph, enabling both A and B2
- I2: Schema for data referenced outside the RO-Crate graph, where the schema can be written as part of the graph (B1)
- I3: Schema for data referenced outside the RO-Crate graph, where the schema cannot be written as part of the graph (B1)
I1: Schema for data within the RO-Crate graph, enabling both A and B2
I share the view of @sveinugu that modelling the profiles with a well-established language, avoiding niche and especially newly invented custom languages, will enable broader acceptance.
As already discussed, we see JSON-SCHEMA as preferable here because of its wide tooling support and conceptual proximity to object-oriented programming. To overcome the limitations of JSON-SCHEMA for modelling and validating linked data, we defined an extension (OO-LD), which specifies an embedded JSON-LD context and some additional annotations, especially x-oold-range to define the range of properties. There is also a Python package (oold-python) that implements the equivalent extension to pydantic, transforming string properties to class-typed ones with graph binding at run time.
The RO-Crate core profile would consist of a basic RoCrateThing.schema.json:
{
"$id": "RoCrateThing.schema.json",
"@context": [
"https://w3id.org/ro/crate/1.1/context"
],
"type": "object",
"required": [
"type"
],
"properties": {
"@type": {
"type": "string",
"default": "Thing"
},
"@id": {
"type": "string"
},
"name": {
"type": "string"
},
"description": {
"type": "string"
}
}
}
RoCrateThing.schema.json
subclassed by RoCrateDataset.schema.json, RoCrateOrganization.schema.json and so on.
{
"$id": "RoCrateDataset.schema.json",
"@context": "RoCrateThing.schema.json",
"allOf": [{"$ref": "RoCrateThing.schema.json"}],
"type": "object",
"required": [
"..."
],
"properties": {
"...": "..."
}
}
RoCrateDataset.schema.json
Which are all imported into the actual profile
{
"$id": "RoCrate.schema.json",
"@context": "RoCrateThing.schema.json",
"allOf": [{"$ref": "RoCrateThing.schema.json"}],
"type": "object",
"required": [
"conformsTo",
"sdPublisher",
"about"
],
"properties": {
"@id": {
"type": "string",
"const": "ro-crate-metadata.json"
},
"@type": {
"type": "string",
"default": "CreativeWork"
},
"conformsTo": {
"type": "string",
"default": "RoCrate.schema.json"
},
"version": {
"type": "string"
},
"sdPublisher": {
"type": "string",
"x-oold-range": {
"allOf": [{"$ref": "RoCrateOrganization.schema.json"}]
}
},
"about": {
"type": "string",
"x-oold-range": {
"allOf": [{"$ref": "RoCrateDataset.schema.json"}],
"properties": {
"id": {
"const": "./"
}
}
}
}
}
}
RoCrate.schema.json
A custom profile would then create an extension of RoCrateDataset.schema.json, e.g.
{
"$id": "RoCrateGeolocatedDataset.schema.json",
"@context": "RoCrateDataset.schema.json",
"allOf": [{"$ref": "RoCrateDataset.schema.json"}],
"type": "object",
"required": [
"location"
],
"properties": {
"location": "..."
}
}
RoCrateGeolocatedDataset.schema.json
and a corresponding profile, e.g.
{
"$id": "RoCrateGeolocatedProfile.schema.json",
"@context": "RoCrate.schema.json",
"allOf": [{"$ref": "RoCrate.schema.json"}],
"type": "object",
"properties": {
"conformsTo": {
"type": "string",
"default": "RoCrateGeolocatedProfile.schema.json"
},
"about": {
"type": "string",
"x-oold-range": {
"allOf": [{"$ref": "RoCrateGeolocatedDataset.schema.json"}]
}
}
}
}
RoCrateGeolocatedProfile.schema.json
This would also translate 1:1 to object-oriented programming, e.g. in Python/pydantic via oold-python:
from typing import Optional

from oold import LinkedBaseModel

class Thing(LinkedBaseModel):
    type: str
    id: Optional[str] = None
    name: Optional[str] = None
    description: Optional[str] = None

class Organization(Thing):
    pass

class Dataset(Thing):
    pass

class RoCrate(Thing):
    sdPublisher: Organization
    about: Dataset

class GeolocatedDataset(Dataset):
    location: str

class RoCrateGeolocated(RoCrate):
    about: GeolocatedDataset

Python code generated from RoCrateGeolocatedProfile.schema.json (simplified)
Finally, an RO-Crate instance needs to refer to the profile (e.g. RoCrateGeolocatedProfile.schema.json) via conformsTo (other options: @type and $schema):
{
"@context": "https://w3id.org/ro/crate/1.2/context",
"@graph": [
{
"@id": "ro-crate-metadata.json",
"@type": "CreativeWork",
"conformsTo": {"@id": "RoCrateGeolocatedProfile.schema.json"},
"about": {"@id": "./"}
},
{
"@id": "./",
"@type": "Dataset",
"location": "..."
}
]
}
This allows any JSON-SCHEMA validator to validate an RO-Crate profile by implementing the following additional steps:
0. Flatten and compact the document
1. Pull the node with "@id" = "./" from the @graph
2. Fetch the schema(s) = profile(s) given per conformsTo
3. Validate the node's JSON with the given (OO-LD) JSON-SCHEMA using a standard lib
4. If a property is annotated with x-oold-range, pull the node(s) which match the @ids given in the property value from the @graph; go to 3 using the value of x-oold-range as the schema
5. End
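The steps above could be sketched as follows in Python. This is a toy illustration only, with made-up helper names; a minimal required-keys check stands in for a real JSON-SCHEMA validation library, and the walk starts from the metadata descriptor, whose conformsTo names the profile:

```python
def get_node(graph, node_id):
    """Helper: pull a node with a given @id from the flattened @graph."""
    return next(n for n in graph if n.get("@id") == node_id)


def check(node, schema, schemas, graph, errors):
    # Step 3: validate the node against the (OO-LD) schema.
    # (Toy check: only 'required' keys; a real validator would do much more.)
    for key in schema.get("required", []):
        if key not in node and "@" + key not in node:
            errors.append(f"{node.get('@id')}: missing {key}")
    # Step 4: follow x-oold-range annotations into the graph and recurse.
    for prop, spec in schema.get("properties", {}).items():
        if "x-oold-range" in spec and prop in node:
            value = node[prop]
            ref = value["@id"] if isinstance(value, dict) else value
            target = schemas[spec["x-oold-range"]["allOf"][0]["$ref"]]
            check(get_node(graph, ref), target, schemas, graph, errors)


def validate_crate(metadata, schemas):
    """Steps 1-2: locate the root descriptor and fetch its profile schema."""
    graph = metadata["@graph"]
    root = get_node(graph, "ro-crate-metadata.json")
    profile = schemas[root["conformsTo"]["@id"]]
    errors = []
    check(root, profile, schemas, graph, errors)
    return errors
```

The recursion in step 4 is what lets a tree-based validator cover a graph: each x-oold-range hop re-enters step 3 with a new node and schema.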
@ptsefton, @sveinugu Let me know what you think.
I2: Schema for data referenced outside the RO-Crate graph, where the schema can be written as part of the graph (B1)
This is the case if the schema is expressed in RDF. CSVW is a perfect example here, since it is RDF and already a W3C Recommendation (see also https://w3c.github.io/csvw/syntax/#recognizing-tabular-data-formats). @ptsefton already provided an example for inlining CSVW in an RO-Crate, which is exactly how I would do it: https://github.com/ResearchObject/ro-crate/issues/27#issuecomment-1131227990
{
"@context": [
"https://w3id.org/ro/crate/1.1/context",
{"csvw": "http://www.w3.org/ns/csvw#"}
],
"@graph": [
{
"@id": "./",
"@type": "Dataset",
"hasPart": [{"@id": "example.csv"}]
},
{
"@id": "example.csv",
"@type": "File",
"csvw:url": "./example.csv",
"csvw:tableSchema": {"@id": "#dialog_schema"}
},
{
"@id": "#dialog_schema",
"@type": "csvw:Schema",
"columns": [...]
}
]
}
I3: Schema for data referenced outside the RO-Crate graph, where the schema cannot be written as part of the graph (B1)
In this case we could use something like dct:conformsTo to point, e.g., from an XML document to an XML Schema document.
{
"@context": "https://w3id.org/ro/crate/1.1/context",
"@graph": [
{
"@id": "./",
"@type": "Dataset",
"hasPart": [{"@id": "example.xml"}]
},
{
"@id": "./example.xml",
"@type": "File",
"conformsTo": "./example.xsd"
},
{
"@id": "./example.xsd",
"@type": "File"
}
]
}
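A consuming tool could then discover which entities declare such external schemas with a small graph walk. This is a sketch under the assumptions of the example above (flat @graph, conformsTo given as a plain id or a reference object); external_schemas is a hypothetical helper name:

```python
def external_schemas(metadata):
    """Map each entity id to the id of the schema file it declares via conformsTo."""
    nodes = {n["@id"]: n for n in metadata["@graph"]}
    result = {}
    for node_id, node in nodes.items():
        ref = node.get("conformsTo")
        if ref is None:
            continue
        # conformsTo may be a plain string (as above) or a {"@id": ...} reference
        schema_id = ref["@id"] if isinstance(ref, dict) else ref
        # only report schemas that are themselves files packaged in the crate
        if nodes.get(schema_id, {}).get("@type") == "File":
            result[node_id] = schema_id
    return result
```

For the XML example above this would pair example.xml with example.xsd, after which a format-specific validator (here, an XML Schema validator) takes over.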
@simontaurus This kind of approach has been suggested before. To use JSON Schema, you need to turn the graph into a set of small tree-shaped objects, right? I'd like to see a working implementation that deals with real-world use cases and profiles. How does it deal with singleton and array values, which may be a mixture of references to other entities in the graph or on the web and scalar values like strings or integers? Do you have a prototype you could show, with implementations of existing profiles?
Even a small example would be helpful - what would the schema/profile look like for saying "A creative work must have an author property, which has one or more values, each of which may be a string, a Person or an Organization"?
My current thinking is that this approach might be useful as an implementation choice, as an alternative to SHACL, but that we can probably express profile schemas pretty much the way Schema.org does it according to the existing spec, with additional props for cardinality, as per the experimental work I linked above; this makes for much cleaner, simpler-looking schemas. Introducing JSON-Schema primitive types might be a good idea though, e.g. to use pattern for regex validation.
@ptsefton Yes, in general JSON-SCHEMA as it stands can only validate tree-shaped JSON documents. But as mentioned, OO-LD defines the necessary extensions to describe and validate graph-shaped JSON(-LD) documents.
It is also perfectly possible to define mixed-type arrays, e.g. for your example:
{
"$id": "CreativeWork.schema.json",
"@context": "Thing.schema.json",
"allOf": [{"$ref": "Thing.schema.json"}],
"type": "object",
"required": [
"author"
],
"$defs": {
"Author": {
"title": "Author",
"anyOf": [
{
"title": "Simple string",
"type": "string"
},
{
"title": "Person (IRI)",
"type": "string",
"format": "uri",
"x-oold-range": {
"allOf": [{"$ref": "Person.schema.json"}]
}
},
{
"title": "Organization (IRI)",
"type": "string",
"format": "uri",
"x-oold-range": {
"allOf": [{"$ref": "Organization.schema.json"}]
}
}
]
}
},
"properties": {
"author": {
"oneOf": [
{
"title": "Single value",
"$ref": "#/$defs/Author"
},
{
"title": "Multiple values",
"type": "array",
"minItems": 1,
"items": {
"$ref": "#/$defs/Author"
}
}
]
}
}
}
However, I think we should avoid being so lax on typing, since it puts the burden on the data consumer to handle all the cases with lots of if statements. The equivalent notation in Python would be
from typing import List, Union

Author = Union[str, Person, Organization]

class CreativeWork(Thing):
    author: Union[Author, List[Author]]
and any dependent code would have to deal with all six cases.
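Spelled out, a consumer normalising such a lax author value would need branching roughly like this (a sketch only; Person and Organization are stand-in classes, and turning a bare string into a Person is just one possible policy):

```python
class Person:
    def __init__(self, name):
        self.name = name


class Organization:
    def __init__(self, name):
        self.name = name


def normalise_authors(value):
    """Collapse the six accepted shapes into a flat list of typed objects."""
    items = value if isinstance(value, list) else [value]  # single value vs array
    out = []
    for item in items:
        if isinstance(item, (Person, Organization)):
            out.append(item)
        elif isinstance(item, str):
            # bare string: assume it names a Person (could equally be an org!)
            out.append(Person(item))
        else:
            raise TypeError(f"unsupported author value: {item!r}")
    return out
```

The string branch shows the real cost of lax typing: the consumer has to guess intent that a stricter schema would have made explicit.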
In this particular scenario this also stems from the missing superclass LegalPerson (see https://github.com/schemaorg/schemaorg/issues/700#issuecomment-2423933925).
To retrofit existing data this might be needed, but we should probably use new profiles to be stricter, e.g. assuming a list of Persons or Organizations. The 'single element' case would be covered by an array of length 1; the 'simple string' case would be covered by creating a Person or Organization object using the simple string as the name.
{
"$id": "StrictCreativeWork.schema.json",
"@context": "CreativeWork.schema.json",
"allOf": [{"$ref": "CreativeWork.schema.json"}],
"type": "object",
"required": [
"author"
],
"properties": {
"author": {
"title": "Multiple values",
"type": "array",
"minItems": 1,
"items": {
"anyOf": [
{
"title": "Person (IRI)",
"type": "string",
"format": "uri",
"x-oold-range": {
"allOf": [{"$ref": "Person.schema.json"}]
}
},
{
"title": "Organization (IRI)",
"type": "string",
"format": "uri",
"x-oold-range": {
"allOf": [{"$ref": "Organization.schema.json"}]
}
}
]
}
}
}
}
In Python:
class StrictCreativeWork(CreativeWork):
author: List[Union[Person, Organization]]
or in SHACL:
@prefix sh: <http://www.w3.org/ns/shacl#>.
@prefix schema: <http://schema.org/>.
@prefix : <http://example.org/> .
:StrictCreativeWorkShape
a sh:NodeShape ;
sh:targetClass schema:CreativeWork ;
sh:property [
sh:path schema:author ;
sh:minCount 1 ;
sh:nodeKind sh:IRI ;
sh:or (
[ sh:class schema:Person ]
[ sh:class schema:Organization ]
) ;
] .
IMHO this is also both the pro and the con of the schema.org data model: it cannot be very strict, as it must cover millions of individual websites that try to improve their search ranking, while a data consumer like Google can deal with large variance in the data. So it is closer to an ontology or a weak standard / recommendation than to an actual data schema.
But for us in the science domain it is much more about ensuring data quality and FAIRness levels by means of data schemas, while the data consumers are more or less the same individuals who create the data, and consequently deal with variance on both sides.
In addition, schema.org is a centralistic approach, while RO-Crate Profiles should be federated, not relying on a central authority that defines them. The centralistic aspect is also the reason why their open-world model can work as a schema: no third party is expected to extend the definitions of the terms (e.g. schema:author schema:rangeIncludes schema:Place).
So if we model the same issue in SOSS+
{
"@context": [
"https://w3id.org/ro/crate/1.1/context",
{
"@vocab": "http://schema.org/"
}
],
"@graph": [
{
"@id": "#class_CreativeWork",
"@type": "rdfs:Class",
"rdfs:label": "CreativeWork",
"name": "CreativeWork",
"prov:specializationOf": {
"@id": "https://schema.org/CreativeWork"
}
},
{
"@id": "#prop_author_CreativeWork",
"@type": "rdf:Property",
"rdfs:label": "author",
"name": "author",
"prov:specializationOf": {
"@id": "http://schema.org/author"
},
"schema:domainIncludes": {
"@id": "#class_CreativeWork"
},
"schema:rangeIncludes": [
{
"@id": "#class_Person"
},
{
"@id": "#class_Organization"
}
],
"sh:minCount": 1
}
]
}
we face multiple challenges if we want to interpret this as a data schema:
- currently no support for inheritance (already described in https://github.com/Language-Research-Technology/ro-crate-schema-tools/blob/sossplus/profiles/sossplus/sossplus-profile.md) (vs. JSON-SCHEMA allOf and object-oriented programming)
- currently no globally referenceable sub-components (e.g. to reuse a property) (vs. JSON-SCHEMA $ref / JSON Pointer)
- enforcement of the presence of author only indirectly through schema:domainIncludes (what if someone adds additional domains and ranges?) (vs. closed-form JSON-SCHEMA => only further restrictions allowed)
- introducing a new property #prop_author_CreativeWork to restrict domain and range without changing the actual semantic meaning in reference to schema:author, which makes processing and querying harder (vs. purely syntactical restrictions in JSON-SCHEMA, or different keywords that map to the same ontology term)
I share the view of @sveinugu that modelling the profiles with a well established language, avoiding niche and especially new custom languages will enable a broader acceptance.
Well, my argument was really that modelling the profiles using a custom extension of RO-Crate (such as SOSS+) could make sense for use case A (in isolation), as the intended audience is really people who are already invested in RO-Crate (or plan to be). But adopting a solution that fits all use cases, and in addition is already a standard, would probably be the better choice.
Of the three choices on the table, only SHACL is in itself an established external standard; however, as far as I understand, it does not easily translate into e.g. pydantic or other traditionally tree-based validation tools. It also presents a threshold for profile creators and users to grasp and learn.
The other two choices on the table right now are extensions of established representations, Schema.org and JSON Schema, respectively. I am personally in favour of a JSON-Schema-like approach, given that it meets the needs. My intuition tells me, however, that it would be strange if something like this hasn't been tried before. Have you carried out a thorough landscape review before implementing OO-LD?
SHACL has already been adopted and tested by my group and it also powers almost the entire crs4/rocrate-validator.
Schema.org uses RDF Schema, which as noted is not a validation framework, although it forms the basis for both OWL and SHACL which are.
I question why Pydantic-style integrations are needed since linked data is an entirely different world.
I question why Pydantic-style integrations are needed since linked data is an entirely different world.
Which is the main problem of RO-Crate, JSON-LD and linked data in general, and IMHO exactly why pydantic-style integrations are highly needed!
Edit: there is a reason why e.g. SOAP is now in the dustbins of computer history.
I question why Pydantic-style integrations are needed since linked data is an entirely different world.
I would argue that this is only technologically true, not conceptually.
Most people only want to define single-hop restrictions like "A Dataset MUST have at least one author, which MUST be a valid Person or Organization" to ensure data quality during its creation or consumption, which is perfectly expressible in an object-oriented framework like pydantic. Multi-hop restrictions that require SHACL, like "A TensileTestDatasetOnAgedCopperSamples MUST be about a MeasurementProcess in which a Sample participates that consists of Copper and previously took part in an AgingProcess", occur much less frequently, and more on the data-analytics side where people usually have different skill sets.
Following @sveinugu's argument that the semantic web has so far failed due to its lack of integration into mainstream software development, we (and others) are working on a carefully engineered bridge between both worlds:
(see https://github.com/OO-LD/schema?tab=readme-ov-file#related-work)
There are many benefits to this approach, because all of a sudden the stuff developers love (validators, pydantic, FastAPI/OpenAPI, LLM tool-calling agents / MCP, etc.) still works while producing / consuming linked data in the background (see https://github.com/OO-LD/schema?tab=readme-ov-file#overview)
The main issue of this connection is the linked-data doctrine of equating a memory address with a web address, which cannot be represented in conventional object-oriented programming, but only with a thin object-graph binding on top, as demonstrated with https://github.com/OO-LD/oold-python
Have you carried out a thorough landscape review before implementing OO-LD?
@sveinugu: Yes, see https://github.com/OO-LD/schema?tab=readme-ov-file#schema. There are some custom RDF-based languages (Semantic Aspect Meta Model, Upper) that are transpiled to more general schemas (JSON-SCHEMA, GraphQL, ...). LinkML is closer, but still a custom language where the schemas themselves are not built as distributed linked data; they can, however, be transpiled to JSON-SCHEMA and JSON-LD, which makes translation to OO-LD easy (see https://github.com/OO-LD/schema?tab=readme-ov-file#linkml). REST-API-LD is also JSON-SCHEMA based, but lacks direct compliance with JSON-LD.
Sorry for coming in late to this discussion. Am finally reading up on the original contribution, and skimming part of the discussion.
... For now, I would mainly want to get the scope and intent of this right:
None of this is intended to do away with the flexibility we already have concerning basic ro-profile usage, right? I.e., RO-Crate only suggests such a profile should be a contextual entity, that is a prof:Profile. Those in turn can list many hasResource parts that might all play different roles in different formats (and thus technology implementations).
From that angle:
- should that prof:Profile intermediate level not be already considered 'implementation neutral'? i.e. allowing many different things?
- should any additional content of these profiles (i.e. their resources) not be externalised into different files -- even if those would be specific or "tuned to be ro-crate-fitting" -- so linked from, but not part of the ro-crate-metadata.json
To allay your concerns, this work is not about NOT allowing any resources that might be useful in a profile.
should that prof:Profile intermediate level not be already considered 'implementation neutral'? i.e. allowing many different things?
I guess in the sense of allowing different things it is implementation-neutral, in that you can write a SHACL-based validator that validates crates against SHACL schemas, and I can write one based on Schema.org-style schema definitions, but these use two different schema languages. The open issue is the standardization of which schema language (if any) we choose to bless as the "RO-Crate Schema Language", and how practical it is to implement that in different environments.
should any additional content of these profiles (i.e. their resources) not be externalised into different files -- even if those would be specific or "tuned to be ro-crate-fitting" -- so linked from, but not part of the ro-crate-metadata.json
That's an open question at this stage, but I like the idea of a schema language that can be written and manipulated with RO-Crate tools (libraries, HTML generators, editors) and that is self-contained.
Thanks for clarifying. Concluding from my side, I would just be careful with the level of "blessing":
- I think I get the urge to have an ro-craty "schema" format
- still, I think it should not be assumed, but rather declared like any other
- and I would prefer to (at least) allow schema statements to live in their own files
Regarding examples of RO-Crates with real data:
We have a set of 10 working examples of RO-Crates generated from Galaxy. They contain the workflow, provenance, data (input, intermediate, output) and parameters, and can be inspected and executed to test reproducibility. You can view and download the RO-Crates from Zenodo. They are also uploaded to WorkflowHub (Example), which allows execution on the STFC Materials Galaxy instance. NOTE: this requires registering for an account and logging in before clicking the "run in Galaxy" button.