Feature Request/Idea: RO-Crate support
As a follow-up to the discussions on Element with @pdurbin and @poikilotherm, I would like to start a discussion here on possible support for RO-Crate (https://www.researchobject.org/ro-crate/) in Dataverse.
I found these slides https://zenodo.org/record/4973678 and this recording https://www.youtube.com/watch?v=LJq-mzT9v8o&t=1731s from the 2021 Dataverse Community Meeting, where Stian Soiland-Reyes discusses possibilities for RO-Crate export/import in Dataverse.
I was wondering whether there has been any follow-up to this presentation and whether there are official or community plans to support RO-Crate.
Let me give you a little background on how we imagine using an RO-Crate-enabled Dataverse.
We are working on a new system built around Dataverse, in which we would like to support RO-Crate as a dataset input format besides the usual Dataverse way of uploading datasets and annotating them with metadata.

Ideally, our system would allow uploading and ingesting RO-Crate packages (e.g. as .zip or BagIt) into Dataverse. For creating RO-Crates we plan to provide an RO-Crate Editor[1], but the RO-Crates can be assembled by users with any tool they see fit.

To be ingestible by Dataverse, the RO-Crates must be accompanied by metadata based on schemas that Dataverse understands, so both the RO-Crate Editor and Dataverse must use the same schemas. Out of the box, Dataverse currently provides 15 such schemas as "metadata blocks", so these schemas should be available to the RO-Crate Editor as well. In our system we would like to have an external "Schema registry" service for storing these schemas, and we imagine that the schemas would then be uploaded to and configured in both Dataverse and the RO-Crate Editor so that they stay compatible when working with the metadata in the RO-Crates.
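For reference, a minimal `ro-crate-metadata.json` of the kind we would expect inside an uploaded package might look roughly like the sketch below (following the RO-Crate 1.1 conventions; the file names and field values are made up, and real crates would carry additional fields from the shared metadata blocks):

```json
{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {
      "@id": "ro-crate-metadata.json",
      "@type": "CreativeWork",
      "conformsTo": { "@id": "https://w3id.org/ro/crate/1.1" },
      "about": { "@id": "./" }
    },
    {
      "@id": "./",
      "@type": "Dataset",
      "name": "Example dataset assembled in the RO-Crate Editor",
      "datePublished": "2022-10-01",
      "hasPart": [{ "@id": "data/measurements.csv" }]
    },
    {
      "@id": "data/measurements.csv",
      "@type": "File",
      "name": "Measurements"
    }
  ]
}
```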
As we are building our system around RO-Crate, we would be happy to work on or help with RO-Crate integration in Dataverse, but it would be good to know if something has already been implemented in this regard, or whether this idea is supported by IQSS or the Dataverse community at all. Also pinging @qqmyers, who was suggested on Element as someone interested in RO-Crate support as well.
[1] For the RO-Crate editor we are now investigating https://github.com/Arkisto-Platform/describo-online
Today at the HMC Conference 2022 I learned about https://github.com/kit-data-manager/ro-crate-java
It looks like this might be helpful for working with RO-Crates programmatically.
Today I learned that .eln uses RO-Crate:
- #9363
Check out the RO-Crate file that is now downloadable from a .eln (zip) file:
- https://github.com/gdcc/dataverse-previewers/pull/21
Has anyone already started working on more general RO-Crate support for Dataverse? It is something we would like to work on, but we don't want to duplicate anyone's work.
The problem is how we define "general RO-Crate support".
We currently have a solution that works with the Dataverse metadata blocks as schemas, but RO-Crate suggests the use of Schema.org, while also allowing the use of any other schema.
So, RO-Crate support in Dataverse can mean two things:
1. Mapping the current MDB values to feasible Schema.org classes/properties.
2. Generating RO-Crate metadata using the MDBs as the schemas.
We have a solution for option 2, where we use the required Schema.org Dataset type for the Root Data Entity and File/Dataset for the Data Entities, but use properties and classes from the MDBs for the Contextual Entities (see the sketch below).
For the wider RO-Crate audience, option 1 would probably be more welcome, but that would be a lossy conversion from MDB data to RO-Crate, as not every MDB field/type can be mapped to a Schema.org value. Option 2 allows import/export between Dataverse instances but may not be processable by other RO-Crate tools that expect Schema.org-based values.
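To make the option 2 output concrete, a fragment of the generated `ro-crate-metadata.json` could look roughly like this. It is only a sketch: the field names and term URIs are illustrative (modeled on the citation metadata block), not the exact values our implementation emits.

```json
{
  "@context": [
    "https://w3id.org/ro/crate/1.1/context",
    {
      "datasetContact": "https://dataverse.org/schema/citation/datasetContact",
      "datasetContactName": "https://dataverse.org/schema/citation/datasetContactName",
      "datasetContactEmail": "https://dataverse.org/schema/citation/datasetContactEmail"
    }
  ],
  "@graph": [
    {
      "@id": "ro-crate-metadata.json",
      "@type": "CreativeWork",
      "conformsTo": { "@id": "https://w3id.org/ro/crate/1.1" },
      "about": { "@id": "./" }
    },
    {
      "@id": "./",
      "@type": "Dataset",
      "name": "Example dataset",
      "hasPart": [{ "@id": "results.csv" }],
      "datasetContact": { "@id": "#datasetContact-1" }
    },
    { "@id": "results.csv", "@type": "File" },
    {
      "@id": "#datasetContact-1",
      "@type": "datasetContact",
      "datasetContactName": "Jane Doe",
      "datasetContactEmail": "jane@example.org"
    }
  ]
}
```

The Root Data Entity and the Data Entity keep their Schema.org types, while the contextual entity `#datasetContact-1` is typed and described purely with metadata block terms.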
Good to know. This Monday I had a discussion with Stian Soiland-Reyes and Marc Portier from the RO-Crate initiative about how repository support for RO-Crate should or could look. It is something we could figure out in the context of this issue. E.g. uploading an RO-Crate with the accompanying files and being able to extract the structure and metadata from it could be interesting, but we would have to look into how to export it afterwards without losing data from the extraction, etc. In other words, some preparatory brainstorming on the entire picture is necessary, but we would love to pick that up, at least in part, if no one else is working on it right now. Worst-case scenario, the conclusion is that it's not possible, but then at least we've given it a try.
> RO-Crate suggests the use of Schema.org
Could there or should there be any overlap with Croissant, which also builds on Schema.org?
Update: see also:
- https://github.com/mlcommons/croissant/issues/161
I'll repeat the post from the Google Group here as well, so everyone involved in the RO-Crate work up until now is also aware of this development:
At KU Leuven, we just received some great news from the FAIR-IMPACT project. We’ve been selected as one of 15 teams for the “Enabling FAIR Signposting and RO-Crate for content/metadata discovery and consumption” support action, after applying for it in June as suggested by Philipp Conzett (thanks, Philipp :) ).
We’re hoping to do some work on improving and expanding the integration of RO-Crate with Dataverse. Our first job will be to figure out what is possible (cf. issue #8688), but hopefully in this short project we’ll be able to get started on a useful addition to the Dataverse project. We’ll keep you posted when anything is finished or when we want or need input on the needs and wants of the community.
If you have any input already, you can leave it here or add it to the issue on GitHub, as more input and ideas are always welcome.
Kind regards,
KU Leuven RDR Team (Kris, Eryk, Özgür and Dieuwertje)
We have also been selected in the FAIR-IMPACT support action. :-)
As well as Describo Online, I think you should look at Crate-O, which is being developed by my team at the University of Queensland: https://github.com/Language-Research-Technology/crate-o. It is similar to Describo Online (and the multiple other variants of Describo) in some ways but solves some issues that we had with that project. I'm happy to discuss with you why we chose to develop a new tool and how it might fit with Dataverse.
Up-to-date information about the Describo environment can be found at https://describo.github.io/#/. Earlier implementations were proofs of concept that had many design issues and so are no longer supported.
@beepsoft and team have created an implementation of Describo RO Crate editing in Dataverse. A short intro is at https://describo.github.io/#/describo-users.
OK, good to hear that this work is already under way. Sounds like Crate-O is not needed here at the moment.
New PR by @beepsoft:
- #10086
Great stuff! ❤️ 🚀 🎉
I'm preparing a talk for the Open Repositories conference in June. Is there an update on the progress of this feature or other RO-Crate support in Dataverse, @beepsoft?
It is still a pending PR, and there is no word yet on merging it or reworking it in other ways. It is nonetheless a functional RO-Crate exporter implementation.
As an FYI: there's another Dataverse RO-Crate exporter PR available (slightly different use case, but also developed in the FAIR-IMPACT support call): https://github.com/gdcc/dataverse-exporters/pull/15 . I'll be at the Open Repositories conference and will give a talk on our work on this exporter, so maybe I'll see you there.
@DieuwertjeBloemen do you have the slides from this talk published somewhere? It would be great to link to them from the RO-Crate website!
What do we need to do to get this merged?
@stain I'm not sure but I just offered to help @okaradeniz at https://github.com/gdcc/dataverse-exporters/pull/15#issuecomment-2154803999 . Usually I ping @cmbz @scolapasta to advise about priorities.
@beepsoft there's also your pull request at #10086 that isn't marked as closing this issue (#8688). Should it? And how do you feel about your pull request vs. the one by @okaradeniz?
Hi @stain the issue has already been prioritized. It's just waiting for the work currently in Sprint Ready to clear out so it can be added to the queue.
@stain According to Open Repositories, the slides of the presentation are going to go on Zenodo. Once I see them appear there, I'll drop the DOI here.
@beepsoft as I just mentioned at https://github.com/gdcc/dataverse-exporters/pull/15#issuecomment-2158481171 I just created a new dedicated repo for @okaradeniz at https://github.com/gdcc/exporter-ro-crate
Perhaps the two of you could collaborate on a single RO-Crate exporter?
Please let us know what you think! We can also talk it out on Zulip: https://dataverse.zulipchat.com/#narrow/stream/379673-dev/topic/RO-Crate/near/393962020
@pdurbin I'm not sure if merging the two is possible, as they have quite different set-ups and use cases. @beepsoft or @okaradeniz, correct me if I'm wrong and you do see this as possible. In my eyes, they're two different implementations of an RO-Crate exporter and could perhaps both be offered separately as external exporters, so installations can choose whichever implementation makes the most sense for them (what we might have to collaborate on is some explanation of the differences between the two so the choice is more transparent). Of course, it's up to @beepsoft to see whether he has the time to set his work up like that as well.
> @beepsoft as I just mentioned at gdcc/dataverse-exporters#15 (comment) I just created a new dedicated repo for @okaradeniz at https://github.com/gdcc/exporter-ro-crate
I cannot access this repo; I get a "This repository is empty." error.
@beepsoft ah, sorry, yes https://github.com/gdcc/exporter-ro-crate is currently empty but the idea is that @okaradeniz will push these files to it: https://github.com/gdcc/dataverse-exporters/pull/15/files
@pdurbin I just pushed the initial commit, thanks again for the repository. The exporter needs some more work before publishing on Maven, which I'll start next week as soon as I finish some other work.
I also agree, @DieuwertjeBloemen, that the exporters seem to differ in many aspects, at least in their current state, but we can work with @beepsoft on clarifying what they offer differently.
Thanks @okaradeniz!
The two main differences between your implementation and ours, as I see them:

- Yours uses the Exporter plugin approach and the Dataverse JSON representation as the base for converting to RO-Crate. This allows adding the exporter to any Dataverse installation as required. Our implementation works with Dataverse's internal objects and cannot be externalized; i.e., it must be compiled with Dataverse.
- Your implementation uses a `dataverse2ro-crate.csv` to define the mapping to RO-Crate. This allows shaping how the exported `ro-crate-metadata.json` should look, specifying which field types (URIs) Dataverse properties should be mapped to. With this approach, only those fields that have a defined mapping in `dataverse2ro-crate.csv` will appear in the `ro-crate-metadata.json`. Our implementation generates the RO-Crate based on exactly what is present in the metadata blocks; i.e., every field will appear in the `ro-crate-metadata.json` automatically, and we use the URIs assigned (explicitly or implicitly) to the dataset field types; no further configuration or mapping is possible.
I think your implementation is more flexible, both in terms of being implemented as an exporter plugin and with the `dataverse2ro-crate.csv` mapping approach.
I think a default behaviour of your implementation could be to work without `dataverse2ro-crate.csv` and use the MDB and dataset field names and URIs as-is in the RO-Crate. This would result in something similar to what we have in our implementation. And if someone needs customization, they could add a proper `dataverse2ro-crate.csv`.
My only concern with `dataverse2ro-crate.csv` is whether it is flexible enough for all mapping use cases. I haven't thought it through yet, but you have probably put more thought into what it is or isn't capable of.
An additional major difference in my mind is how the two exporters approach the problem of mapping Dataverse metadata blocks to the properties in the ro-crate-metadata.json.
The one offered in @beepsoft's PR follows this part of the RO-Crate specification:
> However, as RO-Crate uses the Linked Data principles, adopters of RO-Crate are free to supplement RO-Crate using Schema.org metadata and/or assertions using other Linked Data vocabularies.
So it includes the Dataverse installation as a vocabulary in order to be able to use the metadata blocks in the resulting JSON:
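Roughly, the resulting `@context` then looks something like the following sketch (simplified and made up; which fields end up under a Schema.org, Dublin Core, or installation URI depends on the metadata block definitions, and the actual PR may structure this differently):

```json
{
  "@context": [
    "https://w3id.org/ro/crate/1.1/context",
    {
      "title": "https://demo.dataverse.example/schema/citation/title",
      "subject": "https://demo.dataverse.example/schema/citation/subject"
    }
  ],
  "@graph": [
    {
      "@id": "./",
      "@type": "Dataset",
      "title": "My dataset",
      "subject": "Other"
    }
  ]
}
```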
A disadvantage resulting from this is that fields that are perfectly mappable to Schema.org-based RO-Crate properties are taken directly as they appear in the Dataverse metadata blocks (such as `title` instead of `name`), by adding dataverse.org or the Dataverse installation to the `@context`.
We took a different approach by letting installations choose how they want the metadata to map to the properties in the export. The advantage is that installations have both flexibility and compatibility with the RO-Crate specification, as well as compliance with Schema.org. The disadvantage is that it needs more work from the installation if they want to customize it.
I completely agree with you that mapping to the Schema.org vocabulary is really useful. I was just wondering whether it would be possible to have a default that falls back to using the metadata block definitions (names, URIs) if no `dataverse2ro-crate.csv` mapping is provided. This way, you could customize your RO-Crate the way you want but wouldn't need to bother with other metadata blocks that are, say, local to your installation or that already have URIs from well-known vocabularies (e.g. Dublin Core). All of this could also be made configurable via `dataverse2ro-crate.csv`.
We will have a default CSV file that comes with the exporter, with mappings from the default Dataverse metadata blocks to the Schema.org-based properties in the RO-Crate specification. That way, the exporter can support the RO-Crate specs and comply with Schema.org as much as possible out of the box. Your suggestion would definitely help the exporter cover both the default and custom metadata, but it would also hardcode a default behavior that would (in many cases, I think) result in exports that aren't compliant with the RO-Crate specs.
What RO-Crate compliance problems do you see here?
As I understand it, whatever can be mapped to Schema.org will be mapped, and the rest could use `@context` URIs from other vocabularies, which is allowed and supported by JSON-LD and therefore by RO-Crate. It is a separate issue that these custom or less well-known properties may not be automatically interpreted/imported by other RO-Crate systems, but we could still make all Dataverse data available in the RO-Crate JSON for tools that can or are willing to handle them.
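For illustration, such a mixed export could simply declare the extra terms next to the standard RO-Crate context. A made-up sketch (the field names and URIs here are hypothetical, not what either exporter currently produces):

```json
{
  "@context": [
    "https://w3id.org/ro/crate/1.1/context",
    {
      "alternativeTitle": "http://purl.org/dc/terms/alternative",
      "myCustomField": "https://my.dataverse.example/schema/myCustomBlock/myCustomField"
    }
  ],
  "@graph": [
    {
      "@id": "./",
      "@type": "Dataset",
      "name": "Example dataset",
      "alternativeTitle": "Mapped to a Dublin Core term",
      "myCustomField": "Kept under the local metadata block URI"
    }
  ]
}
```

Tools that only understand Schema.org would still pick up `name`, while RO-Crate consumers that resolve the extra context entries could handle the rest.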