Module-5-Open-Research-Software-and-Open-Source Repo metadata

The issue has been raised by @danielskatz on Twitter https://twitter.com/danielskatz/status/1036992161508667392 about the need to 'declare the metadata for the repository'.

I will review our current coverage of this issue and look at how to proceed.

I will document the issue in full below.

Sep 05 '18 10:09 mrchristian

The current position for recording metadata of the repository has been a 'lite' approach. This is mainly informed by trying to keep the amount of ground covered in the instructions to a minimum.

Here is what is currently described for recording metadata:

https://github.com/OpenScienceMOOC/Module-5-Open-Research-Software-and-Open-Source/blob/master/content_development/Task_2.md#getting-a-doi-

To summarise the process:

- Zenodo captures author names from the GitHub repository
- Admins of the Zenodo record can edit its metadata
- Zenodo generates version numbers
- Zenodo assigns DOIs
- Zenodo has a variety of metadata fields that can be filled in

Sep 05 '18 10:09 mrchristian

Brilliant, thanks @mrchristian. Is there a way we can use the communities function of Zenodo to make things a little easier here? I'm not sure exactly what sort of things this allows just yet https://zenodo.org/communities/open-science-mooc/?page=1&size=20

Sep 05 '18 10:09 Protohedgehog

Communities - I think it just functions as a collection of sorts. I'll work back through Zenodo's metadata editing and generation process; I have a bunch of repos on Zenodo I can try this out on. Then I'll have a think about how best to wade through the swamp :-)

Sep 05 '18 12:09 mrchristian

Looking at Zenodo's metadata representation of a deposit, it would seem best to use Zenodo as the place where the metadata is edited, and then put a file back into the GitHub repository at some point if needed, in whatever format is preferred (BibLaTeX, etc.).

You can also see the fields listed here http://developers.zenodo.org/#depositions

The owner of a Zenodo deposit can edit the metadata via the web interface; I'm not sure if there is group access.

The reason I suggest using Zenodo as the key location for maintaining metadata is that Zenodo will do the job of distributing the metadata.

As an idea for later, it would be nice to use the Zenodo API to write the metadata back to your repo in whatever flavor of markup is preferred.
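A rough sketch of what that write-back could look like, in Python. It assumes the public records endpoint `https://zenodo.org/api/records/<id>` returns JSON with a `metadata` object holding `title`, `creators`, `publication_date` and `doi`; that should be checked against the API documentation linked above. The function names and the `@misc` entry type are my own choices for illustration, not anything Zenodo prescribes.

```python
import json
import urllib.request


def fetch_record_metadata(record_id):
    """Fetch the 'metadata' object of a public Zenodo record (needs network access).

    Assumption: the record JSON has a top-level 'metadata' key.
    """
    url = f"https://zenodo.org/api/records/{record_id}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)["metadata"]


def to_bibtex(metadata, key="zenodo_record"):
    """Render a metadata dict as a BibTeX @misc entry, skipping empty fields."""
    authors = " and ".join(c["name"] for c in metadata.get("creators", []))
    fields = {
        "title": metadata.get("title", ""),
        "author": authors,
        "year": metadata.get("publication_date", "")[:4],
        "doi": metadata.get("doi", ""),
        "publisher": "Zenodo",
    }
    lines = [f"  {name} = {{{value}}}," for name, value in fields.items() if value]
    return "@misc{" + key + ",\n" + "\n".join(lines) + "\n}\n"
```

The output of `to_bibtex(fetch_record_metadata(...))` could then be committed to the repository as a citation file. Keeping the fetch and the formatting separate means another output format (CFF, CodeMeta) would only need a different formatter.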

I'll give it a spin on a dummy repo

Sep 05 '18 19:09 mrchristian

OK, awesome, thanks @mrchristian! Will be interesting to see how this can ultimately feed back in either to how we index the MOOC content, or as part of the learning content.

Sep 06 '18 02:09 Protohedgehog

Back on the case now, will get this sorted this week. First is to consult @Zenodo support and get a usable representation of their metadata schema, then consult the #softwarecitation community about the dilemma of which route to take: Zenodo output, CFF, CodeMeta, BibTeX.

It's so annoying that these things are not clear and worked out already. If only all that money being wasted on research service companies' profits was actually used to fix basic plumbing problems in academia, jeez :-) The prisoner emerges from the cave.

Sep 16 '18 12:09 mrchristian

@mrchristian not sure if this helps, but Chris Gorgolewski has written a neat run-down on how this might work automatically:

http://blog.chrisgorgolewski.org/2017/11/sharing-academic-credit-in-open-source.html
- with file example here: https://github.com/nipy/nipype/blob/master/.zenodo.json

and I guess sticking to a minimal content scheme for author names of

{
      "name": "Rabbit, Roger",
      "orcid": "0000-0002-468-1234"
}

would easily suffice, don't you think?
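A minimal sketch, in Python, of how such creator entries could be sanity-checked before committing the file. It assumes the `name` / `orcid` keys used above and the "Family name, Given names" form that Zenodo asks for; the ORCID check uses the ISO 7064 MOD 11-2 check digit that ORCID iDs carry. The function names are hypothetical. Note that the placeholder iD in the snippet above has only 15 digits, so it would be flagged.

```python
import re

# Four groups of four characters; the last character may be the check digit "X".
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_is_valid(orcid):
    """Check the layout and the ISO 7064 MOD 11-2 check digit of an ORCID iD."""
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    total = 0
    for char in digits[:15]:
        total = (total + int(char)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[15] == expected


def creator_problems(creator):
    """Return a list of human-readable problems found in one creator entry."""
    problems = []
    if "," not in creator.get("name", ""):
        problems.append("name should look like 'Family name, Given names'")
    orcid = creator.get("orcid")
    if orcid is not None and not orcid_is_valid(orcid):
        problems.append("orcid is not a well-formed ORCID iD")
    return problems
```

An empty list from `creator_problems` means the entry passed both checks; it says nothing about whether the iD belongs to the named person.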

(I think we've had the same issue over at https://github.com/Open-Scholarship-Strategy/site/issues/30 hence I'm just copying it here 😉 - sadly, my personal skills at proper metadata coding are rather limited, it was rather a copy & paste, trial 'n' error thing :) )

Sep 16 '18 12:09 tosteiner

Hey, thank you, brilliant. Do you think this approach enables the contributor information to get incorporated into the Zenodo and DataCite records for the repository?

That's one of the goals I'm trying to achieve, as that's the information others are harvesting.

Thanks again :-)

Sep 16 '18 12:09 mrchristian

As far as I understood it, it adds the possibility to push author info to the Zenodo repo, so yes, it's incorporated with Zenodo... and DataCite then picks that up and uses it for its own purposes :)

Sep 16 '18 12:09 tosteiner

A-OK, that's super.

Zenodo outputs the 'deposit' metadata in a variety of formats so others can use it.

I can see that the example repo has extensive metadata. I'll try out the process on a test repo, or on Zenodo's sandbox, and see if the creator names get picked up into the system.

https://zenodo.org/record/581704/export/dcite4

Sep 16 '18 17:09 mrchristian

Hi,

Glacially slow reply, must be on some low frequency packet radio system.

But I'm finally back on it and I've got it cracked. Well, at least what's going on. There is more to do to really sort out the full situation, which is a bit out of my scope, but at least I can now recommend a better solution than we started with.

So, what's the 'craic', as they say.

Zenodo picks up a file called .zenodo.json to read metadata. Of course no one makes this clear; instead it's hidden in a tab, deep in the Zenodo repository area.

JSON Export Zenodo automatically extracts metadata about your repository from GitHub APIs. For example, the authors are determined from the repository's contributor statistics. The automatic extraction is solely a best guess. Add a .zenodo.json file the root of your repository to explicit define the metadata. The format of file is the same as for our REST API (use e.g. below JSON to get started).

The result of doing this is what @tosteiner pointed me to, thank you. But I then needed to understand what's going on.

I did a test in Zenodo's sandbox site.

https://sandbox.zenodo.org/record/246036

from repo

https://github.com/hybrid-publishing-group/book-coding/tree/master

You can actually write lots of the metadata here, see example, but not things like any UIDs.

https://github.com/hybrid-publishing-group/book-coding/blob/master/.zenodo.json

This is more like what we would need: just names, although even in this case there can be 'contributors' and 'creators', also with types such as 'editor', 'researcher', etc.
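To make the creators/contributors split concrete, here is a small Python sketch that renders a names-only .zenodo.json. The field names (`title`, `upload_type`, `creators`, `contributors`, and the contributor `type` values `Editor` and `Researcher`) are as I understand Zenodo's deposit metadata and should be checked against the API documentation; the people, affiliation and title are invented placeholders, and the function names are hypothetical.

```python
import json

# Placeholder metadata: every name, title and affiliation here is made up.
metadata = {
    "title": "Example project",
    "upload_type": "software",
    "creators": [
        {"name": "Rabbit, Roger", "affiliation": "Example University"},
    ],
    "contributors": [
        # Assumption: 'type' must come from Zenodo's controlled list.
        {"name": "Hare, Harriet", "type": "Editor"},
        {"name": "Coney, Carl", "type": "Researcher"},
    ],
}


def render_zenodo_json(metadata):
    """Serialise the metadata dict as the text of a .zenodo.json file."""
    return json.dumps(metadata, indent=2, ensure_ascii=False) + "\n"


def write_zenodo_json(metadata, path=".zenodo.json"):
    """Write the file; by default into the current directory (the repo root)."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(render_zenodo_json(metadata))
```

Running `write_zenodo_json(metadata)` from the repository root and committing the result before the next GitHub release should be enough for a sandbox test of which fields end up in the record.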

Soooooo.... In a nutshell my recommendation is as follows.

A key objective is to get rich person metadata into the DOI information ecology and in the repository.

So using the .zenodo.json file is a vast improvement over the GitHub user name.

I need to refine the process and workflow and give exact instructions with an example, and find out from Zenodo's API documentation and support which person fields can be added. http://developers.zenodo.org/#metadata-formats

I also need to consult Zenodo support and the software citation community, as I have heard that CodeMeta files can also be read; maybe others can too, like BibTeX?

My aim would be a write-up for tomorrow, then consult and then wrap it up. I'll also write a blog post on this, as it needs more profile: currently I couldn't find any documentation on the process.

Cheers

Simon

Oct 04 '18 20:10 mrchristian

Adding support for codemeta is on the Zenodo roadmap and should make this much easier.

Oct 04 '18 20:10 mfenner

I don't know if the CodeMeta part is working yet, but it certainly will be. Caltech Data can do this now, and they use the same underlying software as Zenodo. see https://twitter.com/CaltechData/status/972163704585269248

Oct 04 '18 20:10 danielskatz

Thanks for the CodeMeta pointers. The CalTech example also helps make the picture clearer: it's just a choice of what file the Zenodo instance is instructed to pick up, in CalTech's case like so https://github.com/caltechlibrary/dataset/blob/master/codemeta.json

Oct 04 '18 22:10 mrchristian

Caltech, please :)

Oct 04 '18 23:10 danielskatz

@mrchristian sorry for nagging on about this... any news on the creation and layout for an OSMOOC-specific .zenodo.json? Or can we adapt the one you mentioned earlier, from the sandbox example?

I guess starting with the built-in option would be great to get things going, and then evolve from that to future implementations such as the CalTech / codemeta.json - would that make sense?

Dec 09 '18 23:12 tosteiner
