clarin-dspace
Github (and more?) code archiving
Take MorphoDiTa as an example. How do we show it actually has data (source and binaries) on Github?
We need to integrate DSpace, GitHub and the UFAL (or other) project website.
https://github.com/mozillascience/fidgit
https://www.mozillascience.org/projects/codemeta
Today I was in a meeting of a project that creates corpora and wants to publish them when finished. They chose to publish them at Zenodo, because it has this feature. Use case: a corpus in progress is kept under version control on GitHub, and releases of the corpus are pushed into a repository.
It is not the first time I have heard this mentioned as a "killer feature", maybe because it is advertised by GitHub itself: https://guides.github.com/activities/citable-code/ Does anybody want to get us onto that page too?
fwiw, we added a 'custom' Is Based On field for our corpus:
- https://clarin.eurac.edu/repository/xmlui/handle/20.500.12124/6
which then links to a git(lab) tag - where the state of the repository and download link correspond to the archived version(s).
there is nothing automated about it, and even the Is Based On needs to be filled in after an initial commit, but it connects the pieces of information that we found belong together.
Hi Egon. Thanks for sharing this.
My perspective is this: It certainly works and is better than nothing. However, I don't much like the semantics, because to me it suggests that this item is something other than the things it is based on: a new thing derived from them.
I really like the Zenodo-GitHub way. It doesn't suggest the item is different; it explicitly says it is the same by using the same PID. Just as importantly, the PID is shown in the code repository, which helps people cite that code.
Have you considered doing an integration like that with your GitLab? That would be very cool.
this would be very cool indeed! but our solution was merely a pragmatic one (so far anyway) where we re-used an available dc.field... so, for this to be more sophisticated(tm) we would need:
- a way to generate the link/picture of the PID to be included somewhere else (e.g. a README.md file in a github repository...)
- a semantically less opaque field (which would probably need to be a new one)
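the first point could be sketched roughly like this. this is only an illustration, not something clarin-dspace provides: the function name `pid_badge_markdown` is made up, the badge image is assumed to come from a generic static badge service (shields.io's `static/v1` endpoint is used here as an assumption), and the handle is resolved through the public hdl.handle.net proxy.

```python
from urllib.parse import quote, urlencode


def pid_badge_markdown(handle: str, label: str = "handle") -> str:
    """Return a Markdown snippet: a badge image that links to the PID."""
    # hdl.handle.net is the public Handle proxy; keep the slash in the handle.
    target = "https://hdl.handle.net/" + quote(handle, safe="/")
    # Assumption: a static badge endpoint in the style of shields.io.
    # urlencode percent-encodes the slash inside the query string.
    badge = "https://img.shields.io/static/v1?" + urlencode(
        {"label": label, "message": handle, "color": "blue"}
    )
    return f"[![{label}: {handle}]({badge})]({target})"


if __name__ == "__main__":
    # Paste the printed line into a README.md of the code repository.
    print(pid_badge_markdown("20.500.12124/6"))
```

the repository could show such a ready-made snippet on the item page, so that depositors only need to copy it into their README.md.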
Maybe https://www.sara-service.org/ (https://demo.sara-service.org, https://github.com/sara-service) could be a source of inspiration
ok, this is inspirational - but also quite heavy... here, i was thinking more along these lines: let's suppose we have an insanely magical black box that produces 'software archive units' with content like, for example:
- git archive (clone / snapshot)
- docker images
- other things needed to produce the content of a clarin-dspace deposit (and to describe how it was produced)
in what way could/should clarin-dspace make this information available within each deposit?
...but maybe i was thinking 'too small' ;)
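to make the 'software archive unit' idea a bit more concrete, here is a minimal sketch of a manifest that such a black box might attach to a deposit. everything in it is hypothetical: the class names, the field names, the `kind` values and the example values are invented for illustration and are not part of clarin-dspace.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ArchivePart:
    kind: str        # hypothetical values: "git-archive", "docker-image", "build-notes"
    location: str    # bitstream name or URL where this part is stored
    note: str = ""   # free-text description of how the part is used


@dataclass
class SoftwareArchiveUnit:
    pid: str                  # PID of the deposit this unit belongs to
    source_repository: str    # where the code is developed
    revision: str             # tag or commit the unit was built from
    parts: List[ArchivePart] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialise the manifest so it can be stored next to the data."""
        return json.dumps(asdict(self), indent=2)


if __name__ == "__main__":
    unit = SoftwareArchiveUnit(
        pid="hdl:PREFIX/SUFFIX",  # placeholder, not a real handle
        source_repository="https://example.org/group/corpus.git",
        revision="v1.0",
        parts=[
            ArchivePart("git-archive", "corpus-v1.0.tar.gz"),
            ArchivePart("docker-image", "builder.tar", "image used to build the corpus"),
        ],
    )
    print(unit.to_json())
```

a manifest like this could be stored as one more bitstream of the deposit, or its fields could be mapped onto metadata fields; which of the two fits clarin-dspace better is exactly the open question above.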