Does citation.cff override default Zenodo metadata? Should the docs state this somewhere?
Hi, thanks all so much for all the work developing citation.cff and the tooling around it.
I have adopted it in projects wherever possible.
My question is: does metadata in citation.cff override the "default" Zenodo metadata?
What I mean is:
I use the Zenodo-GitHub integration to generate DOIs.
Before adding a citation.cff file, the metadata associated with each record was correct.
It was automatically extracted (somehow, in a way I was able to remain blissfully ignorant of) when I created a release on GitHub.
I did not have any files in my projects like zenodo.json that changed Zenodo's behavior, if that matters.
After adding the citation.cff, I have realized that the metadata associated with every new record is what I have in my citation.cff. So for a couple of projects I have two or three releases that all have the exact same metadata, even though the .zip associated with each DOI is correct (and different!)
I am able to manually fix the metadata in the Zenodo web UI.
This behavior makes sense, I guess, but was not expected, and I don't see it described anywhere in the docs. E.g. a statement in bold "Adding a citation.cff will override any defaults that Zenodo uses to generate metadata".
- Am I right that this is expected behavior?
- Should it be stated in some docs, somewhere? On the citation.cff page and/or Zenodo + GitHub?
Hi @NickleDave, and thanks for opening this issue.
From what I know, the pickup of metadata from CITATION.cff for Zenodo records via the GitHub Zenodo bridge works the way it was meant to.
I agree that there is a documentation gap for this feature in the GitHub Zenodo integration. I also think that this should be documented first and foremost on the Zenodo/GitHub side of things (as the integration is a shared feature between them). Mainly, IMHO the GitHub docs page explaining the feature, and the Zenodo FAQ, section GitHub (and perhaps even the developer docs explaining the use of [.zenodo.json](https://developers.zenodo.org/#update-schedule)) should be adapted to explain this. Changes to the GitHub documentation can be proposed via PR, and it seems (judging by this PR) that the same should be possible for the Zenodo documentation as well.
As for documenting this behaviour on the CFF page, I think it'd be ideal to just link to existing documentation (on GitHub/Zenodo pages) in the future, but would consider adding a section about this to the CFF docs while such documentation doesn't exist.
Can I ask where you think this information would be most helpful to provide, from your perspective as a user? I.e., where would you go and look for it?
Hi @NickleDave,
In addition to what Stephan said,
> My question is: does metadata in citation.cff override the "default" Zenodo metadata?
Yes.
The GitHub-Zenodo integration does not look for a .zenodo.json anymore if your repository has a CITATION.cff. If both are absent, it uses the GitHub API to guess author names, license, the name of the software, etc. It looks like your repository https://github.com/NickleDave/vak was released in this way. For example, in https://zenodo.org/record/5732616 (vak v0.4.0b6) there are no keywords, the license is generic, the software name is constructed by concatenating the organization name + repo name + release title, and the author list is constructed from the GitHub contributors (not necessarily the same as the list of authors). In contrast, the metadata for https://zenodo.org/record/5809730 (vak v0.4.0) looks more complete and correct after a CITATION.cff was introduced.
> Before adding a citation.cff file, the metadata associated with each record was correct.
I'd be interested to have a look at an example, I'm curious to see what's going wrong since adding a CITATION.cff, or at least what is different than expected.
Hope this helps, -Jurriaan
Thank you @sdruskat and @jspaaks for your detailed replies.
> Can I ask where you think this information would be most helpful to provide, from your perspective as a user? I.e., where would you go and look for it?
What you outlined makes the most sense to me: clearly state in GitHub and Zenodo docs that a citation.cff overrides Zenodo behavior, which is using a zenodo.json file or the GitHub API if no zenodo.json is present. Then in the citation.cff docs, link to those pages.
> I'd be interested to have a look at an example, I'm curious to see what's going wrong since adding a CITATION.cff, or at least what is different than expected.
Let me try to explain what was unexpected.
Unfortunately, I no longer have examples because I fixed the metadata using the Zenodo web UI.
You are right @jspaaks that I am in the situation where Zenodo uses the GitHub API to scrape metadata.
I would bet that a lot of users are in this situation if they learned about Zenodo because of the GitHub integration, i.e. they just turned that feature on in GitHub.
What happens is:
- I make a new release on GitHub
- but I do not update my citation.cff
- because I am used to Zenodo getting metadata through the GitHub API
- Instead, the citation.cff overrides it
- As a result, I have several releases that all have the same metadata, but a different DOI and a different archive file associated with them
- e.g. before I fixed the metadata manually, I had three versions of "vak 0.4.0b6", the one that was correctly tagged, and two additional versions that were actually 0.4.0 and 0.4.1
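The failure mode in the list above (a new release tag paired with a stale version in the citation file) could be caught before publishing with a small check. The following is only a minimal sketch: the function names `cff_version` and `matches_tag` are hypothetical, the sample text is illustrative, and the regular expression only handles a plain top-level `version:` line without trailing comments. It is not part of any CFF or Zenodo tooling.

```python
import re

# Matches a top-level `version:` line only. Because of the `^` anchor,
# `cff-version:` and indented keys (e.g. inside `references`) are ignored.
VERSION_RE = re.compile(r"^version:[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def cff_version(cff_text):
    """Return the top-level version string from CITATION.cff text, or None."""
    match = VERSION_RE.search(cff_text)
    if match is None:
        return None
    # Drop surrounding quotes so "0.4.1" and 0.4.1 compare equal.
    return match.group(1).strip("\"'")


def matches_tag(cff_text, tag):
    """True when the CFF version equals the release tag, ignoring a leading 'v'."""
    version = cff_version(cff_text)
    if version is None:
        return False
    return version == (tag[1:] if tag.startswith("v") else tag)


# Illustrative file content, not copied from any real repository.
sample = "cff-version: 1.2.0\ntitle: vak\nversion: 0.4.0b6\n"

print(matches_tag(sample, "0.4.0b6"))  # True
print(matches_tag(sample, "v0.4.1"))   # False: the file was not updated
```

In practice the text would come from reading CITATION.cff and the tag from the release being prepared, with a non-zero exit code on mismatch so the release can be stopped.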
Please let me know if any of that's not clear. Happy to discuss further or help with docs contributions if you would like.
Aha I see. The CITATION.cff was not updated between some releases, but it had a version string 0.4.0b6 in there, which was then used in not just 1 but 3 releases, the latter 2 of which were later updated by hand.
Possible solutions:
- Update CITATION.cff by hand every time before making a release. I resorted to adding some notes "How to make a release" in some of my projects, because this approach can be error-prone and I find it helpful to just follow a list.
  - Upside: metadata on Zenodo/associated with the DOI is (hopefully) correct, but at least consistent with what the citation widget on GitHub shows.
  - Downside: the manual approach might bite you someday.
- It seems that if you omit the version key altogether, Zenodo uses the GitHub API to fill in (some) missing information. For example, I forked vak, removed the version key and value from CITATION.cff, enabled the GitHub-Zenodo Sandbox integration, and made a release with a deliberately odd number (5.67), which seems to have made it to Zenodo Sandbox (https://sandbox.zenodo.org/record/1034347).
  - Upside: metadata on Zenodo/associated with the DOI is automatically correct.
  - Downside: the citation widget on GitHub does not include the version, and the CITATION.cff included with the tar/zip from Zenodo does not include the version information.
I realize none of these options are great but hopefully it helps to at least be aware of what they are.
Thanks for the clarification, -Jurriaan
Sure thing, thank you @jspaaks for going through all that effort.
Yes, I agree with how you have outlined the options.
Would be nice to have some support for different automation tools to address 1.
E.g., versioneer in Python land
This is definitely Python-specific, but my plan is to write a script that does your (1) (so, not by hand), and then have a separate nox session that runs the script, which I can execute when preparing a release.
https://nox.thea.codes/en/stable/
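A script of the kind described here might look roughly like the sketch below. It is an assumption-laden illustration rather than the script that was actually written: the helper names `set_top_level_key` and `prepare_release` are hypothetical, the sample text and date are made up for the example, and the approach edits the file as plain text with the standard library instead of using a YAML parser, so it only handles simple top-level `key: value` lines.

```python
import re
from datetime import date


def set_top_level_key(cff_text, key, value):
    """Replace a top-level `key: ...` line, or append one if the key is missing."""
    pattern = re.compile(rf"^{re.escape(key)}:.*$", re.MULTILINE)
    line = f"{key}: {value}"
    if pattern.search(cff_text):
        # A function replacement avoids backslash interpretation in `line`.
        return pattern.sub(lambda _match: line, cff_text, count=1)
    return cff_text.rstrip("\n") + "\n" + line + "\n"


def prepare_release(cff_text, version, released=None):
    """Return CFF text with `version` and `date-released` set for a new release."""
    released = released or date.today()
    # The version is quoted so that values such as 1.0 stay strings in YAML.
    text = set_top_level_key(cff_text, "version", f'"{version}"')
    return set_top_level_key(text, "date-released", released.isoformat())


# Illustrative file content and date, not copied from any real repository.
sample = "cff-version: 1.2.0\ntitle: vak\nversion: 0.4.0b6\n"
print(prepare_release(sample, "0.4.1", date(2022, 1, 15)))
```

To use it for real, the text would be read from and written back to CITATION.cff before tagging. A nox session could then invoke such a script with `session.run(...)`, passing the new version on the command line; the script file name and session name would be whatever the project chooses.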
> Would be nice to have some support for different automation tools to address 1. E.g., versioneer in Python land
FYI: We're working on publication automation with metadata support in HERMES. Feeding back publication metadata into metadata files in the repo is part of the plan.
Good to know about, thank you @sdruskat. I am checking out the concept paper and will share it around.
Do you think there could be a similar workflow to publish to Zenodo? (Table 2 makes me realize there are many other existing workflows I'm not aware of.)
There's also a bit of a chicken-and-egg problem, that you must be aware of already.
I can update the CITATION.cff metadata by hand or script, and then publish to Zenodo, but that metadata includes a DOI on Zenodo, that of course won't exist until I publish it.
If I instead publish without updating the metadata, then I get a new DOI, but I have the same problem as before, where the CITATION.cff overrides the metadata from the GitHub API.
Not sure if this means there's something I'm failing to understand.
I take it your automated workflows for publishing are meant to address things like this.
Anyway:
I don't mean to keep adding to your GitHub notifications.
Please do let me know if there's something you want me to do to close this issue, or if you want to leave it open for now to track some of the discussion above. Happy to try to help with an issue / PR for the docs on GitHub + Zenodo linked above. Or just leave that to you all, whichever you prefer.
It would be nice if CITATION.cff could be used to enrich instead of only replace the data auto-generated by GitHub. With CITATION.cff new contributors do not make it to Zenodo until someone updates the CITATION.cff file.