Synchronisation with software package metadata (e.g. Python)
Who needs the new tool?
People making software packages that they want to make citable e.g. Python packages.
What should the new tool enable users to do?
Users should be able to enter metadata in one place only and have it automatically kept in sync across different places where metadata can be stored in software packages.
What benefit would the new tool add?
Duplicate human entry of metadata e.g. in CFF, pyproject.toml, setup.cfg, setup.py (Python example) can lead to inconsistencies and uncertainty over what is the correct metadata.
Implementation suggestions
Adding functionality to cffconvert might be a good place to start looking. This could perhaps be extended to go from CFF to valid code "chunks" that would fit into pyproject.toml, setup.cfg, setup.py, etc. at the user's preference. An example GitHub Actions workflow could illustrate how synchronisation could be maintained, or maybe even a git hook to make this a bit more platform agnostic.
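To make the "chunks" idea concrete, here is a rough sketch of what going from CFF to a pyproject.toml `[project]` fragment might look like. It is not existing cffconvert functionality. The function names and the key mapping (CFF `title` to `name`, `abstract` to `description`) are assumptions made for illustration, and the CFF data is assumed to be already parsed into a dict, e.g. by a YAML library.

```python
import json


def toml_str(value: str) -> str:
    # JSON string escaping is close enough to TOML basic strings for ordinary text.
    return json.dumps(value, ensure_ascii=False)


def cff_to_pyproject_fragment(cff: dict) -> str:
    """Render a [project] table fragment from already-parsed CFF data."""
    lines = ["[project]"]
    # Assumed mapping (hypothetical, not from any existing tool).
    for cff_key, toml_key in (
        ("title", "name"),
        ("version", "version"),
        ("abstract", "description"),
    ):
        if cff_key in cff:
            lines.append(f"{toml_key} = {toml_str(str(cff[cff_key]))}")
    if cff.get("keywords"):
        items = ", ".join(toml_str(k) for k in cff["keywords"])
        lines.append(f"keywords = [{items}]")
    authors = []
    for person in cff.get("authors", []):
        # CFF persons use given-names/family-names; entities use name.
        parts = (person.get("given-names"), person.get("family-names"))
        name = " ".join(p for p in parts if p) or person.get("name", "")
        fields = []
        if name:
            fields.append(f"name = {toml_str(name)}")
        if person.get("email"):
            fields.append(f"email = {toml_str(person['email'])}")
        if fields:
            authors.append("{" + ", ".join(fields) + "}")
    if authors:
        lines.append("authors = [" + ", ".join(authors) + "]")
    return "\n".join(lines) + "\n"


example = {
    "cff-version": "1.2.0",
    "title": "my-package",
    "version": "0.1.0",
    "authors": [
        {"given-names": "Ada", "family-names": "Lovelace", "email": "ada@example.org"}
    ],
}
print(cff_to_pyproject_fragment(example))
```

A CFF title is free text while a Python project name has stricter rules, so a real tool would probably need the user to confirm or override that field.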
Can you help?
- [x] Yes
- [ ] No
I hope I haven't duplicated anything here. Sorry if I have. I'd be interested in your feedback @ns-rse, @willfurnass.
Hi @bobturneruk, FYI it seems @funkyfuture is thinking along similar lines (https://github.com/citation-file-format/cff-converter-python/issues/282).
ciao @bobturneruk, have you already given thought to the question of whether the package metadata should be fetched from a built package or from the pyproject.toml? i guess the latter needs to be read anyway for additional / overriding values in a [tool.<toolname>] table. afaict most build tools except for poetry support PEP-621 nowadays.
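A minimal sketch of the override idea, assuming the pyproject.toml has already been parsed into a dict. The tool name `cff-sync` and the function name are hypothetical placeholders; whatever values sit in the tool's own table simply win over the derived ones.

```python
def apply_overrides(derived: dict, pyproject: dict, tool_name: str = "cff-sync") -> dict:
    """Overlay values from [tool.<tool_name>] on top of derived metadata."""
    # A missing [tool] table or a missing tool entry just means no overrides.
    overrides = pyproject.get("tool", {}).get(tool_name, {})
    # Later keys win, so override values replace derived ones.
    return {**derived, **overrides}


derived = {"title": "pkg", "version": "1.0"}
pyproject = {"tool": {"cff-sync": {"title": "Pkg: a nicer title", "license": "MIT"}}}
merged = apply_overrides(derived, pyproject)
```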
Thanks @jspaaks!
@funkyfuture - I had in mind that the metadata would be pulled from CFF into the files used to build the Python package, and to support the diverse range of places that this can be stored. I think your implied approach of deriving CFF from Python package data (in some way) may be better. I'll have a bit more of a read and a think. Sorry I missed your existing issue.
Maybe there's some way to use importlib.metadata and be agnostic to how the Python project stores its metadata in files?
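As a rough illustration of that idea, the sketch below reads a few core metadata headers and maps them to CFF keys. The function names and the mapping are assumptions for illustration only. Authors are placed under the CFF entity key `name` because core metadata does not separate given and family names. The package has to be installed for `importlib.metadata.metadata` to find it.

```python
from email.utils import getaddresses
from importlib.metadata import metadata


def cff_fields_from_metadata(meta) -> dict:
    """Map a few core metadata headers to CFF keys (assumed mapping)."""
    fields = {
        "cff-version": "1.2.0",
        "message": "If you use this software, please cite it as below.",
    }
    for header, cff_key in (
        ("Name", "title"),
        ("Version", "version"),
        ("Summary", "abstract"),
    ):
        if header in meta:
            fields[cff_key] = meta[header]
    authors = []
    if "Author-email" in meta:
        # Author-email may hold entries of the form "Name <address>".
        for name, email in getaddresses([meta["Author-email"]]):
            entry = {"name": name or email}
            if email:
                entry["email"] = email
            authors.append(entry)
    elif "Author" in meta:
        authors.append({"name": meta["Author"]})
    if authors:
        fields["authors"] = authors
    return fields


def cff_fields_for_installed(dist_name: str) -> dict:
    # Raises importlib.metadata.PackageNotFoundError if dist_name is not installed.
    return cff_fields_from_metadata(metadata(dist_name))
```

Keeping the mapping separate from the lookup means it can be exercised with any mapping-like object, without installing anything.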
pulling academic metadata into package metadata is imho very counter-intuitive and i would certainly not set that as a goal b/c it creates a long-term maintenance burden. to have just one one-to-one mapping is actually what "inspired" me, btw two days after you opened this issue here. and my (and likely the PyPA's) intention is to establish the use of PEP-621; i don't assume that this poses a problem for at least 80% of the use-cases.
Yes. I think you're right that the source of the metadata should be the package.
I gather with PEP-621, metadata all goes in pyproject.toml. I would not expect rapid adherence to this amongst research software packages, given Python's history of lots of different ways of doing packaging. But it does seem like the way forwards, and it may be technically easier to make something that does pyproject.toml->CITATION.cff than trying to also support legacy metadata locations (e.g. setup.py).
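For what it's worth, a minimal sketch of the pyproject.toml->CITATION.cff direction might look like the following. It is not taken from any existing tool. The function names and the field mapping are assumptions, and it only handles the static fields of the `[project]` table. Authors go under the CFF entity key `name` because `[project]` stores a single name string. Writing the result out as YAML is left to a YAML library.

```python
def cff_from_project_table(project: dict) -> dict:
    """Build CFF fields from a parsed [project] table (assumed mapping)."""
    cff = {
        "cff-version": "1.2.0",
        "message": "If you use this software, please cite it as below.",
        "title": project["name"],
    }
    # version may be absent when the project declares it as dynamic.
    if "version" in project:
        cff["version"] = project["version"]
    if "description" in project:
        cff["abstract"] = project["description"]
    if project.get("keywords"):
        cff["keywords"] = list(project["keywords"])
    authors = []
    for author in project.get("authors", []):
        entry = {key: author[key] for key in ("name", "email") if key in author}
        if entry:
            authors.append(entry)
    cff["authors"] = authors
    return cff


def cff_from_pyproject(path: str = "pyproject.toml") -> dict:
    # Imported here because tomllib is only in the standard library from Python 3.11.
    import tomllib

    with open(path, "rb") as handle:
        return cff_from_project_table(tomllib.load(handle)["project"])
```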
okay, here's what i got in a first approach and i even like it: https://github.com/delb-xml/cff-from-621
i think it's enough to keep it aligned with the 80% simple use-cases in the long term.
any feedback is appreciated.
Great to see progress on this!
Reflecting on PEP 621 - I think the idea is that it provides rules on how to include metadata in pyproject.toml, rather than mandating that this is the place where metadata must ultimately go, so that could be more of an ongoing limitation than I'd thought. Happy to be corrected.
I'd also misunderstood cffconvert a bit, I think, in that it takes .cff as its sole input and generates a range of outputs; I'd thought it might also do the reverse. Not sure if this is planned for the future, but I see why @funkyfuture has built a separate tool.
> rather than mandating that this is the place where metadata must ultimately go, so that could be more of an ongoing limitation than I'd thought. Happy to be corrected.
i can't exactly "correct" you, but point you to the Motivation section which to my understanding means that "the pyproject.toml is the place where metadata should go". and if i eventually decide to maintain the tool, that would certainly stay a precondition. what really is missing so far is what is mentioned as the third point in said section of the PEP:
> Allow for more code sharing between build back-ends for the “boring parts” of a project’s metadata
even setuptools by itself doesn't offer a simple interface to fetch all resolved metadata. the tool would certainly benefit from such a library. but it's still a fresh seed in the ecosystem, we'll see what grows out of it.
> Maybe there's some way to use importlib.metadata and be agnostic to how the Python project stores its metadata in files?
i have no idea why i haven't looked into that. but one not so bad point against using it would be that this requires a package to be installed, which is overhead that you don't necessarily want in your publishing pipeline. nonetheless, i'd be interested in any insights here.
> I'd also misunderstood cffconvert a bit
i can understand that one tool shouldn't try to solve too many problems. but sure, the name cffexport or something alike would be better suited.
fyi, i improved and released that prototype: https://pypi.org/project/cff-from-621/
For Rust projects, Aeruginous v3.5.0 can create a CITATION.cff from a project's Cargo.toml: https://crates.io/crates/aeruginous; https://github.com/kevinmatthes/aeruginous-rs.