Synchronisation with software package metadata (e.g. Python)

Open bobturneruk opened this issue 3 years ago • 12 comments

Who needs the new tool?

People making software packages that they want to make citable e.g. Python packages.

What should the new tool enable users to do?

Users should be able to enter metadata in one place only and have it automatically kept in sync across different places where metadata can be stored in software packages.

What benefit would the new tool add?

Duplicate human entry of metadata e.g. in CFF, pyproject.toml, setup.cfg, setup.py (Python example) can lead to inconsistencies and uncertainty over what is the correct metadata.

Implementation suggestions

Adding functionality to cffconvert might be a good place to start looking. This could perhaps be extended to go from CFF to valid code "chunks" that would fit into pyproject.toml, setup.cfg, setup.py, etc. at the user's preference. An example GitHub Actions workflow could illustrate how synchronisation could be maintained, or maybe even a git hook to make this bit more platform-agnostic.
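
As a rough illustration of the "chunks" idea, here is a minimal sketch that turns already-parsed CFF data into a `[project]` fragment for pyproject.toml. It is not part of cffconvert; the function names `toml_string` and `pyproject_chunk` are hypothetical, the CFF file is assumed to have been loaded into a dict by some YAML parser, and only title, version and authors are handled.

```python
import json


def toml_string(value: str) -> str:
    # For ordinary text, a JSON double-quoted string is also a valid
    # TOML basic string (naive: unusual control characters are not checked).
    return json.dumps(value, ensure_ascii=False)


def pyproject_chunk(cff: dict) -> str:
    """Build a [project] fragment from already-parsed CFF data (hypothetical helper)."""
    lines = ["[project]"]
    if "title" in cff:
        # Assumption: the CFF title is also usable as a package name.
        lines.append(f"name = {toml_string(cff['title'])}")
    if "version" in cff:
        lines.append(f"version = {toml_string(str(cff['version']))}")
    authors = []
    for author in cff.get("authors", []):
        # CFF persons use given-names/family-names; entities use name.
        parts = (author.get("given-names"), author.get("family-names"))
        name = " ".join(p for p in parts if p) or author.get("name", "")
        fields = [f"name = {toml_string(name)}"]
        if "email" in author:
            fields.append(f"email = {toml_string(author['email'])}")
        authors.append("{ " + ", ".join(fields) + " }")
    if authors:
        lines.append("authors = [" + ", ".join(authors) + "]")
    return "\n".join(lines) + "\n"


example = {
    "title": "demo",
    "version": "1.2.3",
    "authors": [
        {"given-names": "Ada", "family-names": "Lovelace", "email": "ada@example.org"}
    ],
}
print(pyproject_chunk(example))
```

A git hook or CI job could regenerate this fragment and fail when it differs from what is committed.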

Can you help?

  • [x] Yes
  • [ ] No

bobturneruk avatar Sep 09 '22 08:09 bobturneruk

I hope I haven't duplicated anything here. Sorry if I have. I'd be interested in your feedback @ns-rse, @willfurnass.

bobturneruk avatar Sep 09 '22 08:09 bobturneruk

Hi @bobturneruk, FYI it seems @funkyfuture is thinking along similar lines (https://github.com/citation-file-format/cff-converter-python/issues/282).

jspaaks avatar Sep 12 '22 06:09 jspaaks

ciao @bobturneruk, have you already given thought to the question of whether the package metadata should be fetched from a built package or from the pyproject.toml? i guess the latter needs to be read anyway for additional / overriding values in a [tool.<toolname>] table. afaict most build tools except for poetry support PEP-621 nowadays.
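
The override idea could look roughly like the sketch below: values derived from the package metadata are merged with whatever a tool-specific table supplies, and the table wins on conflict. The table name `cff-sketch` and the function `apply_overrides` are made up for illustration, and the pyproject.toml content is assumed to be parsed into a dict already.

```python
def apply_overrides(derived: dict, pyproject: dict, tool_name: str = "cff-sketch") -> dict:
    """Merge derived CFF values with overrides from [tool.<tool_name>] (hypothetical)."""
    overrides = pyproject.get("tool", {}).get(tool_name, {})
    merged = dict(derived)    # copy, so the derived values stay untouched
    merged.update(overrides)  # explicit overrides take precedence
    return merged


# Parsed form of a pyproject.toml with a [project] and a [tool.cff-sketch] table.
pyproject = {
    "project": {"name": "demo", "version": "0.1.0"},
    "tool": {"cff-sketch": {"title": "Demo: a citable package"}},
}
derived = {"title": "demo", "version": "0.1.0"}
print(apply_overrides(derived, pyproject))
```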

funkyfuture avatar Sep 12 '22 06:09 funkyfuture

Thanks @jspaaks!

@funkyfuture - I had in mind that the metadata would be pulled from CFF into the files used to build the Python package, and to support the diverse range of places that this can be stored. I think your implied approach of deriving CFF from Python package data (in some way) may be better. I'll have a bit more of a read and a think. Sorry I missed your existing issue.

bobturneruk avatar Sep 12 '22 09:09 bobturneruk

Maybe there's some way to use importlib.metadata and be agnostic to how the Python project stores its metadata in files?
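
A minimal sketch of that idea, assuming the package is installed in the current environment: `importlib.metadata.metadata()` returns the core metadata headers regardless of which file they were declared in. The helper names `cff_fields` and `cff_fields_for` and the choice of mapped keys are assumptions for illustration only.

```python
from email.utils import getaddresses
from importlib import metadata


def cff_fields(meta) -> dict:
    """Map a few core-metadata headers to CFF-like keys (hypothetical mapping)."""
    fields = {"title": meta["Name"], "version": meta["Version"]}
    if "Summary" in meta:
        fields["abstract"] = meta["Summary"]
    # Author-email headers look like "Name <addr>, Other <addr2>".
    authors = []
    for name, email in getaddresses(meta.get_all("Author-email") or []):
        # The name is kept unsplit here; CFF persons want given-names and
        # family-names, and splitting them is left out of this sketch.
        authors.append({"name": name, "email": email})
    if authors:
        fields["authors"] = authors
    return fields


def cff_fields_for(dist_name: str) -> dict:
    # Raises importlib.metadata.PackageNotFoundError if dist_name is not installed.
    return cff_fields(metadata.metadata(dist_name))
```

Calling `cff_fields_for("some-installed-package")` only works for distributions installed in the environment running the script.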

bobturneruk avatar Sep 12 '22 09:09 bobturneruk

pulling academic metadata into package metadata is imho very counter-intuitive and i would certainly not set that as a goal b/c it creates a long-term maintenance burden. to just have one one-to-one mapping is actually what "inspired" me, btw two days after you opened this issue here. and my (and likely the PyPA's) intention is to establish the use of PEP-621; i don't assume that this poses a problem for at least 80% of the use-cases.

funkyfuture avatar Sep 12 '22 13:09 funkyfuture

Yes. I think you're right that the source of the metadata should be the package.

I gather with PEP-621, metadata all goes in pyproject.toml. I would not expect rapid adherence to this amongst research software packages, given Python's history of lots of different ways of doing packaging. But it does seem like the way forwards, and it may be technically easier to make something that does pyproject.toml->CITATION.cff than trying to also support legacy metadata locations (e.g. setup.py).
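
To make the pyproject.toml->CITATION.cff direction concrete, here is a minimal sketch under several assumptions: it reads only the `[project]` table, handles only name, version and named authors, splits author names naively at the last space, and writes YAML by hand. The function names are hypothetical and this is not how any existing tool is known to work. `tomllib` needs Python 3.11+; the fallback import assumes the third-party `tomli` backport.

```python
import json

try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # assumed third-party backport for older versions


def yaml_string(value: str) -> str:
    # A JSON double-quoted string is also a valid YAML double-quoted scalar.
    return json.dumps(value, ensure_ascii=False)


def cff_from_pyproject(toml_text: str) -> str:
    """Derive a small CITATION.cff text from a pyproject.toml [project] table."""
    project = tomllib.loads(toml_text)["project"]
    named = [a for a in project.get("authors", []) if a.get("name")]
    if not named:
        raise ValueError("CFF needs at least one author with a name")
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it using these metadata."',
        f"title: {yaml_string(project['name'])}",
    ]
    if "version" in project:  # absent when the version is declared dynamic
        lines.append(f"version: {yaml_string(project['version'])}")
    lines.append("authors:")
    for author in named:
        # Naive assumption: the last word is the family name.
        given, _, family = author["name"].strip().rpartition(" ")
        entry = [f"family-names: {yaml_string(family)}"]
        if given.strip():
            entry.append(f"given-names: {yaml_string(given.strip())}")
        if "email" in author:
            entry.append(f"email: {yaml_string(author['email'])}")
        lines.append("  - " + entry[0])
        lines.extend("    " + item for item in entry[1:])
    return "\n".join(lines) + "\n"
```

Legacy locations such as setup.py are deliberately out of scope here, which is what keeps the mapping this small.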

bobturneruk avatar Sep 13 '22 08:09 bobturneruk

okay, here's what i got in a first approach and i even like it: https://github.com/delb-xml/cff-from-621

i think it's enough to keep it aligned with the 80% simple use-cases in the long term.

any feedback is appreciated.

funkyfuture avatar Sep 18 '22 21:09 funkyfuture

Great to see progress on this!

Reflecting on PEP 621 - I think the idea is that it provides rules on how to include metadata in pyproject.toml, rather than mandating that this is the place where metadata must ultimately go, so that could be more of an ongoing limitation than I'd thought. Happy to be corrected.

I'd also misunderstood cffconvert a bit, I think, in that it takes .cff as its sole input and generates a range of outputs; I'd thought it might also do the reverse. Not sure if this is planned for the future, but I see why @funkyfuture has built a separate tool.

bobturneruk avatar Sep 21 '22 15:09 bobturneruk

> rather than mandating that this is the place where metadata must ultimately go, so that could be more of an ongoing limitation than I'd thought. Happy to be corrected.

i can't exactly "correct" you, but point you to the Motivation section which to my understanding means that "the pyproject.toml is the place where metadata should go". and if i eventually decide to maintain the tool, that would certainly stay a precondition. what really is missing so far is what is mentioned as third point in said section of the PEP:

> Allow for more code sharing between build back-ends for the “boring parts” of a project’s metadata

even setuptools by itself doesn't offer a simple interface to fetch all resolved metadata. the tool would certainly benefit from such a library. but it's still a fresh seed in the ecosystem, we'll see what grows out of it.

> Maybe there's some way to use importlib.metadata and be agnostic to how the Python project stores its metadata in files?

i have no idea why i haven't looked into that. but one not so bad point against using it would be that this requires a package to be installed, which is overhead that you don't necessarily want in your publishing pipeline. nonetheless, i'd be interested in any insights here.

> I'd also misunderstood cffconvert a bit

i can understand that one tool shouldn't try to solve too many problems. but sure, the name cffexport or something alike would be better suited.

funkyfuture avatar Sep 21 '22 15:09 funkyfuture

fyi, i improved and released that prototype: https://pypi.org/project/cff-from-621/

funkyfuture avatar Oct 29 '22 12:10 funkyfuture

For Rust projects, Aeruginous v3.5.0 can create a CITATION.cff from a project's Cargo.toml: https://crates.io/crates/aeruginous; https://github.com/kevinmatthes/aeruginous-rs.

kevinmatthes avatar Dec 23 '23 21:12 kevinmatthes