Generate citations file for pipeline
Is your feature request related to a problem? Please describe.
This is a request for a feature that generates a list of citations for all the tools used in a workflow.
Describe the solution you'd like
It would be nice if, given a Snakemake file, I could generate a file (BibTeX, for example) containing citations for all the tools used.
Describe alternatives you've considered
The alternative is to manually look through the rules in a workflow and find the appropriate citations. This is tedious and potentially error-prone.
Additional context
For wrappers, where authors sometimes include the citation in the meta.yml file, this information might be fairly easy to extract.
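As a rough illustration of how such an extraction could start, the sketch below scans metadata text for DOI-like strings. It assumes the citation appears as a DOI or DOI URL somewhere in the file; the function name `find_dois`, the pattern, and the example metadata (including the placeholder DOI) are hypothetical and not taken from any existing wrapper.

```python
import re

# Heuristic for the common DOI shape: "10.", a 4-9 digit registrant code, "/", then a suffix.
# This is an assumption for illustration, not a full DOI validator.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[^\s\"'<>]+")


def find_dois(text: str) -> list[str]:
    """Return unique DOI-like strings found in text, in order of first appearance."""
    seen: list[str] = []
    for match in DOI_PATTERN.findall(text):
        # Drop trailing punctuation that usually belongs to the sentence, not the DOI.
        doi = match.rstrip(".,;)")
        if doi not in seen:
            seen.append(doi)
    return seen


# Hypothetical metadata text; real meta.yml files may be structured differently.
example_meta = """\
name: example-tool
description: Hypothetical wrapper metadata, for illustration only.
notes: Please cite https://doi.org/10.1000/example.123 when using this wrapper.
"""

print(find_dois(example_meta))
```

Working on the raw text avoids assuming a particular key name in the YAML, at the cost of possibly picking up DOIs that are not citations for the tool itself.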
Hi @thomasmulvaney, I have had this idea/requirement myself from time to time. But this feature may be too far outside the scope of the main Snakemake code base. To me it sounds more like something for workflow templates, or for an independent tool that parses Snakemake workflows and extracts information from them. Can I suggest something?
Would you mind closing this issue here and instead copying it to the workflow catalog? We could implement a script that automatically tries to parse docs or YAML files and extracts links to the software used, its versions, and a citation. This could be displayed on the individual workflow pages of standardized workflows, for example.
Hi!
I saw this issue and was inspired to work on a project that generates citations for the dependencies used in a Snakemake workflow. The result is Snakecite.
Would you still like this issue copied over to the workflow catalog?
Hi, yes, why not! I will test this as soon as I find time.
I just tested your repo, it's a good start!
I installed it locally and tested it on different environment definition YAML files in workflow/envs/<def>.yml.
I had mixed results, though. For some yml files it ran indefinitely, searching for something; this happened, for example, for an env with the package samtools.
For other packages, e.g. r-base, I got this error message:
No available links for r-base
Traceback (most recent call last):
  File "./snakecite/src/snakecite/__main__.py", line 89, in <module>
    main()
  File "./snakecite/src/snakecite/__main__.py", line 55, in main
    if cite.is_doi_url(link):
       ^^^^^^^^^^^^^^^^^^^^^
  File "./snakecite/src/snakecite/cite.py", line 29, in is_doi_url
    if re.match(doi_regex, doi):
       ^^^^^^^^^^^^^^^^^^^^^^^^
  File "./miniforge3/lib/python3.12/re/__init__.py", line 167, in match
    return _compile(pattern, flags).match(string)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: expected string or bytes-like object, got 'NoneType'
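The traceback shows `re.match` receiving `None`, which suggests that no link was found for the package and the missing value was passed straight to `is_doi_url`. One possible guard is sketched below. This is not the actual Snakecite code: the regex and the function body are assumptions, and only the function name is taken from the traceback.

```python
import re
from typing import Optional

# Assumed pattern for DOI URLs; the real doi_regex in snakecite may differ.
DOI_URL_REGEX = r"https?://(dx\.)?doi\.org/10\.\d{4,9}/\S+"


def is_doi_url(link: Optional[str]) -> bool:
    """Return True if link looks like a DOI URL; treat None or non-strings as 'not a DOI'."""
    if not isinstance(link, str):
        # No link was found for the package, so there is nothing to match.
        return False
    return re.match(DOI_URL_REGEX, link) is not None


print(is_doi_url(None))
print(is_doi_url("https://doi.org/10.1000/example.123"))
```

With a guard like this, a package without any links would simply be skipped rather than aborting the whole run.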
I think we could fix the errors above to make it more robust, but what worries me most is the time needed for the web searches. If we want to use this for the workflow catalog, retrieval for all software packages in a project would need to finish within a couple of seconds, which is challenging because it depends heavily on external servers.
This is of course not relevant for people using it as a standalone tool, where timing does not really matter.
Thanks for taking the time to test it out! I am very aware of how long it takes to generate citations and am currently looking into issuing the requests in parallel to speed this process up considerably. It may not be as pressing an issue for standalone execution, but it would absolutely be a quality-of-life improvement!
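To illustrate the idea of parallel requests, here is a minimal sketch using a thread pool from the Python standard library. It is not Snakecite's implementation: `lookup_all` and `fake_lookup` are hypothetical names, and the per-package lookup is passed in as a callable so the example runs without any network access.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Optional


def lookup_all(
    packages: Iterable[str],
    lookup: Callable[[str], Optional[str]],
    max_workers: int = 8,
) -> Dict[str, Optional[str]]:
    """Run lookup(name) for every package concurrently and collect the results."""
    results: Dict[str, Optional[str]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its package name so results can be labeled.
        futures = {pool.submit(lookup, name): name for name in packages}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception:
                # One failing lookup should not abort the others.
                results[name] = None
    return results


def fake_lookup(name: str) -> Optional[str]:
    """Stand-in for a web request; returns a placeholder string instead of a real citation."""
    if name == "r-base":
        raise LookupError("no available links")
    return f"citation for {name}"


print(lookup_all(["samtools", "r-base"], fake_lookup))
```

Threads suit this case because the work is dominated by waiting on external servers. A thread pool cannot interrupt a request that hangs, so the real lookup function would still need its own timeout on each HTTP call.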