PSL-Infrastructure
Aggregate project citations from maintainer-curated documents
This issue builds on the discussion started in #181 about aggregating PSL project citations.
A brief overview of what @rickecon and @jdebacker have suggested so far:
- Each project curates a living document that lists places where the model is cited. @rickecon suggested that this doc could be a markdown file that pulls relevant citations from a separate, potentially more comprehensive, `references.bib` file. Alternatively, this document could be a `citations.bib` file with just the relevant citations.
- @jdebacker suggested creating a GH action in the PSL-Infrastructure repo that periodically collects the separate documents, aggregates them into a single file, and publishes it on pslmodels.org.
@MattHJensen and I have curated a list of citations for (mostly) Tax Calculator in Zotero. Following suggestions from @rickecon and @jdebacker, I will take a first stab at creating a `references.bib` file for Tax Calculator that exports citations from Zotero in `.bib` format regularly using a GH action.
@rickecon, just to clarify: in #181, were you suggesting that the markdown file with each project's own citations (other people citing the project) be manually edited, or automatically updated every time the `references.bib` file is updated?
FYI @chusloj I've put two `.bib` files into the OG-USA repo:
- `references.bib`: this has references that the OG-USA docs cite
- `citations.bib`: this has references of places where the OG-USA model was cited/referenced
You can see the References page here and the Citations page here (note that the citations are incomplete - I was just testing this way of doing things).
I see the following advantages of separate files for references and citations:
- Easily identifies the differences between the two types of references.
- PSL-Infrastructure can read the `citations.bib` files directly (rather than having to scrape markdown files) and then use BibTeX (as available in Jupyter-Book) to format the citations as the PSL-Infrastructure project sees fit (as opposed to scraping markdown where references may have a different format than what PSL-Infrastructure would like to present).
- If we place the `citations.bib` files in the top-level directory of each PSL repo (as we have done in OG-USA), that makes it easy to create a script to find these files. You could put a `references.bib` in the top directory, but those files contain lots of references that are not as relevant to others looking through the repo. My opinion is that I'd rather make it easy to find the list of places where the model has been used than the list of references upon which the model draws. The latter would be in the repo, but in subdirectories for documentation.
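
A minimal sketch of the kind of script the last bullet has in mind, assuming every project keeps `citations.bib` at the top level of its `master` branch. The repo list and the helper names are hypothetical; the `raw.githubusercontent.com` form is the address GitHub serves raw file contents from, which is what a script would download (the `blob` URLs are HTML pages).

```python
"""Sketch: locate and download top-level citations.bib files for PSL repos."""
import urllib.request

# Hypothetical repo list; a real script might read it from the PSL catalog.
REPOS = ["Tax-Calculator", "OG-USA", "PCI-China"]


def raw_citations_url(repo, org="PSLmodels", branch="master"):
    """Return the raw-content URL of a repo's top-level citations.bib."""
    return f"https://raw.githubusercontent.com/{org}/{repo}/{branch}/citations.bib"


def fetch_citations(repo):
    """Download one citations.bib as text (needs network access)."""
    with urllib.request.urlopen(raw_citations_url(repo), timeout=30) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Only prints the URLs; call fetch_citations(name) to download.
    for name in REPOS:
        print(raw_citations_url(name))
```

Because the file sits at a fixed path, finding it is just string formatting; a repo without the file would raise an HTTP error in `fetch_citations`, which a real script would want to catch.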
The drawback I see to separate references files is that there may be some duplication across files, e.g., some citations of where OG-USA is used are to academic papers that we also reference as places to look for further detail on the theory underlying the model, so these references are repeated in `references.bib` and `citations.bib`.
I think this can work well, but will be interested in what you and @rickecon think about its utility and implementability.
@jdebacker Thank you for this comment. How would you auto-update the `citations.bib` file? I'm trying to pin down an automated way to do so.
@chusloj asks:
> How would you auto-update the `citations.bib` file? I'm trying to pin down an automated way to do so.
The `citations.bib` file in each repo would be the responsibility of the maintainers of that repo to keep up to date.
PSL-Infrastructure would have a file similar to `citations.md` in the OG-USA repo. It could look something like:
# Citations and use cases of PSL Models
## Tax-Calculator
```{bibliography} https://github.com/PSLmodels/Tax-Calculator/blob/master/citations.bib
```
## OG-USA
```{bibliography} https://github.com/PSLmodels/OG-USA/blob/master/citations.bib
```
## PCI-China
```{bibliography} https://github.com/PSLmodels/PCI-China/blob/master/citations.bib
```
.
.
.
.
Of course, you could also put the citations on separate pages or insert additional content between the lists of citations.
With a file or files like this, PSL-Infrastructure would have a GH Action that compiles this `citations.md` file (or the several of them) each night, rendering it as HTML and pushing it to the PSL-Infrastructure host (e.g., GH-pages).
I think this should work, but I haven't tried it and may be missing something in these steps.
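
To keep the aggregate file in sync with the list of projects, the nightly job could regenerate `citations.md` before building it. The following is a sketch only: the function name and project list are hypothetical, and it simply reproduces the layout proposed above. Whether the `{bibliography}` directive accepts a remote URL is untested, as noted.

```python
"""Sketch: write the aggregate citations.md from a list of project names."""

# Built programmatically so this example nests cleanly inside markdown.
FENCE = "`" * 3


def render_citations_md(projects, org="PSLmodels", branch="master"):
    """Return markdown with one bibliography directive per project."""
    lines = ["# Citations and use cases of PSL Models"]
    for project in projects:
        url = f"https://github.com/{org}/{project}/blob/{branch}/citations.bib"
        lines += [f"## {project}", f"{FENCE}{{bibliography}} {url}", FENCE]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical project list; a real job might read it from the catalog.
    print(render_citations_md(["Tax-Calculator", "OG-USA", "PCI-China"]))
```

Adding a project then means adding one name to the list rather than editing the markdown by hand.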
Here's an idea. A new `citations.html` page could be created that lists each project as a hyperlink, where the hyperlink redirects a user to the citations page on each project's Jupyter Book documentation site.
To use Tax-Calculator as an example, I'm thinking the following:
- Scrape the whole `.bib` file of citations for Tax-Calculator from Zotero using something similar to the following:
curl -H 'Zotero-API-Version: 2' -H 'Zotero-API-Key: <key>' 'https://api.zotero.org/users/6708260/items?format=bibtex'
- Use Launchd, the same program that @Peter-Metz uses to update the `PSL_catalog.json` file daily, to scrape this `.bib` file regularly (probably daily) and push it to the Tax-Calculator Jupyter Book docs. The `.bib` file renders automatically as a page with formatted citations, which @jdebacker shows in #186.
- The GH action which builds the JB docs daily will take care of the rest.
Alternatively, a new "citations" link under each project on the catalog page could be created which redirects a user to that project's citations on its Jupyter Book documentation site.
I've tried looking through the available GitHub Actions for the ability to download files via `curl`, but nothing panned out. I'm not well-versed in GH Actions, so I welcome any suggestions for GH actions that fit this use case.
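
One option that avoids hunting for a dedicated action is a small script that a workflow step (or Launchd) runs directly. The sketch below mirrors the `curl` call above using only the Python standard library. The `ZOTERO_API_KEY` environment variable name and the output path are assumptions, and the Zotero API may paginate results, which this sketch does not handle.

```python
"""Sketch: fetch a Zotero library as BibTeX, mirroring the curl command."""
import os
import urllib.request

ZOTERO_USER_ID = "6708260"  # user ID taken from the curl example in this thread


def build_zotero_request(api_key, user_id=ZOTERO_USER_ID):
    """Build the same request the curl command sends (no network I/O)."""
    url = f"https://api.zotero.org/users/{user_id}/items?format=bibtex"
    headers = {"Zotero-API-Version": "2", "Zotero-API-Key": api_key}
    return urllib.request.Request(url, headers=headers)


def download_bibtex(api_key, out_path="citations.bib"):
    """Send the request and save the BibTeX response to out_path."""
    with urllib.request.urlopen(build_zotero_request(api_key), timeout=30) as resp:
        text = resp.read().decode("utf-8")
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write(text)
    return out_path


if __name__ == "__main__":
    # ZOTERO_API_KEY is a hypothetical name, e.g. exposed from a repo secret.
    key = os.environ.get("ZOTERO_API_KEY")
    if key:
        download_bibtex(key)
```

Keeping the key in an environment variable means it never has to be committed to the repo.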
@MattHJensen
@chusloj that workflow for Tax-Calculator sounds promising -- it might be worth opening an issue in the tax-calc repo.
In my view, the downside of listing hyperlinks is that it would require projects to create docs websites, and that's not a requirement for PSL inclusion. Also, it would generally be useful to collect citations in a single document as @jdebacker suggested. I'd be very happy to participate in the development of a tool that does this.
@Peter-Metz Thanks for your input. Aside from creating a Jupyter Book page that can automatically format citations, we can make a markdown file that cites each of the references in a `.bib` file and uses pandoc to auto-format the citations, but that markdown file would have to be manually updated each time a new reference is added. The new development tool you suggest could use something such as this to automatically write markdown files.
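
A rough sketch of how such a markdown file could be written automatically, so it never needs manual updates. It reads the entry keys from a `.bib` file and lists them in pandoc's `nocite` metadata field, which includes references in the bibliography without citing them in the text. The function names, title, and file path are hypothetical, and the regex is a simplification rather than a full BibTeX parser.

```python
"""Sketch: generate a pandoc markdown file citing every entry in a .bib file."""
import re

# Matches "@type{key," at the start of an entry; not a full BibTeX parser.
ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s{}]+)\s*,")
NON_ENTRIES = {"comment", "string", "preamble"}


def bib_keys(bib_text):
    """Return citation keys in file order, skipping non-entry blocks."""
    return [key for kind, key in ENTRY_RE.findall(bib_text)
            if kind.lower() not in NON_ENTRIES]


def markdown_for_bib(bib_text, bib_path="citations.bib", title="Citations"):
    """Return pandoc markdown whose metadata block cites every key."""
    nocite = ", ".join(f"@{key}" for key in bib_keys(bib_text))
    return (
        "---\n"
        f"title: {title}\n"
        f"bibliography: {bib_path}\n"
        "nocite: |\n"
        f"  {nocite}\n"
        "---\n"
    )
```

The output could then be rendered with something like `pandoc --citeproc citations.md -o citations.html`. Pandoc also accepts the wildcard `nocite: '@*'`, which would make listing the keys unnecessary if every entry in the file should appear.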
The discussion for Tax-Calculator citations specifically has been continued at Tax-Calculator#2470.
In https://github.com/PSLmodels/PSL-Infrastructure/issues/186#issuecomment-676571308, @jdebacker suggested listing citations by project.
That raises the question, how would works that rely on several projects be included? For example, most projects using OG-USA also rely on Tax-Calculator and TaxData. Would such a work appear three times in the PSL-infrastructure citations document?
Perhaps that's the best place to start while projects are putting together these citations docs initially, but over time we may want to adopt some `SHOULD` style guidelines so that we can easily identify common citations, list each citation once, and include tags or similar for the PSL projects they rely on. E.g., a prettified version of:
WORK CITING PSL BIB INFO [TaxData][Tax-Calculator][OG-USA]
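
A sketch of how the "list each citation once, with tags" idea might work, assuming the guidelines ask projects to use the same BibTeX key for the same work (that shared-key convention is an assumption here, as are the function names). The regex is a simplification, not a full BibTeX parser.

```python
"""Sketch: list each cited work once, tagged with the PSL projects it relies on."""
import re

# Matches "@type{key," at the start of an entry; not a full BibTeX parser.
ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s{}]+)\s*,")
NON_ENTRIES = {"comment", "string", "preamble"}


def tag_citations(project_bibs):
    """Map citation key -> projects whose citations.bib contains that key.

    project_bibs maps a project name to the text of its citations.bib.
    """
    tags = {}
    for project, bib_text in project_bibs.items():
        for kind, key in ENTRY_RE.findall(bib_text):
            if kind.lower() in NON_ENTRIES:
                continue
            projects = tags.setdefault(key, [])
            if project not in projects:
                projects.append(project)
    return tags


def format_tagged(tags):
    """Render one line per work: key followed by [Project] tags."""
    return [f"{key} " + "".join(f"[{p}]" for p in projects)
            for key, projects in sorted(tags.items())]
```

A work cited in the TaxData, Tax-Calculator, and OG-USA files would then appear on a single line carrying all three tags, in the order the projects were passed in.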