Release Counter Processor under gdcc, switch docs from CDLUC3
As discussed in standup this morning in the context of this issue...
- https://github.com/IQSS/dataverse.harvard.edu/issues/3
... we decided to fork https://github.com/CDLUC3/counter-processor to https://github.com/gdcc/counter-processor because the former had been switched to archive/read-only mode (which isn't great optics).
Next we need to put out a release of Counter Processor under the gdcc repo (our current docs are written to expect a release such as https://github.com/CDLUC3/counter-processor/archive/v0.1.04.tar.gz ).
Then we need to update the guides, basically a find and replace of CDLUC3 to gdcc. Something along these lines:
By the way, I noticed that we already had https://github.com/IQSS/counter-processor but I was only using it to make pull requests into the upstream CDLUC3 repo. I updated the README to point to the new gdcc repo.
Have you seen the reasons why they archived it? Are any of those true for us as well? Should we still use it?
On a related note: we could push the package to PyPI or GH packages for installation via pip. Would that help?
If we continue using it, I would also appreciate a container image to speed up usage and to freeze the dependencies.
In https://github.com/gdcc/counter-processor/commit/4c662cda0e69a78da2e36114dedd2c0cefa1c5d2 I reverted the reasons stated in the README. In short, the code was used by Dryad but not anymore now that they plan to use https://github.com/datacite/datacite-tracker instead. (We have concerns about datacite-tracker only tracking browser requests, not API calls.)
2025/01/21: Just pinging folks on this issue to understand next steps. @pdurbin What do we need to do to move this forward? Thanks!
@cmbz it's mostly a matter of prioritizing it. Then we make a release under gdcc and update the guides.
2025/01/21: Given that the issue already has a size, I am moving to Sprint Ready.
Looks like @stevenwinship already made a 1.0.5 release - is this issue just documentation now? Or were the docs already updated?
It looks like docs were mostly updated in this PR:
- https://github.com/IQSS/dataverse/pull/10479
I'll make a PR for the last few references:
% ack CDLUC source
source/_static/admin/counter-processor-config.yaml
36:robots_url: https://raw.githubusercontent.com/CDLUC3/Make-Data-Count/master/user-agents/lists/robot.txt
37:machines_url: https://raw.githubusercontent.com/CDLUC3/Make-Data-Count/master/user-agents/lists/machine.txt
source/_static/developers/counter-processor-config.yaml
34:robots_url: https://raw.githubusercontent.com/CDLUC3/Make-Data-Count/master/user-agents/lists/robot.txt
35:machines_url: https://raw.githubusercontent.com/CDLUC3/Make-Data-Count/master/user-agents/lists/machine.txt
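For anyone without ack installed, roughly the same search can be done in Python. This is only a minimal sketch, not part of the Dataverse tooling: `find_references` is a hypothetical helper name, and the `source` directory and `CDLUC` search string are taken from the command above.

```python
from pathlib import Path


def find_references(root, needle="CDLUC"):
    """Return (relative path, line number, line) for every line containing needle."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path.relative_to(root)), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # "source" is the guides directory searched in the ack command above.
    for rel_path, lineno, line in find_references("source"):
        print(f"{rel_path}:{lineno}:{line}")
```

An empty result would mean no references to the old organization remain under that directory.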
36:robots_url: https://raw.githubusercontent.com/CDLUC3/Make-Data-Count/master/user-agents/lists/robot.txt
37:machines_url: https://raw.githubusercontent.com/CDLUC3/Make-Data-Count/master/user-agents/lists/machine.txt
Hmm, actually, should we copy these files from https://github.com/CDLUC3/Make-Data-Count/tree/master/user-agents/lists into https://github.com/gdcc/counter-processor and make another release? 🤔
Maybe we should. That repo hasn't been touched in 7 years. 😬
What do others think?
If we're fine leaving those robot.txt and machine.txt files where they are I believe it's fine to close this issue as done.
I'm moving the robot.txt and machine.txt files:
robots_url: https://raw.githubusercontent.com/IQSS/counter-processor/refs/heads/goto-gdcc/user-agents/lists/robots.txt
machines_url: https://raw.githubusercontent.com/IQSS/counter-processor/refs/heads/goto-gdcc/user-agents/lists/machine.txt
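If an existing installation needs its config file updated to match, the two values could be rewritten with a small script. The following is a rough sketch under stated assumptions: `set_yaml_urls` is a hypothetical helper, it only handles simple top-level `key: value` lines like the two shown above, and the example URLs in the usage note are placeholders rather than real locations.

```python
def set_yaml_urls(config_text, new_urls):
    """Replace the value of each top-level `key: value` line whose key is in new_urls.

    This is a plain-text rewrite, not a YAML parser: indented keys, comments,
    and all other lines are passed through unchanged.
    """
    lines = []
    for line in config_text.splitlines():
        key, sep, _ = line.partition(":")
        if sep and key in new_urls:
            line = f"{key}: {new_urls[key]}"
        lines.append(line)
    trailing_newline = "\n" if config_text.endswith("\n") else ""
    return "\n".join(lines) + trailing_newline
```

For example, `set_yaml_urls(text, {"robots_url": "https://example.org/robots.txt", "machines_url": "https://example.org/machine.txt"})` returns the config text with only those two lines changed.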
I fixed the missing identifier field as described in: https://github.com/IQSS/dataverse/issues/11235
I have a few other fixes going into the next counter-processor release. These issues were found while running the software in Harvard Dataverse production.
counter-processor 1.06 was released today with a few bug fixes.
@stevenwinship great! To close this issue, should we make a PR to update the version in the guides?
As discussed at https://github.com/IQSS/dataverse/pull/11277#pullrequestreview-2642354590 the following PR is closing this issue so I'm removing it from the board:
- #11277