rweekly.org

GH Action to gather content

Open jonocarroll opened this issue 1 year ago • 4 comments

The curation process as it stands currently involves:

  1. collecting the last 10 days of RSS entries via get_rss_posts()
  2. collecting the last week or so of CRANberries new and updated via process_cranberries()
  3. de-duplicating (within the draft and from the last 20 issues)
  4. adding content found elsewhere
  5. curating posts - filtering out irrelevant/low-quality content and categorising

I believe the first 3 of these can be automated, potentially with a GitHub Action, performed on a weekly schedule. Getting that content into the draft itself is a minor addition, but collecting that content in the first place, even into a committed plaintext file, could help editors get closer to a draft, faster.
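As a rough illustration of the automatable part (the date window in step 1 and the within-draft de-duplication in step 3), here is a minimal Python sketch. The project's actual helpers, get_rss_posts() and process_cranberries(), are R functions whose internals are not shown in this thread, so the entry fields (`url`, `title`, `published`) and the function names below are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def recent_entries(entries, now, days=10):
    """Keep entries published within the last `days` days.

    Assumes each entry is a dict with a timezone-aware `published` datetime
    (a hypothetical structure, not the real feed format).
    """
    cutoff = now - timedelta(days=days)
    return [e for e in entries if e["published"] >= cutoff]


def dedupe_by_url(entries):
    """Drop repeated URLs, keeping the first occurrence and the original order."""
    seen = set()
    unique = []
    for e in entries:
        # Simplification: treat URLs that differ only by a trailing slash as equal.
        key = e["url"].rstrip("/")
        if key not in seen:
            seen.add(key)
            unique.append(e)
    return unique


# Made-up sample data, pinned to a fixed "now" so the result is reproducible.
now = datetime(2023, 10, 21, 9, 0, tzinfo=timezone.utc)
entries = [
    {"url": "https://example.org/a", "title": "A", "published": now - timedelta(days=2)},
    {"url": "https://example.org/a/", "title": "A again", "published": now - timedelta(days=1)},
    {"url": "https://example.org/old", "title": "Old", "published": now - timedelta(days=30)},
]
draft = dedupe_by_url(recent_entries(entries, now))
```

Matching on a normalised URL is only one possible notion of "duplicate"; matching on titles as well would be a reasonable variation.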

I think I'm able to prototype this myself, but this issue can serve as a place for discussion about improvements or concerns.

jonocarroll avatar Oct 21 '23 01:10 jonocarroll

The prototype works! https://github.com/rweekly/rweekly.org/blob/gh-pages/curatinator_latest.md?plain=1 (I forgot to add linebreaks, but the concept is sound).

Next I'll add the collection of CRANberries and the de-duplication. It's set to run at 9am UTC every Saturday, but it can also be triggered manually from the Actions tab on GitHub.

jonocarroll avatar Oct 21 '23 02:10 jonocarroll

I'm quite happy with that! This now fetches the RSS feeds and CRANberries, de-duplicates, and saves to curatinator_latest.md for copying over to the draft. It still requires the de-duplication against past issues, but I wasn't sure how to easily excise those.
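One possible way to handle the remaining step, sketched in Python under stated assumptions rather than as what the action does: collect every link that appears in the text of the past issues and drop candidates whose URL is already among them. It assumes past issues are Markdown files using inline `[title](url)` links. The function names are hypothetical, and loading the last 20 issue files is left out.

```python
import re

# Matches the URL part of an inline Markdown link: [text](https://...)
LINK_RE = re.compile(r"\]\((https?://[^)\s]+)\)")


def urls_in_markdown(text):
    """Return the set of linked URLs in a Markdown string, minus trailing slashes."""
    return {u.rstrip("/") for u in LINK_RE.findall(text)}


def drop_already_published(candidate_urls, past_issue_texts):
    """Keep only candidate URLs that do not appear in any past issue text."""
    published = set()
    for text in past_issue_texts:
        published |= urls_in_markdown(text)
    return [u for u in candidate_urls if u.rstrip("/") not in published]


# Made-up sample: one past issue containing two links.
past = ["+ [Some post](https://example.org/a)\n+ [Other](https://example.org/b/)"]
fresh = drop_already_published(
    ["https://example.org/a/", "https://example.org/c"], past
)
```

Here `fresh` keeps only the URL that no past issue links to. Reference-style links or bare URLs would need a broader pattern.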

jonocarroll avatar Oct 21 '23 02:10 jonocarroll

This is really nice @jonocarroll! I've felt a sense of discontent each time I've curated since the loss of our infrastructure but lacked the initiative to do something about it, so I'm really appreciative of this step to remove some of the inefficiency in our process.

Doing it via a GH action is also really nice since it keeps it transparent for everyone and facilitates maintenance / collaboration / iteration.

jonmcalder avatar Oct 21 '23 08:10 jonmcalder

Looks great to me! This will save me some time during my curation weeks.

tonyelhabr avatar Oct 24 '23 19:10 tonyelhabr