Scraper only commits to main
Currently the scraper only commits new episodes to main via the GitHub Action. I think we need the episodes to go into develop and main. I am drafting a PR and it would commit the 3 missing episodes to develop and I am not sure if this will cause issues when develop is merged into main in the future.
I am aware that this only affects contributors that forked the repo before the switch of develop being the main branch, but it is still something we need to think about.
My guess would be
- We update the action to commit to both
developandmain- I'm not 100% sure about this one because I think that this could cause merge conflicts even if the commits are identical (not sure if git/GitHub is smart enough to see they are the same commit)
- We could make a branch for the scraped data and automatically create/merge PRs for both
developandmain- It sounds a bit complicated, but it should relieve the concern I have about in idea 1
Definitely looking for some ideas on this one...
Ya...I noticed this last night with @gerbrent 's PR. I hadn't been able to fix it yet but my plan was to try and do some type of GH action after the scrape to sync over the changes. But when thinking about it more this morning I realized it's a bit more complicated than that, because main could be missing some commits that the develop has...
Also number 1 I believer you're correct it either would cause conflicts or at minimum would probably duplicate the amounts of commits. Number 2 sounds like the most promising, but I personally don't know how to allow auto-merged PRs for a specific "type of commit" without turning it on for the whole repo: https://docs.github.com/en/github-ae@latest/repositories/configuring-branches-and-merges-in-your-repository/configuring-pull-request-merges/managing-auto-merge-for-pull-requests-in-your-repository
For now I've manually ran git pull --ff-only origin main and then git push on the develop branch. So it should be up to date with main now.
I was thinking the exact same thing @elreydetoda
Could we have auto-merges part of the GitHub action? I see this action on the marketplace after a quick google search. and it got me thinking we could have the action scrape, create the PRs with an automerge label then immediately merge it.
I have no idea what our branch protection rules look like right now, but I will do a little more research to see what I can learn about handling this problem in the meantime
That was what I was thinking was potentially via GH action, but I hadn't been able to look for one yet.
I know we could definitely have an action trigger after another one finishes (unfortunately we can't say only if it successfully finishes, but just if it finishes) like here:
https://github.com/JupiterBroadcasting/jupiterbroadcasting.com/blob/develop/.github/workflows/deploy-prod.yml#L14-L17
My biggest concern would be of the security of those GH actions (most seem to be nodejs based) and if the project is actively maintained or not (i.e. when was the last time the author responded to an issue/PR or committed).
If we don't find a GitHub action that is still maintained or that isn't JS, maybe we can create our own?
I might look into this regardless because I've been thinking about learning GitHub Actions anyway. This would just give me a good excuse.
@elreydetoda I see you are trying repo-sync. I was looking at that action as well. Have you had any luck with it?
Ya I was looking at it, but honestly while there are possible actions for us out there I don't think there's going to be a good solution...the issue being that both branches are getting committed to simultaneously... all the actions are expecting either the downstream branch to be stagnant or automatically open PRs for them to be merged by someone...
I think there's a potential (once #475) is merged, that we could have PRs be opened and merged automatically (once we get our testing really solid :sweat_smile:). Then we could only have a PR reviewed from a human if there is a merge conflict or a test fails.
Till then, we could have an automation that just runs what I've been doing manually after develop is synchronized with main.
git checkout main && git pull && git checkout develop && git pull && git pull --ff-only origin main && git push
Which essentially syncs develop with main after I've merged develop into main.
So looking into this a little bit, my first draft at an idea is to update the scrape action to have a few extra steps
- name: 'Commit'
uses: stefanzweifel/git-auto-commit-action@v4
with:
commit_message: 'Automatically scraped and committed via a GitHub Action.'
repository: ./jbsite
branch: scraping
- name: 'Create PR for production'
uses: devops-infra/[email protected]
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
source_branch: scraping
target_branch: main
title: Scrape data into main
body: "** Automated PR **"
label: scraper
- name: 'Create PR for develop'
uses: devops-infra/[email protected]
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
source_branch: scraping
target_branch: develop
title: Scrape data into develop
body: "** Automated PR **"
label: scraper
- name: 'Merge scraper PRs'
uses: devmasx/merge-branch@master
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
label_name: scraper
So basically the scrape would do the following:
- Commit into a branch called scraping
- Create 2 PRs to merge this branch into main and develop with a label (currently set to scraper)
- Merge PRs with with the scraper label
Looking for some feedback on this approach. I don't think we would really need to sync the scraping branch since it only handles scraped data so conflicts shouldn't really happen. But I haven't tested any of this. Just spit-balling here.
good idea!
One worry would be auto-merges of scraper labelled PRs, since anyone could presumably tag a non-moderated PR and thus have it get merged on the next scrape without approval/knowledge. Could be an issue..
then again, I think only mods can label issues, but still a consideration.
One worry would be auto-merges of scraper labelled PRs...
I am only using scraper as an example. We can reserve a label just for this action if we wanted to.
then again, I think only mods can label issues, but still a consideration.
You are correct. I was only able to edit labels after getting triage permissions.