jupiterbroadcasting.com icon indicating copy to clipboard operation
jupiterbroadcasting.com copied to clipboard

Scraper only commits to main

Open CGBassPlayer opened this issue 3 years ago • 10 comments

Currently the scraper only commits new episodes to main via the GitHub Action. I think we need the episodes to go into develop and main. I am drafting a PR and it would commit the 3 missing episodes to develop and I am not sure if this will cause issues when develop is merged into main in the future.

I am aware that this only affects contributors that forked the repo before the switch of develop being the main branch, but it is still something we need to think about.

My guess would be

  1. We update the action to commit to both develop and main
    • I'm not 100% sure about this one because I think that this could cause merge conflicts even if the commits are identical (not sure if git/GitHub is smart enough to see they are the same commit)
  2. We could make a branch for the scraped data and automatically create/merge PRs for both develop and main
    • It sounds a bit complicated, but it should relieve the concern I have about in idea 1

Definitely looking for some ideas on this one...

CGBassPlayer avatar Oct 04 '22 14:10 CGBassPlayer

Ya...I noticed this last night with @gerbrent 's PR. I hadn't been able to fix it yet but my plan was to try and do some type of GH action after the scrape to sync over the changes. But when thinking about it more this morning I realized it's a bit more complicated than that, because main could be missing some commits that the develop has...

Also number 1 I believer you're correct it either would cause conflicts or at minimum would probably duplicate the amounts of commits. Number 2 sounds like the most promising, but I personally don't know how to allow auto-merged PRs for a specific "type of commit" without turning it on for the whole repo: https://docs.github.com/en/github-ae@latest/repositories/configuring-branches-and-merges-in-your-repository/configuring-pull-request-merges/managing-auto-merge-for-pull-requests-in-your-repository

For now I've manually ran git pull --ff-only origin main and then git push on the develop branch. So it should be up to date with main now.

elreydetoda avatar Oct 04 '22 18:10 elreydetoda

I was thinking the exact same thing @elreydetoda

Could we have auto-merges part of the GitHub action? I see this action on the marketplace after a quick google search. and it got me thinking we could have the action scrape, create the PRs with an automerge label then immediately merge it.

I have no idea what our branch protection rules look like right now, but I will do a little more research to see what I can learn about handling this problem in the meantime

CGBassPlayer avatar Oct 04 '22 19:10 CGBassPlayer

That was what I was thinking was potentially via GH action, but I hadn't been able to look for one yet.

I know we could definitely have an action trigger after another one finishes (unfortunately we can't say only if it successfully finishes, but just if it finishes) like here:

https://github.com/JupiterBroadcasting/jupiterbroadcasting.com/blob/develop/.github/workflows/deploy-prod.yml#L14-L17

My biggest concern would be of the security of those GH actions (most seem to be nodejs based) and if the project is actively maintained or not (i.e. when was the last time the author responded to an issue/PR or committed).

elreydetoda avatar Oct 04 '22 20:10 elreydetoda

If we don't find a GitHub action that is still maintained or that isn't JS, maybe we can create our own?

I might look into this regardless because I've been thinking about learning GitHub Actions anyway. This would just give me a good excuse.

CGBassPlayer avatar Oct 04 '22 20:10 CGBassPlayer

@elreydetoda I see you are trying repo-sync. I was looking at that action as well. Have you had any luck with it?

CGBassPlayer avatar Oct 23 '22 19:10 CGBassPlayer

Ya I was looking at it, but honestly while there are possible actions for us out there I don't think there's going to be a good solution...the issue being that both branches are getting committed to simultaneously... all the actions are expecting either the downstream branch to be stagnant or automatically open PRs for them to be merged by someone...

I think there's a potential (once #475) is merged, that we could have PRs be opened and merged automatically (once we get our testing really solid :sweat_smile:). Then we could only have a PR reviewed from a human if there is a merge conflict or a test fails.

Till then, we could have an automation that just runs what I've been doing manually after develop is synchronized with main.

git checkout main && git pull && git checkout develop && git pull && git pull --ff-only origin main && git push

Which essentially syncs develop with main after I've merged develop into main.

elreydetoda avatar Nov 03 '22 17:11 elreydetoda

So looking into this a little bit, my first draft at an idea is to update the scrape action to have a few extra steps

    - name: 'Commit'
      uses: stefanzweifel/git-auto-commit-action@v4
      with:
        commit_message: 'Automatically scraped and committed via a GitHub Action.'
        repository: ./jbsite
        branch: scraping

    - name: 'Create PR for production'
      uses: devops-infra/[email protected]
      with:
        github_token: ${{ secrets.GITHUB_TOKEN }}
        source_branch: scraping
        target_branch: main
        title: Scrape data into main
        body: "** Automated PR **"
        label: scraper

    - name: 'Create PR for develop'
      uses: devops-infra/[email protected]
      with:
        github_token: ${{ secrets.GITHUB_TOKEN }}
        source_branch: scraping
        target_branch: develop
        title: Scrape data into develop
        body: "** Automated PR **"
        label: scraper

    - name: 'Merge scraper PRs'
      uses: devmasx/merge-branch@master
      with:
        github_token: ${{ secrets.GITHUB_TOKEN }}
        label_name: scraper

So basically the scrape would do the following:

  1. Commit into a branch called scraping
  2. Create 2 PRs to merge this branch into main and develop with a label (currently set to scraper)
  3. Merge PRs with with the scraper label

Looking for some feedback on this approach. I don't think we would really need to sync the scraping branch since it only handles scraped data so conflicts shouldn't really happen. But I haven't tested any of this. Just spit-balling here.

CGBassPlayer avatar Nov 13 '22 17:11 CGBassPlayer

good idea!

One worry would be auto-merges of scraper labelled PRs, since anyone could presumably tag a non-moderated PR and thus have it get merged on the next scrape without approval/knowledge. Could be an issue..

gerbrent avatar Nov 14 '22 21:11 gerbrent

then again, I think only mods can label issues, but still a consideration.

gerbrent avatar Nov 14 '22 21:11 gerbrent

One worry would be auto-merges of scraper labelled PRs...

I am only using scraper as an example. We can reserve a label just for this action if we wanted to.

then again, I think only mods can label issues, but still a consideration.

You are correct. I was only able to edit labels after getting triage permissions.

CGBassPlayer avatar Nov 14 '22 21:11 CGBassPlayer