backup GitHub issues, comments, etc.

Open justinmk opened this issue 6 years ago • 2 comments

Potential tools:

  1. ~~https://github-backup.branchable.com/~~ doesn't look maintained
    • https://hackage.haskell.org/package/github-backup
    • used by debian: https://github.com/Debian/README.Debian#tips
  2. https://github.com/josegonzalez/python-github-backup good results 👍

justinmk avatar Oct 08 '17 16:10 justinmk

I would like to know how exactly I would be able to solve this issue. Guidance would be appreciated as I am new to this.

sarkararpan710 avatar Apr 08 '19 10:04 sarkararpan710

The post above links to a potential tool that could be used. The task is to investigate how to use that tool (or some alternative), and then write a script that uses it.

justinmk avatar Apr 08 '19 10:04 justinmk

@justinmk where should this be stored? I would suggest a GitHub Release.

tsukinoko-kun avatar Jul 13 '23 15:07 tsukinoko-kun

@Frank-Mayer I would expect the data to be stored in a git repo.

justinmk avatar Jul 14 '23 16:07 justinmk

@justinmk one backup-repo to back up multiple other repositories? Or one backup branch in one repository?

The first option could be useful if there are multiple different repos to back up.

tsukinoko-kun avatar Jul 15 '23 18:07 tsukinoko-kun

I would prefer the first option.

One repository neovim/backup with one directory for each neovim repository that should get a backup.

Using a GitHub Action triggered on a cron schedule (maybe weekly), these backups can be updated.

I created a test repository for this approach: https://github.com/Frank-Mayer/backup

tsukinoko-kun avatar Jul 15 '23 18:07 tsukinoko-kun

Nice! Looks good to me. Don't want separate branches, only separate directories as you did.

I think these are requirements:

  1. ~~The repos should be explicitly chosen, i.e. we don't want to implicitly backup all repos.~~ (Edit: a way to exclude noisy repos would be useful.)
  2. The cron job should be very friendly to github's API
    • Only incremental changes should be pulled.
  3. Ignore anything newer than 1 week (1 month?). We want to avoid storing "edit history".
    • Only pull the "latest" version of a comment, not its history (assuming github API even offers that)
  4. Repo size should be not too big, hopefully much less than 1 GB.
    • Don't store images/videos.
    • Other ideas?

justinmk avatar Jul 15 '23 20:07 justinmk

  1. The repos should be explicitly chosen, i.e. we don't want to implicitly backup all repos. (where is that specified, I don't see it in your gha job?)

I currently back up all repositories of the neovim organisation. I could provide a list, this is not a problem. I would suggest neovim, go-client, node-client, nvim-lspconfig, pynvim, nvim.net.

  2. The cron job should be very friendly to github's API

    • Only incremental changes should be pulled.

Then I will set the cron to once a month. Incremental changes are active.
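As a rough sketch of what such a job might run, the snippet below builds one `github-backup` invocation per repository from an explicit list. The flags are believed to match the python-github-backup CLI, but they should be verified against `github-backup --help`. The helper name `build_backup_command`, the `backup/<repo>` output layout, and the `<token>` placeholder are assumptions for illustration only.

```python
# Repositories suggested in this thread; adjust as needed.
REPOS = ["neovim", "go-client", "node-client", "nvim-lspconfig", "pynvim", "nvim.net"]


def build_backup_command(org, repo, token, out_root="backup"):
    """Return the argv for backing up one repository's issues and PRs.

    Hypothetical helper: verify the flags against `github-backup --help`.
    """
    return [
        "github-backup", org,
        "--organization",                  # treat `org` as an organization
        "--repository", repo,              # back up only this repository
        "--token", token,
        "--output-directory", f"{out_root}/{repo}",
        "--issues", "--issue-comments",
        "--pulls", "--pull-comments",
        "--incremental",                   # only fetch changes since the last run
    ]


if __name__ == "__main__":
    for repo in REPOS:
        # "<token>" is a placeholder; a real run would pass the secret
        # to subprocess.run() instead of printing it.
        print(" ".join(build_backup_command("neovim", repo, "<token>")))
```

Keeping the list explicit makes it easy to add or drop a repository without touching the rest of the job.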

  3. Ignore anything newer than 1 week (1 month?). We want to avoid storing "edit history".

    • Only pull the "latest" version of a comment, not its history (assuming github API even offers that)

I don't think "ignore anything newer than 1 week (1 month?)" is possible with this. But I am looking into the tools and hope to find a way to do this.

  4. Repo size should be not too big, hopefully much less than 1 GB.

    • Don't store images/videos.
    • Other ideas?

My test repository currently takes 7.8 MB.

tsukinoko-kun avatar Jul 16 '23 09:07 tsukinoko-kun

I could provide a list, this is not a problem. I would suggest ...

After thinking more, maybe an explicit list isn't needed, because the data for most repos will be very small: they don't have many issues/PRs. But a way to exclude a repo may be needed. E.g. https://github.com/neovim/winget-pkgs is something we wouldn't want to back up, although it doesn't use PRs, so even its data is small.

I don't think "ignore anything newer than 1 week (1 month?)" is possible with this

Could be a TODO. I would guess we could add it as a feature to https://github.com/josegonzalez/python-github-backup , or worst case, we could parse the JSON and filter out stuff manually before git-committing it.
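A minimal sketch of that worst-case manual filter, assuming each backed-up issue or comment is a JSON object with the `updated_at` timestamp that the GitHub REST API returns. The function names and the sample data are hypothetical, and how the objects are loaded from the backup directory is left out.

```python
from datetime import datetime, timedelta, timezone


def parse_github_time(value):
    """Parse a GitHub API timestamp such as '2023-07-16T09:07:00Z'."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def settled_items(items, now, min_age=timedelta(days=7)):
    """Keep only items whose last update is at least `min_age` old."""
    cutoff = now - min_age
    return [item for item in items if parse_github_time(item["updated_at"]) <= cutoff]


if __name__ == "__main__":
    now = datetime(2023, 7, 16, tzinfo=timezone.utc)
    issues = [  # made-up sample data
        {"number": 1, "updated_at": "2023-07-01T12:00:00Z"},
        {"number": 2, "updated_at": "2023-07-15T20:07:00Z"},
    ]
    print([i["number"] for i in settled_items(issues, now)])  # prints [1]
```

Passing `now` explicitly keeps the filter deterministic, and `min_age` can be switched to `timedelta(days=30)` if a month turns out to be the better threshold.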

My test repository currently takes 7.8 MB.

Extrapolating to 10k issues, I'm guessing the full data will approach 1+ GB. This is not a blocker, but as a TODO we could think about ignoring some kinds of PRs and issues. E.g. vim-patch PRs could possibly be dropped.
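Dropping vim-patch PRs could be sketched as a title check on the backed-up JSON. This assumes each pull request object carries the `title` field from the GitHub API and that vim-patch PR titles begin with `vim-patch`; both the heuristic and the function names are illustrative, not something the backup tool provides.

```python
def is_vim_patch(pull):
    """Heuristic (assumption): vim-patch PRs have titles starting with 'vim-patch'."""
    return pull.get("title", "").lower().startswith("vim-patch")


def drop_vim_patch(pulls):
    """Return the pull requests that are not vim-patch PRs."""
    return [p for p in pulls if not is_vim_patch(p)]


if __name__ == "__main__":
    pulls = [  # made-up sample data
        {"number": 10, "title": "vim-patch:9.0.0001: example"},
        {"number": 11, "title": "fix(lsp): example"},
    ]
    print([p["number"] for p in drop_vim_patch(pulls)])  # prints [11]
```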

justinmk avatar Jul 16 '23 09:07 justinmk

I made the suggested changes as far as I am able to do so with the given tools.

I would suggest transferring Frank-Mayer/backup to the Neovim organization.

With the current capabilities of github-backup I don't see a way to exclude vim-patch PRs. I would add this as a TODO. It could be added via a PR to github-backup or in a fork of it.

Maybe you know this @justinmk: I am uncertain whether secrets get transferred with the repository or not. A secret called PAT is expected. This is required to call the GitHub API.

tsukinoko-kun avatar Jul 22 '23 11:07 tsukinoko-kun

Ok, thanks! Will look for a transfer request. Let's see how it goes.

justinmk avatar Jul 24 '23 15:07 justinmk

Well, the plan doesn't seem to work. 😅 (Screenshot 2023-07-24 at 21 18 03)

tsukinoko-kun avatar Jul 24 '23 19:07 tsukinoko-kun

I don’t have the permission to transfer the repository @justinmk

I could transfer it to you, and you add it to neovim. Or you fork it to neovim.

(As far as I know, if you fork it, you have to enable GitHub Actions.)

tsukinoko-kun avatar Jul 24 '23 19:07 tsukinoko-kun

Try transferring it to me

Edit: it's here now: https://github.com/neovim/neovim-backup

justinmk avatar Jul 25 '23 12:07 justinmk

Thanks again for getting this started! Can continue iterating at https://github.com/neovim/neovim-backup

justinmk avatar Jul 25 '23 23:07 justinmk