Sync issues, PRs, labels, milestones, comments from GitHub mirrors
This PR adds support for one-way syncing of topics, milestones, labels, issues, pull requests, comments, reviews, releases, and reactions from pull mirror (GitHub) to Gitea. Closes #18369.
The Downloader interface is extended with GetNewIssues and GetNewPullRequests, since new issues and pull requests should be appended to, or updated in, the existing repository.
The Uploader interface is extended with UpdateTopics, UpdateMilestones, UpdateLabels, PatchReleases, PatchComments, PatchIssues, and PatchPullRequests. All Update* methods should replace existing data, while Patch* methods should append or update but never delete existing data.
The above methods will be implemented for GitHub, ~or possibly also GitLab,~ in this pull request.
To make the updates work, we need to uniquely identify each Topic, Milestone, Label, Release, Comment, Issue, and PullRequest. The following list shows how they are identified:
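To illustrate the difference between the Update* and Patch* semantics described above, here is a minimal sketch using an in-memory store. The `Item` and `store` types and the `ReplaceAll`/`Patch` method names are hypothetical and exist only for this example; they are not the actual Uploader interface from this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// Item is a hypothetical, simplified record synced from the source.
type Item struct {
	OriginalID int64 // ID of the record on the source side (e.g. GitHub)
	Body       string
}

// store is a hypothetical in-memory stand-in for the Gitea side.
type store struct {
	items map[int64]Item
}

func newStore(initial ...Item) *store {
	s := &store{items: make(map[int64]Item)}
	for _, it := range initial {
		s.items[it.OriginalID] = it
	}
	return s
}

// ReplaceAll models Update* semantics: the stored set becomes exactly the
// incoming set, so records missing from incoming are dropped.
func (s *store) ReplaceAll(incoming ...Item) {
	s.items = make(map[int64]Item, len(incoming))
	for _, it := range incoming {
		s.items[it.OriginalID] = it
	}
}

// Patch models Patch* semantics: incoming records are inserted or updated,
// and records missing from incoming are left untouched.
func (s *store) Patch(incoming ...Item) {
	for _, it := range incoming {
		s.items[it.OriginalID] = it
	}
}

// bodies returns the stored bodies in sorted order for stable output.
func (s *store) bodies() []string {
	out := make([]string, 0, len(s.items))
	for _, it := range s.items {
		out = append(out, it.Body)
	}
	sort.Strings(out)
	return out
}

func main() {
	incoming := []Item{{OriginalID: 1, Body: "kind/bug"}, {OriginalID: 3, Body: "help wanted"}}

	a := newStore(Item{OriginalID: 1, Body: "bug"}, Item{OriginalID: 2, Body: "docs"})
	a.ReplaceAll(incoming...)
	fmt.Println(a.bodies()) // "docs" is gone

	b := newStore(Item{OriginalID: 1, Body: "bug"}, Item{OriginalID: 2, Body: "docs"})
	b.Patch(incoming...)
	fmt.Println(b.bodies()) // "docs" is kept
}
```

With replace semantics the record with OriginalID 2 disappears because the source no longer reports it; with patch semantics it survives.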
- [x] Topic - Name (it's just a string, unrelated to any other things)
- [x] Milestone - OriginalID*
- [x] Label - OriginalID*
- [x] Release - OriginalID*
- [x] Comments - OriginalID*
- [x] Issue - IssueID and Index
- [x] PullRequest - IssueID and Index
- [x] Reviews - OriginalID*
- [x] Reactions - IssueID, CommentID and Type
OriginalID*: additional column added to database, used to track ID from the source
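As a rough illustration of why a stable identifier matters, the sketch below uses the reaction key from the list above (IssueID, CommentID, and Type) to make a sync idempotent: running it twice adds nothing the second time. The `reactionKey` type and `addMissing` function are hypothetical names for this example, and treating `CommentID == 0` as "reaction on the issue itself" is an assumption.

```go
package main

import "fmt"

// reactionKey mirrors the identifying fields listed above for reactions.
type reactionKey struct {
	IssueID   int64
	CommentID int64 // assumption: 0 means the reaction is on the issue itself
	Type      string
}

// addMissing inserts every remote reaction that is not yet present and
// reports how many were added.
func addMissing(existing map[reactionKey]struct{}, remote []reactionKey) int {
	added := 0
	for _, k := range remote {
		if _, ok := existing[k]; ok {
			continue // already synced; do not create a duplicate
		}
		existing[k] = struct{}{}
		added++
	}
	return added
}

func main() {
	existing := map[reactionKey]struct{}{}
	remote := []reactionKey{
		{IssueID: 1, CommentID: 0, Type: "+1"},
		{IssueID: 1, CommentID: 7, Type: "heart"},
	}
	fmt.Println(addMissing(existing, remote)) // first sync: 2
	fmt.Println(addMissing(existing, remote)) // second sync: 0
}
```

The same lookup-before-insert idea applies to records identified by OriginalID, with the single ID taking the place of the composite key.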
Screenshot of the toggles that select which migration items should be synced (the only change is that when the mirror option is checked, the migration items remain available for supported platforms):
You need to implement the UpdateIssues, UpdateReactions, UpdateComments, etc. functions. You need to find the unique external ID in order to update them.
Codecov Report
Merging #20311 (dd20d7d) into main (f521e88) will decrease coverage by 0.43%. The diff coverage is 20.20%.
:exclamation: Current head dd20d7d differs from pull request most recent head 8fb9956. Consider uploading reports for the commit 8fb9956 to get more accurate results
@@ Coverage Diff @@
## main #20311 +/- ##
==========================================
- Coverage 47.14% 46.71% -0.43%
==========================================
Files 1149 994 -155
Lines 151446 137597 -13849
==========================================
- Hits 71397 64285 -7112
+ Misses 71611 65414 -6197
+ Partials 8438 7898 -540
| Impacted Files | Coverage Δ | |
|---|---|---|
| models/repo/mirror.go | 66.07% <ø> (ø) | |
| modules/migration/null_downloader.go | 23.40% <0.00%> (-8.03%) | :arrow_down: |
| modules/migration/null_uploader.go | 0.00% <0.00%> (ø) | |
| modules/notification/notification.go | 93.02% <ø> (+10.01%) | :arrow_up: |
| routers/api/v1/repo/migrate.go | 53.43% <ø> (+2.24%) | :arrow_up: |
| routers/web/repo/migrate.go | 40.72% <ø> (+1.91%) | :arrow_up: |
| services/migrations/dump.go | 39.83% <ø> (+0.55%) | :arrow_up: |
| services/migrations/migrate.go | 30.01% <0.00%> (-18.85%) | :arrow_down: |
| models/migrate.go | 17.34% <4.26%> (-28.64%) | :arrow_down: |
| models/issues/review.go | 48.31% <15.00%> (-4.42%) | :arrow_down: |
| ... and 7 more | | |
... and 1146 files with indirect coverage changes
@harryzcy this looks really promising! What's the current state of this?
It's on hold since I discovered PR #20122, which is basically doing the same thing. I don't know the status of that PR. So if there's an interest from the community in this PR, I can resume the work. @tecosaur
@earl-warren proposes a new approach https://github.com/go-gitea/gitea/pull/20122#issuecomment-1236438810 and the PR #20122 you mentioned above is now closed. How do you see it @harryzcy in the near future?
> @earl-warren proposes a new approach #20122 (comment) and the PR #20122 you mentioned above is now closed. How do you see it @harryzcy in the near future?
I will pick it up sometime.
Update: I'll be working on it in Apr. ~Probably I can make it before 1.20 feature freezes.~
any timeline on that feature? Would really appreciate it
This PR is mostly complete and ready for review. There may be some minor issues that I'm looking into (an updated release doesn't have the right database record, some UI buttons should be disabled, etc.), but those won't change the codebase much.
I'm a bit afraid to ask because I don't completely understand the changes, but what will happen to references of issues and comments?
As I understand it, if a new issue is created between two synchronizations both in the local copy of the repository and at the remote one (e.g. Github), then after the next synchronization the remote side's new issue will have a different ID locally than it has there, because synced issues are appended to the list of existing ones. This would mean that if that issue was referenced (by its ID, because that's how it's done) in another issue, in a PR, in a commit, or somewhere else, then that reference would now point to a different issue than originally intended. This also applies to PRs and comments, as they all have serially incremented IDs (and for comments, counting is done globally at the instance level afaik, not at the repository level).
Basically, what happens if there are objects at the remote repository that have the same IDs as some local objects? If they get a new ID (because they are appended to existing ones or for another reason), are references somehow corrected, or will they just become silently broken?
@mpeter50 No, two-way synchronization is not supported in this PR. It is expected that users don't change anything on Gitea that's synced from GitHub; otherwise things will break or get replaced.
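The scenario in the question above can be made concrete with a small sketch that detects when an incoming remote issue wants an index already taken by a locally created issue. The `issue` type and `conflictingIndexes` function are hypothetical and only illustrate the collision; they do not show how this PR handles it.

```go
package main

import "fmt"

// issue is a hypothetical, simplified record. Index is the per-repository
// number used in references such as "#3".
type issue struct {
	Index      int64
	Title      string
	FromRemote bool // true if the issue was created by a sync
}

// conflictingIndexes returns the indexes of incoming remote issues whose
// index is already used by an issue that was created locally.
func conflictingIndexes(local, incoming []issue) []int64 {
	localOnly := make(map[int64]bool)
	for _, is := range local {
		if !is.FromRemote {
			localOnly[is.Index] = true
		}
	}
	var out []int64
	for _, is := range incoming {
		if localOnly[is.Index] {
			out = append(out, is.Index)
		}
	}
	return out
}

func main() {
	local := []issue{
		{Index: 1, Title: "synced earlier", FromRemote: true},
		{Index: 2, Title: "synced earlier", FromRemote: true},
		{Index: 3, Title: "created on Gitea", FromRemote: false},
	}
	incoming := []issue{{Index: 3, Title: "created on GitHub", FromRemote: true}}
	fmt.Println(conflictingIndexes(local, incoming)) // prints [3]
}
```

Here a reference to `#3` written on GitHub and a reference to `#3` written on Gitea mean different issues, which is the ambiguity the question raises.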
Probably some resources should be made read-only, i.e. disable the buttons on the web and return 400 for APIs.
Or even better, we should start storing unique IDs from the remote (GitHub) in this PR so that it will be much easier to enable two-way syncs in the future.
Since this PR will be merged only after the 1.20 release, there is more time to discuss these points and avoid premature implementations.
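The read-only idea mentioned in this comment (return 400 from the API for synced resources) could look roughly like the sketch below. It uses only the standard `net/http` package rather than Gitea's own routing layer, and `readOnlyIfSynced` and the `isSynced` callback are hypothetical names; how a real handler would decide that a resource is synced is left open.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// readOnlyIfSynced wraps next and rejects mutating requests with 400 when
// isSynced reports that the target resource comes from a mirror sync.
// isSynced is a hypothetical callback supplied by the caller.
func readOnlyIfSynced(isSynced func(*http.Request) bool, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		switch r.Method {
		case http.MethodGet, http.MethodHead, http.MethodOptions:
			// Reads are always allowed.
			next.ServeHTTP(w, r)
			return
		}
		if isSynced(r) {
			http.Error(w, "resource is synced from a mirror and is read-only", http.StatusBadRequest)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor sends one request through the wrapper and returns the status code.
func statusFor(method string, synced bool) int {
	ok := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	h := readOnlyIfSynced(func(*http.Request) bool { return synced }, ok)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(method, "/hypothetical/path", nil))
	return rec.Code
}

func main() {
	fmt.Println(statusFor(http.MethodGet, true))   // 200: reads pass through
	fmt.Println(statusFor(http.MethodPost, true))  // 400: write to synced resource
	fmt.Println(statusFor(http.MethodPost, false)) // 200: write to local resource
}
```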
> Or even better, we should start storing unique IDs from the remote (GitHub) in this PR so that it will be much easier to enable two-way syncs in the future.
As a fellow Gitea hoster, I completely agree with this idea. Currently, I also use Gitea as a backup and mirror solution from Github to Gitea. For any important free software project, I rely mostly on Github for now while keeping a full backup/mirror on Gitea. Thus, having a two-way sync feature would be fantastic but still falls behind the priority of having a backup mirror.
Ideally, I would like to migrate entirely from Github while still giving the option for Github users who are not ready yet to opt-in. This approach would require a two-way synchronization feature.
I have picked up this PR recently. Some of the pending tasks:
- when a comment is updated on GitHub, its updated time in Gitea is not updated (other items probably have the same issue)
- releases can't be updated correctly
Some non-blocking issues:
- comment history from GitHub is not imported
I'm still trying to figure out the migration-related code. It looks like some of the code is duplicated, and the migration code in general (not only this feature) might need some refactoring. That may be a job for another PR after this one.
Is it ready for review now?
@lunny almost.. almost.. let me go over everything again to see if anything is missing. I'll give an update later this week. I do want to finish it before 1.21 feature freezes.
You don't have to push commits one by one. I think it's fine to push e.g. once at the end of the day, or when you are waiting for review.
Also, sometimes it might be useful to squash multiple commits into a single one. This is done with git's rebase function, but if you are not familiar with it, you don't have to do it.
@lunny It's now ready for review
> @lunny It's now ready for review

Hi, the lint errors need to be fixed.
So what is the status? Is there any chance that it will be ready for 1.22?
> So what is the status? Is there any chance that it will be ready for 1.22?
Waiting for reviews
@delvh Would you be able to take a look at this PR again? Thanks
Does this include mirroring release packages as well? Right now Gitea only mirrors the tarballs, and any artifacts (like .appimages) added to GitHub releases are lost.
> Does this include mirroring release packages as well? Right now Gitea only mirrors the tarballs, and any artifacts (like .appimages) added to GitHub releases are lost.
Yes, included in this PR
This PR looks cool. I'll try a build from this branch locally to see if it meets our need for a mirror of Github repositories that also syncs issues and the comments on issues and PRs. Thanks @harryzcy, looking forward to this PR getting vetted and merged in the future.
@siddarthkay Thank you for helping test this out.
hmm @harryzcy : I used your branch url as the source of my docker image and now I see that the options to select Issues, PRs while setting up a mirror are disabled.
Is it intended to be that way?
Here is my docker-compose.yml in case it helps to replicate the issue.

    version: "3"

    networks:
      gitea:
        external: false

    services:
      server:
        build:
          context: https://github.com/harryzcy/gitea.git#sync-issue-pr-and-more
          dockerfile: Dockerfile
        container_name: gitea
        environment:
          - USER_UID=1000
          - USER_GID=1000
          - GITEA__database__DB_TYPE=mysql
          - GITEA__database__HOST=db:3306
          - GITEA__database__NAME=gitea
          - GITEA__database__USER=gitea
          - GITEA__database__PASSWD=gitea
        restart: always
        networks:
          - gitea
        volumes:
          - ./gitea:/data
          - /etc/timezone:/etc/timezone:ro
          - /etc/localtime:/etc/localtime:ro
        ports:
          - "3000:3000"
          - "222:22"
        depends_on:
          - db

      db:
        image: mysql:8
        restart: always
        environment:
          - MYSQL_ROOT_PASSWORD=gitea
          - MYSQL_USER=gitea
          - MYSQL_PASSWORD=gitea
          - MYSQL_DATABASE=gitea
        networks:
          - gitea
        volumes:
          - ./mysql:/var/lib/mysql

Thanks so much for your work on this PR! I've set it up to attempt to see if it'll work for us.
I imported a private repository using a Personal Access Token. The initial sync works fine but I'm encountering this error on subsequent syncs:
gitea-1 | 2024/05/24 23:02:23 ...eb/routing/logger.go:102:func1() [I] router: completed GET /ORGNAME/REPONAME/settings for ...:0, 200 OK in 15.1ms @ setting/setting.go:95(setting.Settings)
db-1 | 2024-05-24 21:02:24.317 UTC [35] ERROR: duplicate key value violates unique constraint "issue_index_pkey"
db-1 | 2024-05-24 21:02:24.317 UTC [35] DETAIL: Key (group_id)=(1452) already exists.
db-1 | 2024-05-24 21:02:24.317 UTC [35] STATEMENT: INSERT INTO issue_index (group_id, max_index) VALUES ($1, $2)
db-1 | 2024-05-24 21:02:24.317 UTC [35] ERROR: current transaction is aborted, commands ignored until end of transaction block
db-1 | 2024-05-24 21:02:24.317 UTC [35] STATEMENT: SELECT max_index FROM issue_index WHERE group_id=$1
gitea-1 | 2024/05/24 23:02:24 ...irror/mirror_pull.go:465:runSyncMisc() [E] runSyncMisc [repo: <Repository 1452:ORGNAME/REPONAME>]: failed to run SyncRepository: pq: current transaction is aborted, commands ignored until end of transaction block
Hey @harryzcy are you actively working on this PR to sync pull request data from github->gitea, or are you looking for someone to pick up the torch? My company is interested in this feature, and I could:
- a) develop off this branch and keep your commits, then open a new pr that has your commits + whatever fixes I needed
- b) advocate we merge some portion of this, like schema changes and client/api/types without implementations and then I develop off of that
- c) abandon this branch and close this (now draft) PR
This was bumped from 1.23 release goals to 1.24, I'd like to pitch in to make that happen but also don't want to step on your toes if you've got this.
Without wanting to sound impatient: given the time that has passed since this PR was last active, and that @harryzcy appears to have been inactive on this project for a significant period, it would be beneficial to clarify the next steps in order to maintain momentum on this feature.