
Sync issues, PRs, labels, milestones, comments from GitHub mirrors

Open harryzcy opened this issue 3 years ago • 27 comments

This PR adds support for one-way syncing of topics, milestones, labels, issues, pull requests, comments, reviews, releases, and reactions from a pull mirror (GitHub) to Gitea. Closes #18369.

The Downloader interface is extended with GetNewIssues and GetNewPullRequests, since new issues and pull requests should be appended to, or updated in, the existing repository.

The Uploader interface is extended with UpdateTopics, UpdateMilestones, UpdateLabels, PatchReleases, PatchComments, PatchIssues, and PatchPullRequests. All Update* methods should replace existing data, while Patch* methods should append or update but should not delete existing data.

The above methods will be implemented here for GitHub, ~or possibly also GitLab,~ in this pull request.
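
To make the difference between Update* and Patch* concrete, here is a minimal in-memory sketch. It is not the code in this PR: `Item`, `Store`, `Update`, `Patch`, and `Titles` are hypothetical names used only for illustration, and a real implementation would work against the database rather than a map.

```go
package main

import (
	"fmt"
	"sort"
)

// Item is a hypothetical stand-in for any synced object (label, comment, ...).
type Item struct {
	OriginalID int64 // ID of the object on the source platform
	Title      string
}

// Store maps OriginalID to the locally stored item.
type Store map[int64]Item

// Update replaces the whole local set with the incoming one,
// so items that no longer exist on the source are dropped.
func Update(_ Store, incoming []Item) Store {
	next := make(Store, len(incoming))
	for _, it := range incoming {
		next[it.OriginalID] = it
	}
	return next
}

// Patch appends new items and overwrites matching ones,
// but never deletes what is already stored.
func Patch(local Store, incoming []Item) Store {
	next := make(Store, len(local)+len(incoming))
	for id, it := range local {
		next[id] = it
	}
	for _, it := range incoming {
		next[it.OriginalID] = it
	}
	return next
}

// Titles returns the stored titles ordered by OriginalID.
func Titles(s Store) []string {
	ids := make([]int64, 0, len(s))
	for id := range s {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	titles := make([]string, 0, len(ids))
	for _, id := range ids {
		titles = append(titles, s[id].Title)
	}
	return titles
}

func main() {
	local := Store{1: {1, "bug"}, 2: {2, "docs"}}
	incoming := []Item{{2, "documentation"}, {3, "feature"}}
	fmt.Println(Titles(Update(local, incoming))) // [documentation feature]
	fmt.Println(Titles(Patch(local, incoming)))  // [bug documentation feature]
}
```

In this sketch, Update drops the local "bug" item because the source no longer reports it, while Patch keeps it and only adds or overwrites.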

In order to make the updates work, we would need to uniquely identify a Topic, Milestone, Label, Release, Comments, Issue and PullRequest. The following list shows how they are identified:

  • [x] Topic - Name (it's just a string, unrelated to any other things)
  • [x] Milestone - OriginalID*
  • [x] Label - OriginalID*
  • [x] Release - OriginalID*
  • [x] Comments - OriginalID*
  • [x] Issue - IssueID and Index
  • [x] PullRequest - IssueID and Index
  • [x] Reviews - OriginalID*
  • [x] Reactions - IssueID, CommentID and Type

OriginalID*: additional column added to database, used to track ID from the source
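
Reactions are the one entry above identified by a combination of fields rather than a single ID. A rough sketch of how such a composite key could be used to skip reactions that are already present is shown below. `ReactionKey` and `NewReactions` are hypothetical names, and the convention that `CommentID` is 0 for a reaction on the issue itself is an assumption made for the example, not something taken from this PR.

```go
package main

import "fmt"

// ReactionKey is a hypothetical composite key built from the three fields
// listed for reactions: IssueID, CommentID and Type.
type ReactionKey struct {
	IssueID   int64
	CommentID int64 // assumed to be 0 when the reaction is on the issue itself
	Type      string
}

// NewReactions returns the incoming keys that are not yet in seen,
// and records them in seen so repeated keys are returned only once.
func NewReactions(seen map[ReactionKey]struct{}, incoming []ReactionKey) []ReactionKey {
	var fresh []ReactionKey
	for _, k := range incoming {
		if _, ok := seen[k]; ok {
			continue // already synced, skip
		}
		seen[k] = struct{}{}
		fresh = append(fresh, k)
	}
	return fresh
}

func main() {
	seen := map[ReactionKey]struct{}{
		ReactionKey{IssueID: 10, CommentID: 0, Type: "heart"}: {},
	}
	incoming := []ReactionKey{
		{IssueID: 10, CommentID: 0, Type: "heart"}, // already present
		{IssueID: 10, CommentID: 5, Type: "heart"}, // new
		{IssueID: 10, CommentID: 5, Type: "heart"}, // repeated in the same batch
	}
	fmt.Println(len(NewReactions(seen, incoming))) // 1
}
```

Because Go structs with comparable fields can be used directly as map keys, no string concatenation is needed to build the identifier.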

Screenshot of the toggles that set which migration items should be synced (the only change is that when the mirror option is checked, the migration items remain available for supported platforms):

image

harryzcy avatar Jul 10 '22 21:07 harryzcy

You need to implement the UpdateIssues, UpdateReactions, UpdateComments, etc. functions. You need to find the unique external ID to update them.

lunny avatar Jul 11 '22 02:07 lunny

Codecov Report

Merging #20311 (dd20d7d) into main (f521e88) will decrease coverage by 0.43%. The diff coverage is 20.20%.

:exclamation: Current head dd20d7d differs from pull request most recent head 8fb9956. Consider uploading reports for the commit 8fb9956 to get more accurate results

@@            Coverage Diff             @@
##             main   #20311      +/-   ##
==========================================
- Coverage   47.14%   46.71%   -0.43%     
==========================================
  Files        1149      994     -155     
  Lines      151446   137597   -13849     
==========================================
- Hits        71397    64285    -7112     
+ Misses      71611    65414    -6197     
+ Partials     8438     7898     -540     
Impacted Files Coverage Δ
models/repo/mirror.go 66.07% <ø> (ø)
modules/migration/null_downloader.go 23.40% <0.00%> (-8.03%) :arrow_down:
modules/migration/null_uploader.go 0.00% <0.00%> (ø)
modules/notification/notification.go 93.02% <ø> (+10.01%) :arrow_up:
routers/api/v1/repo/migrate.go 53.43% <ø> (+2.24%) :arrow_up:
routers/web/repo/migrate.go 40.72% <ø> (+1.91%) :arrow_up:
services/migrations/dump.go 39.83% <ø> (+0.55%) :arrow_up:
services/migrations/migrate.go 30.01% <0.00%> (-18.85%) :arrow_down:
models/migrate.go 17.34% <4.26%> (-28.64%) :arrow_down:
models/issues/review.go 48.31% <15.00%> (-4.42%) :arrow_down:
... and 7 more

... and 1146 files with indirect coverage changes


codecov-commenter avatar Jul 11 '22 06:07 codecov-commenter

@harryzcy this looks really promising! What's the current state of this?

tecosaur avatar Aug 13 '22 05:08 tecosaur

It's on hold since I discovered PR #20122, which is basically doing the same thing. I don't know the status of that PR. So if there's an interest from the community in this PR, I can resume the work. @tecosaur

harryzcy avatar Aug 24 '22 03:08 harryzcy

@earl-warren proposes a new approach https://github.com/go-gitea/gitea/pull/20122#issuecomment-1236438810 and the PR #20122 you mentioned above is now closed. How do you see it @harryzcy in the near future?

nikoPLP avatar Oct 27 '22 13:10 nikoPLP

@earl-warren proposes a new approach #20122 (comment) and the PR #20122 you mentioned above is now closed. How do you see it @harryzcy in the near future?

I will pick it up sometime.

Update: I'll be working on it in Apr. ~Probably I can make it before 1.20 feature freezes.~

harryzcy avatar Dec 30 '22 07:12 harryzcy

any timeline on that feature? Would really appreciate it

lukas-h avatar Apr 28 '23 09:04 lukas-h

This PR is mostly complete and ready for review. There may be some minor issues I'm still looking into (an updated release doesn't have the right database record, some UI buttons should be disabled, etc.), but those won't change the codebase much.

harryzcy avatar May 16 '23 03:05 harryzcy

I'm a bit afraid to ask because I don't completely understand the changes, but what will happen to references of issues and comments?

As I understand it, if between two synchronizations a new issue is created both in the local copy of the repository and at the remote one (e.g. GitHub), then after the next synchronization the new remote issue will have a different ID locally than it has remotely, because synced issues are appended to the list of existing ones. This would mean that if that issue was referenced (by its ID, because that's how it's done) in another issue, in a PR, in a commit, or somewhere else, then that reference would now point to a different issue than originally intended. This also applies to PRs and comments, as they all have serially incremented IDs (and for comments, counting is done globally at the instance level AFAIK, not at the repository level).

Basically, what happens if there are objects at the remote repository that have the same IDs as some local objects? If they get a new ID (because they are appended to existing ones or for another reason), are references somehow corrected, or will they just become silently broken?

mpeter50 avatar May 25 '23 23:05 mpeter50

@mpeter50 No, two-way synchronization is not supported in this PR. It is expected that users don't change anything on Gitea that's synced from GitHub; otherwise things will just get broken or replaced.

Probably some resources should be made read-only, i.e. disable the buttons on the web and return 400 for APIs.

Or even better, we should start storing unique IDs from the remote (GitHub) in this PR so that it will be much easier to enable two-way syncs in the future.

Since this PR will get merged only after the 1.20 release, there's more time to discuss these points and avoid premature implementations.

harryzcy avatar May 27 '23 08:05 harryzcy

Or even better, we should start storing unique IDs from the remote (GitHub) in this PR so that it will be much easier to enable two-way syncs in the future.

As a fellow Gitea hoster, I completely agree with this idea. Currently, I also use Gitea as a backup and mirror solution from Github to Gitea. For any important free software project, I rely mostly on Github for now while keeping a full backup/mirror on Gitea. Thus, having a two-way sync feature would be fantastic but still falls behind the priority of having a backup mirror.

Ideally, I would like to migrate entirely from Github while still giving the option for Github users who are not ready yet to opt-in. This approach would require a two-way synchronization feature.

blob42 avatar May 30 '23 12:05 blob42

I have picked up this PR recently. Some of the pending tasks:

  • when a comment is updated on GitHub, the updated time in Gitea is not updated (other items probably have the same issue)
  • releases can't be updated correctly

Some non blocking issues:

  • comment history from GitHub not imported

harryzcy avatar Jul 22 '23 00:07 harryzcy

I'm still trying to figure out the migration-related code. It looks like some code is duplicated, and some refactoring of the migration code in general (not only this feature) might be needed. Maybe that's a job for another PR after this one.

harryzcy avatar Jul 22 '23 06:07 harryzcy

Is it ready for review now?

lunny avatar Aug 22 '23 06:08 lunny

@lunny almost.. almost.. let me go over everything again to see if anything is missing. I'll give an update later this week. I do want to finish it before 1.21 feature freezes.

harryzcy avatar Aug 22 '23 22:08 harryzcy

You don't have to push commits one by one. I think it's fine to push, e.g., once at the end of the day, or when you are waiting for review.

Also, sometimes it might be useful to squash multiple commits into a single one. This is done with git's rebase function, but if you are not familiar with it, you don't have to do it.

mpeter50 avatar Aug 24 '23 19:08 mpeter50

@lunny It's now ready for review

harryzcy avatar Aug 25 '23 05:08 harryzcy

@lunny It's now ready for review

Hi, the lint errors need to be fixed.

lunny avatar Sep 16 '23 02:09 lunny

So what is the status? Is there any chance that it will be ready for 1.22?

DariuszGarbarz avatar Feb 06 '24 12:02 DariuszGarbarz

So what is the status? Is there any chance that it will be ready for 1.22?

Waiting for reviews

harryzcy avatar Feb 06 '24 21:02 harryzcy

@delvh Would you be able to take a look at this PR again? Thanks

cloning5480 avatar Feb 13 '24 22:02 cloning5480

Does this include mirroring release packages as well? Right now Gitea only mirrors the tarballs, and any artifacts (like .appimages) added to GitHub releases are lost.

rcarmo avatar Mar 08 '24 09:03 rcarmo

Does this include mirroring release packages as well? Right now Gitea only mirrors the tarballs, and any artifacts (like .appimages) added to GitHub releases are lost.

Yes, included in this PR

harryzcy avatar Mar 08 '24 14:03 harryzcy

This PR looks cool. I'll try a build from this branch locally to see if it meets our need for a mirror of GitHub repositories that also syncs issues and comments on issues and PRs. Thanks @harryzcy! Looking forward to this PR getting vetted and merged in the future.

siddarthkay avatar Apr 27 '24 11:04 siddarthkay

@siddarthkay Thank you for helping to test this.

harryzcy avatar Apr 27 '24 20:04 harryzcy

Hmm @harryzcy: I used your branch URL as the source of my Docker image, and now I see that the options to select Issues and PRs while setting up a mirror are disabled.

Is it intended to be that way?

image

Here is my docker-compose.yml, in case it helps to replicate the issue.


version: "3"

networks:
  gitea:
    external: false

services:
  server:
    build:
      context: https://github.com/harryzcy/gitea.git#sync-issue-pr-and-more
      dockerfile: Dockerfile
    container_name: gitea
    environment:
      - USER_UID=1000
      - USER_GID=1000
      - GITEA__database__DB_TYPE=mysql
      - GITEA__database__HOST=db:3306
      - GITEA__database__NAME=gitea
      - GITEA__database__USER=gitea
      - GITEA__database__PASSWD=gitea
    restart: always
    networks:
      - gitea
    volumes:
      - ./gitea:/data
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    ports:
      - "3000:3000"
      - "222:22"
    depends_on:
      - db

  db:
    image: mysql:8
    restart: always
    environment:
      - MYSQL_ROOT_PASSWORD=gitea
      - MYSQL_USER=gitea
      - MYSQL_PASSWORD=gitea
      - MYSQL_DATABASE=gitea
    networks:
      - gitea
    volumes:
      - ./mysql:/var/lib/mysql

siddarthkay avatar Apr 28 '24 11:04 siddarthkay

Thanks so much for your work on this PR! I've set it up to attempt to see if it'll work for us.

I imported a private repository using a Personal Access Token. The initial sync works fine but I'm encountering this error on subsequent syncs:

gitea-1        | 2024/05/24 23:02:23 ...eb/routing/logger.go:102:func1() [I] router: completed GET /ORGNAME/REPONAME/settings for ...:0, 200 OK in 15.1ms @ setting/setting.go:95(setting.Settings)
db-1           | 2024-05-24 21:02:24.317 UTC [35] ERROR:  duplicate key value violates unique constraint "issue_index_pkey"
db-1           | 2024-05-24 21:02:24.317 UTC [35] DETAIL:  Key (group_id)=(1452) already exists.
db-1           | 2024-05-24 21:02:24.317 UTC [35] STATEMENT:  INSERT INTO issue_index (group_id, max_index) VALUES ($1, $2)
db-1           | 2024-05-24 21:02:24.317 UTC [35] ERROR:  current transaction is aborted, commands ignored until end of transaction block
db-1           | 2024-05-24 21:02:24.317 UTC [35] STATEMENT:  SELECT max_index FROM issue_index WHERE group_id=$1
gitea-1        | 2024/05/24 23:02:24 ...irror/mirror_pull.go:465:runSyncMisc() [E] runSyncMisc [repo: <Repository 1452:ORGNAME/REPONAME>]: failed to run SyncRepository: pq: current transaction is aborted, commands ignored until end of transaction block
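
The log alone doesn't show the root cause, but the failing statement is a plain INSERT into issue_index for a group_id that already has a row, and the following SELECT then fails because the transaction has been aborted. Purely as an illustration (this is not Gitea's code, and `IndexTable`, `Insert`, and `EnsureAtLeast` are hypothetical names), the difference between an unconditional insert and an insert-or-raise step can be sketched in memory like this:

```go
package main

import (
	"errors"
	"fmt"
)

// ErrDuplicate mimics a unique-constraint violation on group_id.
var ErrDuplicate = errors.New("duplicate key value violates unique constraint")

// IndexTable is an in-memory stand-in for a table mapping group_id to max_index.
type IndexTable map[int64]int64

// Insert fails when the group already has a row, as a plain INSERT would.
func (t IndexTable) Insert(groupID, maxIndex int64) error {
	if _, ok := t[groupID]; ok {
		return ErrDuplicate
	}
	t[groupID] = maxIndex
	return nil
}

// EnsureAtLeast creates the row if it is missing and otherwise only raises
// max_index, so running it again on a later sync cannot fail or go backwards.
func (t IndexTable) EnsureAtLeast(groupID, maxIndex int64) {
	if cur, ok := t[groupID]; !ok || cur < maxIndex {
		t[groupID] = maxIndex
	}
}

func main() {
	table := IndexTable{}

	// First sync: the row does not exist yet, so a plain insert works.
	fmt.Println(table.Insert(1452, 7)) // <nil>

	// Second sync: the same plain insert now hits the unique constraint.
	fmt.Println(table.Insert(1452, 9) == ErrDuplicate) // true

	// The insert-or-raise variant is safe to repeat.
	table.EnsureAtLeast(1452, 9)
	table.EnsureAtLeast(1452, 3) // lower value is ignored
	fmt.Println(table[1452])     // 9
}
```

Whether the actual fix belongs in the sync code or in how the index row is created is something only the PR's code can answer; the sketch just shows why a first sync can succeed while subsequent ones fail.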

ajvpot avatar May 24 '24 21:05 ajvpot

Hey @harryzcy, are you actively working on this PR to sync pull request data from GitHub to Gitea, or are you looking for someone to pick up the torch? My company is interested in this feature, and I could:
a) develop off this branch and keep your commits, then open a new PR that has your commits plus whatever fixes I needed;
b) advocate that we merge some portion of this, like the schema changes and client/API/types without implementations, and then develop off of that;
c) abandon this branch and close this (now draft) PR.

This was bumped from 1.23 release goals to 1.24, I'd like to pitch in to make that happen but also don't want to step on your toes if you've got this.

rremer avatar Feb 11 '25 18:02 rremer

Without wanting to sound impatient: given the time that has passed since this PR was last active, and that @harryzcy appears to have been inactive on this project for a significant period, it would be beneficial to clarify the next steps to help maintain momentum on this feature.

menacing-gingerbread avatar Mar 02 '25 16:03 menacing-gingerbread