CodeTriage icon indicating copy to clipboard operation
CodeTriage copied to clipboard

Remove archived repos

Open hbontempo-br opened this issue 3 years ago • 8 comments
trafficstars

First of all: Great service!

Last week I received I "CodeTriage misses you" email (yeah, I know... 😞) that suggested me to engage with an archived repo.

I don't think that any archived repository should be suggested to a user (both on email suggestions and through the web application)

Expected Behavior

Archived repos probably should be removed from the platform. They shouldn't be suggested on daily emails or through the web interface "Open source projects on GitHub that need your help." functionatlity

Current Behavior

Archived repos are handled the same way as any other repo.

Possible Solution

Periodically check if the repo was archived then:

  • Record it:
    • Save to a flag column in the database
    • Add a default_scope to the model

OR

  • Delete it

(Probably could expand the the existing Checks if repos have been deleted on GitHub task if necessary to unnecessary requests to GH's api)

Steps to Reproduce (for bugs)

Example: https://www.codetriage.com/simplabs/excellent

hbontempo-br avatar Oct 08 '22 03:10 hbontempo-br

100% we should remove archived repos and send an email to subscribers.

And with deleted repos. And we should support moved/transferred repos as well. Depending on complexity this could be several PRs.

Thanks for the issue!

Richard Schneeman https://www.schneems.com he/him

schneems avatar Oct 08 '22 12:10 schneems

Didn't though about about this other scenarios.

I didn't found anything explicitly on GH's documentation on what to expect when a repo is rename or transferred, so I did some testing with my accounts. ~~Renaming and transferring the repo private shouldn't be a big problem to track, despite all the changes the repo can still be accessed using the repo endpoint with the old address (unless the a new repo is created with the name of old repo). Making the repo private results in the same behaviour as a delete repo (response 404).~~


Update 2022-10-10

After @avcwisesa's comment I went back to check my testing and I was wrong. In my laziness I used the gh api commands thinking they would return the same thing a curl, but they don't. Here is the correct behavior for the request GET https://api.github.com/repos/USER_NAME/REPO_NAME:

  • Public repo:

    • Response status: 200
    • Response body: as described by the documentation
  • Archived repo:

    • Response status: 200
    • Response body: the same as the previous case but with 'archived' key set to true
  • Renamed OR Transferred:

    • Response 301
    • Body:
    {
      "message": "Moved Permanently",
      "url": "https://api.github.com/repositories/1111111",
      "documentation_url": "https://docs.github.com/v3/#http-redirects"
    }
    
    • When following the address I received the same body as described in the GET https://api.github.com/repos/USER_NAME/REPO_NAME request
  • Deleted OR Private repo:

    • Response status: 404
    • Response body:
    {
     "message": "Not Found",
     "documentation_url": "https://docs.github.com/rest/reference/repos#get-a-repository"
    }
    

In summary:

  • tracking with deleted/private is already working
  • tracking archived repos should be easy as well
  • tracking renamed/transferred repos apparently is easy, just start saving the id of the repo and used to in the api instead of the USER_NAME/REPO_NAME, but I'm not sure if this is a good idea since it's not in the GH documentation

hbontempo-br avatar Oct 10 '22 03:10 hbontempo-br

I also encountered a slight doubt when using CodeTriage the first time related to moved/transferred repo not sure which one is the correct one.

image

I saw when the old address is accessed, Github will return 301 with the new location.

curl -v https://github.com/pydata/pandas
< HTTP/2 301
< server: GitHub.com
< date: Mon, 10 Oct 2022 06:28:41 GMT
< content-type: text/html; charset=utf-8
< vary: X-PJAX, X-PJAX-Container, Turbo-Visit, Turbo-Frame, Accept-Encoding, Accept, X-Requested-With
< location: https://github.com/pandas-dev/pandas
< cache-control: no-cache

avcwisesa avatar Oct 10 '22 06:10 avcwisesa

I think that what's happening is the client https://github.com/zombocom/git_hub_bub/blob/master/README.md is automatically following the 301 redirect.

It uses excon under the hood and here's where I'm doing that location following https://github.com/zombocom/git_hub_bub/blob/5003d6ef8d6a23fae3c0a534c49d10057265eb1b/lib/git_hub_bub/request.rb#L46.

Essentially in this case it's hitting the old API endpoint, which is giving me a 301 and I'm following that 301 to return actual valid issues. So in this case subscribing to either is "correct" in that you're getting the same issues. However it's confusing that there's two of them.

What I think we need to detect that case is to make a request to the API for each repo (without following the 301 redirect) to determine if it's been moved or not. If it has been moved, loop through all the users and move them over to the new repo ( create a new repo if not already done. Then there's also updating dependent to point at the new repo as well:

https://github.com/codetriage/CodeTriage/blob/3a16a410307c9334f83c3745a24b314e7c5ed859/app/models/repo.rb#L12-L20.

schneems avatar Oct 11 '22 14:10 schneems

I'm wondering if there are any Rails gems or methods to handle that for us instead of having to do it all manually. (i.e. to update all foreign keys that to point at a new database value).

schneems avatar Oct 11 '22 14:10 schneems

What I think we need to detect that case is to make a request to the API for each repo (without following the 301 redirect) to determine if it's been moved or not.

+1, maybe can put this check on new repo submission & as periodic job?

If it has been moved, loop through all the users and move them over to the new repo ( create a new repo if not already done. Then there's also updating dependent to point at the new repo as well:

Some stuff I want to clarify about that thought:

  • When a registered repo in CodeTriage got transferred/renamed. Do we want to create a new repo or update the existing instance with the new name/user_name?
  • Does subscribed users need to be notified? (in case of a rename)

If we're only updating the name & user_name, then not need to touch the repos with updated name & user and maybe can have rake task to deduplicate the repo 🤔

avcwisesa avatar Oct 12 '22 02:10 avcwisesa

What I think we need to detect that case is to make a request to the API for each repo (without following the 301 redirect) to determine if it's been moved or not.

Since git_hub_bub deals with the redirects why not use this feature and just compare the information returned by the RepoFetcher with the one we have in the Repo?


Suggestion: Let's break all this discussion in smaller chuncks? (maybe even break this issue into multiple)

In my mind we could split the problem in:

  1. Keeping track of changes (Pretty confidant that I can do a PR until the weekend so we can discuss over)
  2. Dealing with duplicated repos
  3. Handling notification to subscribers

hbontempo-br avatar Oct 13 '22 04:10 hbontempo-br

In my mind we could split the problem in:

100% sounds good. I'm totally fine with breaking into multiple pieces.

Since git_hub_bub deals with the redirects why not use this feature and just compare the information returned by the RepoFetcher with the one we have in the Repo?

That's perfectly valid as well. GitHub returns URLs to resources, so you could compare the expected versus actual URL without needing an extra API request.

When a registered repo in CodeTriage got transferred/renamed. Do we want to create a new repo or update the existing instance with the new name/user_name?

I think renaming it is probably the easiest thing if we can, then we don't have to deal with merging/porting over data. It becomes a problem if someone else ALREADY added the other repo before we can move it ourself.

Does subscribed users need to be notified? (in case of a rename)

I would say no, if we need to change this policy later we can.

schneems avatar Oct 18 '22 18:10 schneems