Change repo uniqueness to be based on repo_src_id not url
When a repository moves, for example from https://github.com/openai/triton to https://github.com/triton-lang/triton, both URLs can be added, and neither copy fully completes collection. If the check performed before adding a repository were based on repo_src_id, I believe this problem could be prevented.
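To illustrate the proposed check, here is a minimal sketch, not Augur's actual code. `RepoRecord`, `find_existing`, `add_repo`, and the ID value `123456` are all hypothetical; only the field names `repo_git` and `repo_src_id` come from this thread. It assumes the platform's stable ID has already been looked up for each URL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepoRecord:
    repo_git: str      # URL the repo was added under
    repo_src_id: int   # stable ID assigned by the hosting platform


def find_existing(repos, repo_src_id):
    """Return the stored record with this source ID, or None."""
    for record in repos:
        if record.repo_src_id == repo_src_id:
            return record
    return None


def add_repo(repos, repo_git, repo_src_id):
    """Append a repo unless one with the same source ID is already stored.

    Returns True if the repo was added, False if it was a duplicate.
    """
    if find_existing(repos, repo_src_id) is not None:
        return False
    repos.append(RepoRecord(repo_git, repo_src_id))
    return True


# Hypothetical ID: the real value would come from the platform's API.
TRITON_SRC_ID = 123456

repos = []
added_old = add_repo(repos, "https://github.com/openai/triton", TRITON_SRC_ID)
added_new = add_repo(repos, "https://github.com/triton-lang/triton", TRITON_SRC_ID)
print(added_old, added_new, len(repos))
```

Because the second URL carries the same source ID, it is rejected even though the URL string differs, which is the behavior a URL-based check cannot provide.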
This is an issue on instances that started before May 2024. It is addressed/fixable using the scripts here: https://github.com/chaoss/augur-utilities/tree/main/more_cowbell
@ABrain7710 : I think this is possibly not 100% fixed by our Augur patch. @cdolfi is reporting that a new repository that was a duplicate got added in December. I really thought this was patched, and I know it was tested. So perhaps an edge case was missed?
https://github.com/openai/triton to https://github.com/triton-lang/triton
I verified this occurred on Padres in January of this year:
select * from repo where repo_git like '%triton';
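A complementary way to spot such duplicates is to group by source ID instead of matching on the URL. The sketch below uses an in-memory SQLite table as a stand-in for the real database; the `repo_id` column, the sample rows, and the ID values are assumptions made for illustration.

```python
import sqlite3

# In-memory stand-in for the repo table; the schema is simplified and assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE repo (repo_id INTEGER PRIMARY KEY, repo_git TEXT, repo_src_id INTEGER)"
)
conn.executemany(
    "INSERT INTO repo (repo_git, repo_src_id) VALUES (?, ?)",
    [
        ("https://github.com/openai/triton", 123456),       # hypothetical ID
        ("https://github.com/triton-lang/triton", 123456),  # same repo, new URL
        ("https://github.com/chaoss/augur", 789),           # hypothetical ID
    ],
)

# Any source ID that appears more than once marks a duplicated repository.
duplicates = conn.execute(
    """
    SELECT repo_src_id, COUNT(*) AS copies
    FROM repo
    GROUP BY repo_src_id
    HAVING COUNT(*) > 1
    """
).fetchall()
print(duplicates)
```

Unlike the `like '%triton'` filter, this would also catch moves where the repository name itself changed.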
@sgoggins I haven't heard about this in a while. Where are we on this?
The repo source ID check was applied to frontend repo additions in https://github.com/chaoss/augur/pull/2929
This issue should be resolved as a result.
Still an issue with CLI additions
per @sgoggins analysis:
augur/tasks/frontend.py has the method that is called first (add_new_github_repos). augur/application/cli/db.py seems to handle the insertion of repos at the command line (add_repos).
A brief examination of the code, [...] suggests to me that you possibly added repositories using the command line tools? I say that because it "looks like" we actually already have the GH ID in the table, and we are checking for duplicates in that interface.
We should compare the add_new_github_repos and add_repos functions to see how they differ, then refactor them to use the same method for adding repositories.
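One possible shape for that refactor, as a rough sketch only: both entry points delegate to a single helper that owns the duplicate check. `insert_repo_if_new`, `add_from_frontend`, and `add_from_cli` are hypothetical stand-ins, not the existing Augur functions, and a plain dict replaces the database.

```python
def insert_repo_if_new(known_src_ids, repo_git, repo_src_id):
    """Single place where the duplicate check lives (hypothetical helper).

    known_src_ids maps source ID -> URL it was first added under.
    """
    if repo_src_id in known_src_ids:
        return False
    known_src_ids[repo_src_id] = repo_git
    return True


def add_from_frontend(known_src_ids, repo_git, repo_src_id):
    # Stand-in for the frontend path: one repo at a time.
    return insert_repo_if_new(known_src_ids, repo_git, repo_src_id)


def add_from_cli(known_src_ids, rows):
    # Stand-in for the CLI path: rows are (url, source ID) pairs.
    return [insert_repo_if_new(known_src_ids, url, src_id) for url, src_id in rows]


known = {}
# 123456 is a made-up source ID shared by the old and new URL.
frontend_result = add_from_frontend(known, "https://github.com/openai/triton", 123456)
cli_results = add_from_cli(known, [("https://github.com/triton-lang/triton", 123456)])
print(frontend_result, cli_results)
```

With the check in one helper, a fix applied for the frontend path cannot be bypassed by adding the same repository through the command line.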