
Use the repository ID to verify whether a GitHub username was squatted

Open FichteFoll opened this issue 4 years ago • 3 comments

Instead of marking packages as "needing review" if they were unavailable once, it would be more robust to read the repository ID returned by GitHub's API, store it in the database, and flag a package as needing review only when the ID from the latest crawl differs from the one in the database.
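A minimal sketch of that check, assuming Python and the public GitHub REST endpoint GET /repos/{owner}/{repo}, whose JSON response carries a numeric "id" field. The helper names fetch_github_repo_id and needs_review are hypothetical and not part of the packagecontrol.io code base.

```python
import json
import urllib.request
from typing import Optional


def fetch_github_repo_id(owner: str, repo: str) -> int:
    """Return the numeric repository ID reported by the GitHub REST API."""
    url = f"https://api.github.com/repos/{owner}/{repo}"
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["id"]


def needs_review(stored_id: Optional[int], crawled_id: Optional[int]) -> bool:
    """Flag a package only when both IDs are known and they differ.

    A missing ID (first crawl, or a failed request) is treated as
    "cannot tell" rather than as a squatting signal.
    """
    if stored_id is None or crawled_id is None:
        return False
    return stored_id != crawled_id
```

A deleted and re-registered username yields a new repository with a new ID, so the comparison catches that case, while a package that was merely unreachable during one crawl keeps its ID and is not flagged.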

FichteFoll • May 25 '21 10:05

Beyond this, the system will need to track whether a third-party domain has changed hands. It will also need to do the same sort of ID check for GitLab and BitBucket.

I'm not sure if there is an automated way to see if the domain has changed hands. Maybe whois can provide the first registration date and that can be used?

wbond • May 27 '21 14:05

Another thing to consider is a repo URL being changed deliberately. I do hope that this can be checked against the database, so that an ID is only compared when the URL is unchanged. Outside of custom-hosted packages, for which we'll probably still need the "was missing" check, the implementations for the three git hosters should be very similar.

FichteFoll • May 28 '21 00:05

I don't think you should reach for a 100% solution here. Reducing false positives is an incremental process. Only handling GitHub reduces the stress as it's probably the 90% hoster nowadays. (And github.com can't change the owner without you reading it in the news.)

Say we just grab a uid from GitHub. In package.modify.store(values), values will then optionally carry this uid (iff the provider supplies it). Within store() we already call cursor.fetchone() to decide whether we INSERT or UPDATE. On UPDATE we can now compare the old uid with the new uid and reject some changes. Or allow these changes, but immediately mark needs_review.
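Roughly, the second variant (allow the change, but mark needs_review) could look like the sketch below. It uses sqlite3 and an invented packages table purely for illustration; the real store() and its schema will differ. The column names repo_url and repo_uid, and the rule of only comparing IDs when the URL is unchanged, are assumptions taken from the discussion above.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE packages ("
        " name TEXT PRIMARY KEY,"
        " repo_url TEXT NOT NULL,"
        " repo_uid INTEGER,"
        " needs_review INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def store(conn: sqlite3.Connection, values: dict) -> bool:
    """Insert or update a package; return True if this call flagged it."""
    cursor = conn.cursor()
    cursor.execute(
        "SELECT repo_url, repo_uid FROM packages WHERE name = ?",
        (values["name"],),
    )
    row = cursor.fetchone()
    new_url = values["repo_url"]
    new_uid = values.get("repo_uid")  # None if the provider has no uid

    if row is None:
        cursor.execute(
            "INSERT INTO packages (name, repo_url, repo_uid) VALUES (?, ?, ?)",
            (values["name"], new_url, new_uid),
        )
        conn.commit()
        return False

    old_url, old_uid = row
    same_url = old_url == new_url
    # Only compare IDs for the same URL, so a deliberate URL change
    # is not reported as squatting.
    flagged = (
        same_url
        and old_uid is not None
        and new_uid is not None
        and old_uid != new_uid
    )
    # Keep the known uid when the provider returned none for the same URL.
    uid_to_store = old_uid if (same_url and new_uid is None) else new_uid
    cursor.execute(
        "UPDATE packages SET repo_url = ?, repo_uid = ?,"
        " needs_review = MAX(needs_review, ?) WHERE name = ?",
        (new_url, uid_to_store, int(flagged), values["name"]),
    )
    conn.commit()
    return flagged
```

Using MAX(needs_review, ?) keeps an existing flag set until a reviewer clears it, so a later clean crawl does not silently undo it.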

kaste • Jun 12 '21 20:06