Publish a list of malicious packages that have been taken down

Open di opened this issue 7 years ago • 27 comments

What's the problem this feature will solve? Users who may have installed malicious packages have no insight into which packages have been taken down by PyPI administrators.

Describe the solution you'd like: PyPI should publish both a human-readable and a machine-readable (API) list of malicious packages that have been taken down. Ideally the human-readable list would be sortable by package name, or by the date the package was created or taken down.

Additional context: Feature request to automatically uninstall packages via this API in pip: https://github.com/pypa/pip/issues/5777

di avatar Sep 12 '18 15:09 di

@di How do I find packages that have been taken down - from the database point of view? Is there any flag (is_removed or is_malicious) in the table?

waseem18 avatar Sep 23 '18 14:09 waseem18

@waseem18 Nope, there isn't, so we would have to add that flag and manually infer it from the comments of previously removed packages.

di avatar Sep 23 '18 14:09 di

Okay, so we add the flag to the respective table and set it to true for any projects we want to mark as malicious.

If I understand you correctly, since the data for already removed packages doesn't exist in our database, we would need to infer it from the Warehouse GH issues.

So after we add the flag, the API call would return any packages that are flagged as malicious + the list we have of already removed packages.

Please do correct me if I'm wrong.

waseem18 avatar Sep 23 '18 15:09 waseem18

There is a comment field on the BlacklistedProject model: https://github.com/pypa/warehouse/blob/97f28dfa5a4017dd1e1a7630f772ce01ec1af749/warehouse/packaging/models.py#L568

What I meant was that once we add the ability to mark a BlacklistedProject as malicious, there should be some way for the administrators to go back and manually set this marker based on the comments we left. There are only about 200 projects right now, so this wouldn't be a terrible burden.

di avatar Sep 23 '18 16:09 di

Gotcha!

So we can add the is_malicious flag to the BlacklistedProject model with a default value of false.

And the API endpoint would query the BlacklistedProject table for rows with the flag set to true.
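
A minimal sketch of the flag-plus-query idea, using the standard-library sqlite3 module rather than Warehouse's actual models. The table name `blacklist`, its columns, the helper `list_malicious`, and the example rows are all hypothetical; they only illustrate the shape of the query.

```python
import sqlite3

# Hypothetical schema: stands in for the BlacklistedProject model,
# with the proposed is_malicious flag defaulting to false (0).
SCHEMA = """
CREATE TABLE blacklist (
    name TEXT PRIMARY KEY,
    comment TEXT NOT NULL DEFAULT '',
    is_malicious INTEGER NOT NULL DEFAULT 0
)
"""


def list_malicious(conn):
    """Return the names of blocked projects flagged as malicious, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM blacklist WHERE is_malicious = 1 ORDER BY name"
    )
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
# Made-up rows: one blocked for a non-malicious reason, one flagged as malicious.
conn.execute(
    "INSERT INTO blacklist (name, comment) VALUES ('reserved-name', 'name reserved')"
)
conn.execute(
    "INSERT INTO blacklist (name, comment, is_malicious) "
    "VALUES ('evil-pkg', 'malware', 1)"
)
print(list_malicious(conn))
```

Because the flag defaults to false, existing rows stay out of the list until an administrator marks them by hand.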

waseem18 avatar Sep 23 '18 17:09 waseem18

Please be sure to provide the reason for each takedown case - e.g. DMCA request, government/security services involvement, somebody's whim, etc.

pfalcon avatar Oct 13 '18 09:10 pfalcon

This issue is only about malicious packages, which are taken down by the PyPI administrators at their discretion.

di avatar Oct 13 '18 13:10 di

Is anyone working on this? I would like to work toward this during the Bloomberg Sprint.

If nothing else, figuring out how this works/is exposed from Warehouse's side should be a good start.

pradyunsg avatar Oct 16 '18 06:10 pradyunsg

I'm not working on this, @pradyunsg. Feel free to pick it up.

waseem18 avatar Oct 16 '18 06:10 waseem18

Hey, I will pick it up at the Bloomberg NYC Sprint!

oliviersm199 avatar Oct 27 '18 16:10 oliviersm199

#4962 relates to this issue

oliviersm199 avatar Oct 27 '18 20:10 oliviersm199

Blocked on #5117.

brainwane avatar Sep 02 '19 16:09 brainwane

> Blocked on #5117.

Not necessarily. We sometimes remove malicious packages manually, and waiting on the ability to automatically detect malicious packages shouldn't prevent us from publishing which packages we've manually taken down.

di avatar Sep 09 '19 19:09 di

#4962 mostly implements the first step towards this, but wasn't finished.

di avatar Sep 09 '19 19:09 di

Per https://github.com/pypa/warehouse/issues/7840, this list should include all "blocked" packages along with the reason for blocking, if applicable.

di avatar May 01 '20 18:05 di

> we should publish all "blocked" packages along with the reason for blocking, if applicable.

To be clear, you mean providing a publicly accessible list/table of all blocked packages and why they were blocked; and not changing/putting up new releases on that name. Correct?

pradyunsg avatar May 14 '20 19:05 pradyunsg

Thanks, my comment was unclear. I've updated it to 'this list should include all "blocked" packages'; I'm not suggesting we actually publish (create releases for) these packages.

di avatar May 14 '20 19:05 di

Is there a need for the flag in the database to distinguish between "blocked" reasons? As long as we're preserving PyPI admin discretion (which I agree with), it seems like that additional sort of information doesn't need to be exposed at this level.

And in my understanding, that would simplify this down to an API and possibly a formatted page (though I'm not totally convinced) that would return the list of blocked names. So all of the previous PR is not needed.

Though given #7840, perhaps we can also return a different status code for blocked names on install (rather than 404)? That would allow installers to handle an exceptional case directly, rather than having to maintain a list from our new API.

Guessing this just needs someone to work on it?
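
A rough sketch of how an installer might branch on a dedicated status code for blocked names, as suggested above. The thread does not say which code would be used, so `HYPOTHETICAL_BLOCKED_STATUS = 410` is purely a placeholder, and the `Lookup` enum and `classify` helper are illustrative names, not part of pip or Warehouse.

```python
from enum import Enum

# Assumption: 410 is only a placeholder. The thread does not specify
# which status code would mark a blocked name.
HYPOTHETICAL_BLOCKED_STATUS = 410


class Lookup(Enum):
    FOUND = "found"
    NOT_FOUND = "not found"
    BLOCKED = "blocked"
    ERROR = "error"


def classify(status_code):
    """Map an index response status to an installer-side outcome."""
    if status_code == 200:
        return Lookup.FOUND
    if status_code == HYPOTHETICAL_BLOCKED_STATUS:
        # The name exists on the index's block list: the installer can
        # warn the user directly instead of consulting a separate list.
        return Lookup.BLOCKED
    if status_code == 404:
        return Lookup.NOT_FOUND
    return Lookup.ERROR
```

With a distinct outcome for blocked names, the installer no longer has to treat "never existed" and "taken down" as the same 404 case.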

zooba avatar Sep 01 '20 20:09 zooba

For anyone interested, my PR in #8533 works but is probably stalled on having a good path for the API. All the existing JSON APIs are under /pypi/<project-name>, which doesn't leave an obvious place to add this one (short of claiming the project name matching the API). Ernest already rejected putting it under /admin because that path is excluded from the CDN.

Happy to receive any suggestions either here or there. I don't have near enough insight into PyPI's routing design to make a confident decision myself.

zooba avatar Sep 16 '20 16:09 zooba

Probably blocked on #284.

di avatar Sep 16 '20 16:09 di

If we're going to wait for a complete API redesign and potential technology change, can we just manually dump the list of banned names into a public text file somewhere until that's ready?
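
As a sketch of how such a text file could be consumed, assuming a format that the thread does not define (one project name per line, with blank lines and `#` comments ignored): the helper names and the sample contents below are hypothetical. Names are compared after the normalization described in PEP 503, so `Evil_Pkg` and `evil-pkg` match.

```python
import re


def normalize(name):
    """Normalize a project name per PEP 503 so comparisons are consistent."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_blocklist(text):
    """Parse one name per line; blank lines and '#' comment lines are skipped."""
    names = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            names.add(normalize(line))
    return names


def blocked_among(installed, blocklist):
    """Return the normalized installed names that appear in the blocklist."""
    return sorted(n for n in map(normalize, installed) if n in blocklist)


# Made-up file contents, for illustration only.
sample = "# hypothetical banned names\nEvil_Pkg\n\nother.bad\n"
print(blocked_among(["evil-pkg", "requests"], parse_blocklist(sample)))
```

The installed names could come from `importlib.metadata.distributions()` in a real check; the sketch takes a plain list so it stays independent of the local environment.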

zooba avatar Sep 16 '20 17:09 zooba

As far as I understand, PyPI still does not provide any reasonable way to check whether a package has been taken down (I hit a package name that is not listed on PyPI but prohibited). From the viewpoint of a package developer, this is not a good situation; the only way to check whether a package name was already taken by package name squatters and then taken down by admins is to try to squat the package name.

tueda avatar May 23 '21 07:05 tueda

@di what do you think about feeding these into the PyPA Advisory Database? They would then flow into OSV and anything else consuming those data sources. Of course it'd also be great to get pip-audit able to detect these. I realise that it would involve a few changes, since I think currently everything depends on the PyPI package JSON existing and it won't for these removed packages, but I think it'd be worth trying. I'm happy to have a go at getting a couple of initial advisory entries created and see what happens from there.

westonsteimel avatar Dec 03 '21 07:12 westonsteimel

@westonsteimel I had the same thought. We could either do that, which is a bit circuitous but will work, or, if we decide that OSV or the Advisory Database is not the right place for these types of things, we could take the easier route of including these in the vulnerabilities field based on Warehouse's internal state about these prohibited names.

Let's raise an issue at https://github.com/pypa/advisory-db to decide whether malware/spam/etc should generate advisories.

di avatar Dec 03 '21 14:12 di

Noting here that there's now an OSV database hosted by the OpenSSF that tracks this information: https://github.com/ossf/malicious-packages
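
A hedged sketch of checking a PyPI project against OSV from Python, using only the standard library. It assumes the public OSV query endpoint at `https://api.osv.dev/v1/query` and that malicious-package reports carry IDs beginning with `MAL-`; the helper names are illustrative and not part of any PyPI or OSV client library.

```python
import json
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_query(package_name):
    """Build the JSON body for an OSV query about one PyPI project."""
    return {"package": {"name": package_name, "ecosystem": "PyPI"}}


def malicious_ids(response):
    """Pick out advisory IDs that look like malicious-package reports.

    Assumption: such reports use the "MAL-" ID prefix.
    """
    return [
        vuln["id"]
        for vuln in response.get("vulns", [])
        if vuln.get("id", "").startswith("MAL-")
    ]


def query_osv(package_name, timeout=10):
    """POST the query to OSV and return the decoded JSON response."""
    data = json.dumps(build_query(package_name)).encode("utf-8")
    request = urllib.request.Request(
        OSV_QUERY_URL,
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `malicious_ids(query_osv("some-project"))` would list any matching reports for that name; an empty list only means OSV has no such record, not that the project was never taken down.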

sethmlarson avatar Nov 06 '23 16:11 sethmlarson

Also, note that that database only includes a fraction of the packages being taken down; there is currently no PyPI -> OSV link that populates it.

di avatar Nov 06 '23 19:11 di

See here for an update on this topic: https://discuss.python.org/t/pypi-malware-observation-report-outcomes-private-preview/49060

miketheman avatar Mar 20 '24 19:03 miketheman