Community-driven mechanism for takedown of spam/malicious servers
Some discussion here: https://github.com/modelcontextprotocol/registry/issues/93
When someone publishes a malicious or spam server.json, we need a mechanism for getting it reported and taken down.
While we can rely on the existing source registries we reference (e.g. npm, PyPI) to take down malicious source code, we can't rely on the same mechanism for remote servers.
Steps to do here:
- [ ] Evaluate how other registries in the ecosystem deal with this. Likely solution is to enable community reporting of spam/malicious intent.
- [ ] Design mechanism for making those submissions
- [ ] Set thresholds for what meets the bar for a takedown
- [ ] Implement
Usually this is handled by having the humans who operate the registry monitor some sort of inbox / report queue where people report a submission as malicious. If the registry has a web interface, it can offer a "report" action on the page associated with each submission.
See things like:
- https://docs.npmjs.com/reporting-malware-in-an-npm-package
- https://blog.pypi.org/posts/2024-03-06-malware-reporting-evolved/
We weren't planning on having a web UI at initial launch, but I wonder if we could facilitate something similar with a PR process + label on modelcontextprotocol/registry.
As an immediate measure, we could start by verifying the existence and reputation of the domains of submitted remote MCP servers, checking each domain's presence in search engines, and validating the identities of those submitting servers to the registry. Once an MCP server is added, these details would also be exposed in the registry API responses. While this approach isn't foolproof and could be manipulated (with considerable effort), I believe it provides a reasonable starting point for going live.
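The domain-existence part of that check can be sketched with an injectable resolver, so the policy logic stays testable without network access. `domain_exists` and the wiring below are assumptions about how this could look, not existing registry code:

```python
import socket
from typing import Callable
from urllib.parse import urlparse

def domain_of(remote_url: str) -> str:
    """Extract the hostname of a submitted remote MCP server URL."""
    host = urlparse(remote_url).hostname
    if not host:
        raise ValueError(f"no hostname in {remote_url!r}")
    return host

def default_resolver(host: str) -> bool:
    """Real DNS check: does the hostname resolve at all?"""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False

def domain_exists(remote_url: str,
                  resolves: Callable[[str], bool] = default_resolver) -> bool:
    """Gate a submission on its domain resolving in DNS."""
    return resolves(domain_of(remote_url))
```

Reputation and search-engine presence would layer on top of this, but a bare "does it resolve" gate already filters out throwaway submissions.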
Let's focus this issue specifically on takedown mechanisms after proactive measures have failed us^. What you're describing sounds like a suggestion for how to further enrich our reverse-DNS namespacing mechanism, which is a separate topic.
One concern here is brigading. Should a server with 100k downloads be taken down due to 100 reports? Would there be an appeal process?
Perhaps we could track report counts alongside download counts, and leave the decision and appeal process to aggregators.
We could also consider aggregating stats over sliding windows, e.g. "reports in the last week", "reports in the last month", "reports in the last 6 months".
Good callout on brigading. Maybe we allow reporting a whole server or a specific version, with the option to permanently whitelist something (i.e. prevent future reporting on it) via appeal.
So if someone gets brigaded, they can appeal, and after manual review be permanently shielded from brigading thereafter.
Distinguishing between version and server whitelisting could be important: servers are more likely to be reported as spam; versions are likely to be reported as malicious (e.g. supply chain attack). Permanent whitelisting of a server doesn't mean all its versions should be un-reportable.
As per https://github.com/modelcontextprotocol/registry/issues/150#issuecomment-3212890574, for go-live we will do manual moderation and see how it goes. Going to track:
- manual moderation policies and SLAs in #150, which is a go-live blocker
- community driven/fancy abuse reporting flow in this (#92), which is not a go-live blocker