
Automatically remove server.json entries referencing invalid packages

Open tadasant opened this issue 7 months ago • 4 comments

Some discussion here: https://github.com/modelcontextprotocol/registry/discussions/50

When npm, PyPI, etc. take down a package (e.g. because it is found to be malicious), we don't want to keep a reference to that package.

Because server.json is meant to be immutable, what we should probably do in this case is delete the version of the server.json altogether.

We'll want to run this kind of check (for broken package references) at least ~daily.
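For illustration, a broken-reference check could look something like the sketch below. It assumes the public metadata endpoints of npm (`https://registry.npmjs.org/<name>/<version>`) and PyPI (`https://pypi.org/pypi/<name>/<version>/json`) answer 404 for a version that no longer exists. The function names and the three-way `present`/`missing`/`unknown` result are hypothetical and not taken from the registry codebase.

```python
import urllib.error
import urllib.request
from urllib.parse import quote


def package_url(registry_type: str, name: str, version: str) -> str:
    """Build the metadata URL for one published version of a package."""
    if registry_type == "npm":
        # Scoped names such as "@scope/pkg" need the slash percent-encoded.
        return f"https://registry.npmjs.org/{quote(name, safe='@')}/{quote(version, safe='')}"
    if registry_type == "pypi":
        return f"https://pypi.org/pypi/{quote(name, safe='')}/{quote(version, safe='')}/json"
    raise ValueError(f"unsupported registry type: {registry_type}")


def http_status(url: str) -> int:
    """Return the HTTP status code of a GET request (performs a real network call)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def check_package(registry_type, name, version, fetch_status=http_status) -> str:
    """Classify a package reference as 'present', 'missing', or 'unknown'."""
    try:
        status = fetch_status(package_url(registry_type, name, version))
    except OSError:
        # Network failure or timeout: not evidence that the package is gone.
        return "unknown"
    if status == 200:
        return "present"
    if status == 404:
        return "missing"
    # Rate limits (429), server errors (5xx), etc. are inconclusive.
    return "unknown"
```

Only a definite 404 counts as `missing` here. Since deleting a server.json version is destructive, it seems safer to treat timeouts and 5xx responses as inconclusive and check again on the next run.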

tadasant avatar May 27 '25 23:05 tadasant

This also solves a related problem: it avoids the disappointment of a user finding an interesting MCP server, trying to install it, and then discovering that the underlying package has been deleted, so it doesn't work.

domdomegg avatar Aug 21 '25 02:08 domdomegg

Related - could probably test this implementation on (and clean up) our seed data: https://github.com/modelcontextprotocol/registry/issues/49

domdomegg avatar Aug 21 '25 03:08 domdomegg

Analysis generated by Claude Code to help evaluate the implementation approach and priority.

Analysis of Automated Package Validation Requirements

I've done a technical analysis of implementing automated package validation for detecting and removing server entries with invalid package references. Here are my findings:

Technical Feasibility ✅

The implementation is definitely feasible. I've identified several viable approaches:

  1. Continuous background validation - A service that iterates through packages based on last_validated_at timestamps
  2. Scheduled batch validation - Daily/periodic validation of all packages
  3. Event-driven validation - Using package registry webhooks where available
  4. External job-based validation - Separate CLI tools run via cron/K8s jobs

All approaches would involve HTTP requests to package registries (npm, PyPI, etc.) to verify package existence and remove invalid server entries from the database.
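As a rough sketch of approach 1, the selection step could pick the entries whose `last_validated_at` is oldest and validate only those on each pass. Everything other than the `last_validated_at` name is an assumption made for illustration: the `ServerEntry` shape, the `package_ref` string, and the `check` callback (which returns `present`, `missing`, or `unknown`) are hypothetical, and a real implementation would query the database rather than an in-memory list.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, List, Optional


@dataclass
class ServerEntry:
    name: str                                   # hypothetical server identifier
    package_ref: str                            # hypothetical, e.g. "npm:some-package@1.0.0"
    last_validated_at: Optional[datetime] = None


def due_for_validation(entries: List[ServerEntry], now: datetime,
                       max_age: timedelta = timedelta(days=1),
                       limit: int = 100) -> List[ServerEntry]:
    """Return up to `limit` entries not validated within `max_age`, stalest first."""
    cutoff = now - max_age
    due = [e for e in entries
           if e.last_validated_at is None or e.last_validated_at <= cutoff]
    # Never-validated entries sort first, then the oldest timestamps.
    due.sort(key=lambda e: (e.last_validated_at is not None, e.last_validated_at or now))
    return due[:limit]


def run_validation_pass(entries: List[ServerEntry],
                        check: Callable[[str], str],
                        now: datetime) -> List[ServerEntry]:
    """Validate the due entries and return those whose package is gone."""
    to_remove = []
    for entry in due_for_validation(entries, now):
        result = check(entry.package_ref)
        if result == "missing":
            to_remove.append(entry)
        if result != "unknown":
            entry.last_validated_at = now
        # An 'unknown' result keeps the old timestamp, so the entry is retried next pass.
    return to_remove


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    demo = [ServerEntry("example-server", "npm:example@1.0.0")]
    print(run_validation_pass(demo, lambda ref: "present", now))
```

The same two functions would also serve approach 2: a daily batch is just one pass with a `limit` large enough to cover every entry.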

Priority Assessment 🤔

However, I want to raise some questions about necessity and timing:

Low actual impact scenarios:

  • Package registries rarely delete packages permanently (mostly just malicious ones)
  • Current registry scale appears to be dozens to hundreds of packages, not thousands
  • Users would likely report broken packages quickly in a small ecosystem
  • Manual cleanup might be sufficient initially

Implementation overhead:

  • Significant development time for potentially rare edge cases
  • Adds operational complexity (HTTP clients, error handling, monitoring)
  • New failure modes from external service dependencies

Alternative Recommendation 💡

Consider a phased approach:

Phase 1 (Minimal viable solution):

  • Direct users to report broken packages via GitHub issues
  • Simple admin tools for manual removal of reported servers
  • Monitor actual frequency of package deletion issues

Phase 2 (If needed based on real data):

  • Implement automated validation if broken packages become a real problem
  • Start with simple daily batch validation
  • Focus on most critical registry types first

Questions for the Team 🤷

  1. Timeline: Is this truly blocking go-live, or could it be deferred?
  2. Scale: What's the expected package count in 6-12 months?
  3. Evidence: Any user reports of this being a pain point in testing?
  4. Priority: Are there other go-live blockers where effort might be better spent?

I'm happy to implement whichever approach the team decides on, but wanted to surface these considerations for discussion.

tldr: If we want to implement this, I think an ongoing background job is probably the way to go. But it adds a lot of development complexity, especially given the number of registries and the need to handle rate limiting, retries, etc., all for deletions which should be fairly rare (and clients won't actually be able to download malicious packages after they have been removed from source registries). So maybe for now we should consider this not a go-live blocker?
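To give a sense of the retry handling mentioned above, here is a minimal sketch of exponential backoff around a single registry call. `TransientError` and `with_retries` are hypothetical names; which failures count as transient (timeouts, HTTP 429, 5xx) is an assumption, and real rate limiting would likely also need to honor per-registry limits.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, HTTP 429/5xx)."""


def with_retries(call: Callable[[], T], attempts: int = 4,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), retrying on TransientError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller decide what to do
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ... with the defaults
    raise ValueError("attempts must be at least 1")
```

The `sleep` parameter is injectable so the behavior can be exercised without actually waiting.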

domdomegg avatar Sep 01 '25 13:09 domdomegg

> (and clients won't actually be able to download malicious packages after they have been removed from source registries)

I would caveat this with: if someone published a malicious package to npm, an OCI build of it to Docker Hub, and also an MCPB bundle, and npm's detection mechanisms took it down, we'd do well to just follow npm's lead and take it down across all its package formats.
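A sketch of that rule, assuming a server.json version carries a `packages` list and that some per-package check returns `present`, `missing`, or `unknown`: if any one package is confirmed missing, the whole version is flagged, regardless of whether its other formats still resolve. The `registry_type` and `identifier` keys and the `should_remove_version` helper are illustrative assumptions, not the confirmed schema.

```python
from typing import Callable, Dict, List


def should_remove_version(server_json: Dict,
                          check: Callable[[Dict], str]) -> bool:
    """Flag a server.json version if any of its packages is confirmed missing."""
    packages: List[Dict] = server_json.get("packages", [])
    # One confirmed takedown is enough, even if the other formats are still up.
    return any(check(pkg) == "missing" for pkg in packages)


if __name__ == "__main__":
    # Hypothetical entry published in three formats; npm has taken its copy down.
    entry = {
        "name": "example/server",
        "packages": [
            {"registry_type": "npm", "identifier": "example-server"},
            {"registry_type": "oci", "identifier": "example/server"},
            {"registry_type": "mcpb", "identifier": "example-server.mcpb"},
        ],
    }
    statuses = {"npm": "missing", "oci": "present", "mcpb": "present"}
    print(should_remove_version(entry, lambda pkg: statuses[pkg["registry_type"]]))
```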

That said, agree this is something of an edge case and think we can remove it from go live blockers.

tadasant avatar Sep 01 '25 15:09 tadasant