registry icon indicating copy to clipboard operation
registry copied to clipboard

feat: add keyword search functionality to registry service

Open tuannvm opened this issue 6 months ago • 2 comments

Motivation and Context

Summary

This pull request enhances the MCP Registry service to support keyword searching across server names and descriptions for both REST API clients and internal database queries. It introduces new methods and modifications across various layers, including API handlers, memory-based storage, MongoDB repository, and service interfaces, to enable efficient case-insensitive search with cursor-based pagination.

Key Details

  • Added search query parameter to /v0/servers API endpoint, supporting keywords for filtering servers.
  • Updated API handler to detect the search parameter and invoke appropriate database methods.
  • Implemented Search method in memory database with case-insensitive substring matching on server name and description.
  • Implemented Search method in MongoDB database using regular expressions for case-insensitive, partial matches.
  • Extended service interfaces and implementations to include Search functionality.
  • Updated test mocks and fake services to simulate search behavior.
  • Cursor-based pagination is maintained, with nextCursor calculated based on the last matched entry.

How Has This Been Tested?

  • Verify that /v0/servers endpoint accepts search parameter and returns filtered results:
    • Example: GET /v0/servers?search=redis
    • Confirm case-insensitive matches.
  • Check combined use with pagination parameters:
    • Example: GET /v0/servers?search=database&limit=10&cursor=...
  • Validate that search results are correctly sorted and paginated.
  • Run existing unit tests and add/verify tests for new search behavior in service and database layers.
  • Performance testing in large environments to ensure search efficiency.

Breaking Changes

Types of changes

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to change)
  • [ ] Documentation update

Checklist

  • [x] I have read the MCP Documentation
  • [x] My code follows the repository's style guidelines
  • [ ] New and existing tests pass locally
  • [ ] I have added appropriate error handling
  • [ ] I have added or updated documentation as needed

Additional context

tuannvm avatar Jun 15 '25 22:06 tuannvm

https://github.com/modelcontextprotocol/registry/issues/135

tuannvm avatar Jun 15 '25 22:06 tuannvm

Thanks for working on this, but I think we need to step back and do some back and forth on designing this feature, starting with OpenAPI specs before we try to implement.

First, we should do this https://github.com/modelcontextprotocol/registry/issues/149

Then, we should define the OpenAPI specs for both in a way that is easily extendible. It'd be worth looking at some comparable schemas like xRegistry and/or WoT Discovery for inspiration / alternatives considered.

After that lands, we can do the implementation.

tadasant avatar Jun 25 '25 23:06 tadasant

@tadasant Now that #149 is complete, should we revisit this and what the API should be here? I'm pretty happy with this interface (maybe moving it under an /extensions/ path or something, to indicate it's not part of the very core registry spec) and would lean towards accepting this PR and iterating on it.

FWIW I do think a decent search (beyond substring search) would be useful here. But I appreciate this can end up being costly/adding complexity to what we might want to be very lightweight. It really does depend on how most users end up interacting with the registries etc.

domdomegg avatar Aug 06 '25 12:08 domdomegg

@tadasant Now that #149 is complete, should we revisit this and what the API should be here? I'm pretty happy with this interface (maybe moving it under an /extensions/ path or something, to indicate it's not part of the very core registry spec) and would lean towards accepting this PR and iterating on it.

FWIW I do think a decent search (beyond substring search) would be useful here. But I appreciate this can end up being costly/adding complexity to what we might want to be very lightweight. It really does depend on how most users end up interacting with the registries etc.

My thinking with the generic registry spec is that we'd want a way to define search that can be constrained to just do simple substring search, but also extended to more complex opinionated search as well. The Registry would constrain to simple substring, versus other implementations (usually downstream) would likely allow for more "advanced" search.

Three reasons I am keen to keep Registry search "simple":

  1. Discouraging spam. The more search vectors we add, the more ways to "game" the system. For example, if we enable vector search, everyone will be incentivized keyword-stuff their names and descriptions -- and that's a bad experience for everyone consuming it.
  2. Limiting our complexity. I think server search is an extremely hard thing to do well (I have not seen any third party MCP registry out there do it well to-date, and there have been very many attempts), so I think for the sake of keeping our scope limited and moving quickly, we'd do well to just do the easy thing that keeps us moving.
  3. Discouraging direct consumption of the Registry (versus using it as an upstream). If search is intentionally limited, folks will be less likely to abuse the registry and consume it directly. We don't want them doing that because (1) we want to avoid scale perf problems with lots of recurring hits (millions of hourly users vs. just thousands of mirrors doing daily ETL) (2) we can't support a great SLA anyway with an OSS team (3) encouraging direct usage introduces additional expectations on the registry to make it fit for that use -- I think we are better off limiting our scope as much as possible and only focusing on what the upstream registry can uniquely provide to the community.

Regarding the interface, I haven't thought super deeply on this but my initial reaction to the PR is that search={query}, while suitable for the upstream Registry needs, is not flexible enough for downstream implementations.

For example, WoT Discovery pushes something like this:

# Find all TDs with a specific property name
GET /search/jsonpath?query=$..properties.temperature

JSONPath is probably the other extreme end of the spectrum, where it can "search anything" but is a burden to implement. But maybe there's a case to be made that it's worth it and we bake the complexity into an SDK (and per above, maybe we'd lock the upstream Registry down to just name searches by only allowing GET /search/jsonpath?query=$[?(@.name =~ /.*myquery.*/i)].

My gut is there's some happy medium like a

GET /v0/search?name=query

Where we can allowlist keys in the query parameters (constrained in the upstream Registry but free for all in the generic OpenAPI).

If we just want to push something out for v0 I'd be keen to go with that last suggestion^

tadasant avatar Aug 06 '25 12:08 tadasant

I sincerely apologize for the disruption. This PR was accidentally closed due to a git history rewrite mistake. The PR has been recreated as #244 with all the original content preserved.

Please continue any discussions or reviews on the new PR: #244

Again, I apologize for the inconvenience this has caused.

domdomegg avatar Aug 08 '25 13:08 domdomegg