Search: allow searching several projects at the same time

Open stsewd opened this issue 2 years ago • 6 comments

This implements https://github.com/readthedocs/readthedocs.org/issues/8678#issuecomment-1163780743.

  • Accept a list of projects via the "project" parameter. Each project can be in the form {project_slug} or {project_slug:version_slug}; if the version isn't present, we use the default version of the project.
  • It will include only the results from the versions that were explicitly requested (no subprojects)
  • Every version is validated to check that the user has permission over it. If the user doesn't have permission over a version, or if it doesn't exist, we just don't include results from that version.
  • The old "syntax" is still available (it includes results from subprojects), and it requires using the version parameter.
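
As a rough illustration of the rules above, here is a minimal Python sketch of how the values of the "project" parameter could be parsed and filtered. The helper names (parse_project_param, resolve_versions), the default_versions mapping, and the can_access callback are hypothetical and are not taken from the Read the Docs codebase; the callback is assumed to return False for versions that don't exist.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def parse_project_param(value: str) -> Tuple[str, Optional[str]]:
    """Split "slug" or "slug:version" into (project_slug, version_slug or None)."""
    project_slug, _, version_slug = value.partition(":")
    return project_slug, version_slug or None


def resolve_versions(
    values: Iterable[str],
    default_versions: Dict[str, str],
    can_access: Callable[[str, str], bool],
) -> List[Tuple[str, str]]:
    """Return the (project, version) pairs that should be searched.

    `default_versions` (hypothetical) maps a project slug to its default
    version. `can_access` (hypothetical) stands in for the permission check.
    """
    resolved = []
    for value in values:
        project_slug, version_slug = parse_project_param(value)
        if version_slug is None:
            # No explicit version: fall back to the project's default.
            version_slug = default_versions.get(project_slug)
        if version_slug is None:
            # Unknown project: silently skip it.
            continue
        if can_access(project_slug, version_slug):
            resolved.append((project_slug, version_slug))
    return resolved
```

Versions the user can't access are dropped rather than raising an error, which mirrors the "just don't include results" behavior described in the list.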

ref https://github.com/readthedocs/readthedocs.org/issues/8678

stsewd avatar Jun 23 '22 23:06 stsewd

How does this search interact with our existing site search? It feels like they are solving similar problems, but with pretty different approaches to URL design. Is it possible to unify them with this work? E.g. https://readthedocs.org/search/?q=awesome&type=file&project=&version= searches across all versions and projects, but only allows me to filter by a single project. It would be neat to have multiple project filtering there.

Hmm, so the site search is searching all projects, do we want to expose that in our API? I think it makes sense on .org, not sure about .com? On .com we search all projects the user has access to. Maybe keep that same behavior?

Also, that searches across all projects and versions, but we control access at the version level, rather than project level, so not sure if we can map that directly to the search API.

Did we consider using the query syntax for filtering instead of different API parameters? Is it possible to still implement this in the query for the user, but modify the values going into the API? Is that what we should be doing? Something like awesome project:dev project:docs include_subprojects:True.

Are we considering implementing this syntax in the query? It seems like we are mixing how we expect users to use our search vs. developers who want to use our search.

It's possible to do it that way, we will need to parse the query and maybe check for some ambiguities. My main reason to use query parameters was so devs could define what projects to search in one endpoint, but I guess the same can be achieved if the parameters are in the query itself (q="{prepared-parameters}+{actual user query}")
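
A minimal sketch of what that parsing could look like, assuming whitespace-separated tokens and a fixed set of recognized keys. The function name split_query and the KNOWN_KEYS set are hypothetical; treating unrecognized key:value tokens as plain text is just one possible way to handle the ambiguities mentioned above.

```python
from typing import Dict, List, Tuple

# Assumed set of filter keys; anything else is treated as plain query text.
KNOWN_KEYS = {"project", "include_subprojects"}


def split_query(raw_query: str) -> Tuple[str, Dict[str, List[str]]]:
    """Separate key:value filters from the free-text part of a query."""
    filters: Dict[str, List[str]] = {}
    text_terms: List[str] = []
    for token in raw_query.split():
        # Only split on the first colon, so "project:docs:latest"
        # yields the key "project" and the value "docs:latest".
        key, sep, value = token.partition(":")
        if sep and value and key in KNOWN_KEYS:
            filters.setdefault(key, []).append(value)
        else:
            text_terms.append(token)
    return " ".join(text_terms), filters
```

With this approach, a developer could still prepend prepared filters to whatever the user typed, and the endpoint would recover both parts from q.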

This feels like a very large change to the format of this API. I'd argue we should be creating a new API endpoint, or a new version of this API, to accommodate this. That would make the code much cleaner, and allow us to keep the old one functioning without having to support it in the same code line as our new, fancy implementation.

No really strong opinion whether to make a new endpoint or not, the only difference will be the input, the output will be the same (or almost, just new fields will be added). Or if we want to play with a new format for the response, that's fine too I guess.

stsewd avatar Jun 30 '22 17:06 stsewd

@stsewd Sorry I haven't gotten back to discuss this. I think this is useful work, so let's go ahead and focus on shipping it and we can think about the dashboard implications in the next round of work on this.

Hmm, so the site search is searching all projects, do we want to expose that in our API? I think it makes sense on .org, not sure about .com? On .com we search all projects the user has access to. Maybe keep that same behavior?

I'd like to have this be the default behavior of the dashboard search on both sites. I think it's way more useful for users, and makes both sites act the same.

It's possible to do it that way, we will need to parse the query and maybe check for some ambiguities. My main reason to use query parameters was so devs could define what projects to search in one endpoint, but I guess the same can be achieved if the parameters are in the query itself (q="{prepared-parameters}+{actual user query}")

Yea, I think having this functionality be exposed to users is pretty valuable. Both should work though, and we can combine them on our side?

No really strong opinion whether to make a new endpoint or not, the only difference will be the input, the output will be the same (or almost, just new fields will be added). Or if we want to play with a new format for the response, that's fine too I guess.

If we're just adding new fields, we can probably keep it the same. It does feel like we're "versioning" it though just through the passing of a specific field. This is somewhat confusing, and we could think of a clearer way to explain this to users.

ericholscher avatar Jul 07 '22 21:07 ericholscher

Both should work though, and we can combine them on our side?

I'd vote to have just one way to do it, I'm +1 on using the key:value syntax. Should we keep the same syntax for specifying a version (project:docs:latest), or use another separator for the version (project:docs/latest)?

stsewd avatar Jul 12 '22 01:07 stsewd

It does feel like we're "versioning" it though just through the passing of a specific field. This is somewhat confusing, and we could think of a clearer way to explain this to users.

Yeah, maybe versioning could work, but if we create a new version, I think we could consider expanding the response a little. For example, project could be an object instead of a string with just the slug; that way we could combine these fields into a proper object (project.alias or version.project):

          "project": "docs",
          "project_alias": null,

We wouldn't include the whole thing like https://docs.readthedocs.io/en/stable/api/v3.html#get--api-v3-projects-(string-project_slug)-, just the slug, leaving the door open in case we need to add other attributes.
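
To illustrate the idea, here is a small Python sketch that folds the two flat fields into a nested object. The nested shape ("slug" and "alias" keys) and the helper name nest_project are assumptions made for the example, not a decided response format.

```python
from typing import Any, Dict


def nest_project(result: Dict[str, Any]) -> Dict[str, Any]:
    """Replace the flat "project"/"project_alias" fields with one object."""
    nested = {
        key: value
        for key, value in result.items()
        if key not in ("project", "project_alias")
    }
    # Hypothetical shape: only the slug and alias for now, with room
    # to add more attributes later without another format change.
    nested["project"] = {
        "slug": result["project"],
        "alias": result.get("project_alias"),
    }
    return nested
```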

Don't know, just a thought, maybe we don't need to expand those fields.

stsewd avatar Jul 12 '22 02:07 stsewd

@stsewd

I'd vote to have just one way to do it, I'm +1 on using the key:value syntax

IMHO, it's easier to see, write and understand using something like project=docs instead of project:docs. Besides, it makes it easier to visualize it when defining a particular version too, project=docs:latest. This matches what we talked about when defining the GET attributes.

humitos avatar Jul 13 '22 14:07 humitos

Meh, I take that back 🙃. GitHub and New Relic both use key:value, so it seems more standardized to follow that pattern as well. However, how do they handle complex attributes like project:slug:version? It's probably worth checking that so we can follow the same approach.

humitos avatar Jul 13 '22 14:07 humitos

This was implemented in the search API v3.

stsewd avatar Jan 03 '23 18:01 stsewd