Search: allow excluding sections inside a page

Open stsewd opened this issue 2 months ago • 0 comments

What's the problem this feature will solve?

Users can exclude an entire page from our search, and we use some heuristics to remove content that shouldn't be indexed (like footers, navbars, etc). See https://dev.readthedocs.io/en/stable/search-integration.html#irrelevant-content.

However, it would be useful for users to tell us explicitly which parts of a page they don't want in search results, such as a long example or a form.

Describe the solution you'd like

We could offer users two ways of excluding content.

  • Have an explicit class that users can put in their HTML tags, like a rtd-exclude-from-search class.
  • Have users tell us about the section they don't want to include by using an identifier (page.html#section-id). This would be in the same section of the config file we already use (https://docs.readthedocs.io/en/stable/config-file/v2.html#search-ignore). Some other examples: api/v1.html#deprecated, *.html#warning, etc.
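To make the second option concrete, here is a minimal sketch of how patterns such as api/v1.html#deprecated or *.html#warning could be matched against a page and a section identifier. The function name is_section_ignored and the use of fnmatch-style globs are assumptions for illustration, not the existing implementation; a pattern without a fragment is assumed to ignore the whole page.

```python
from fnmatch import fnmatchcase


def is_section_ignored(page, section_id, patterns):
    """Return True if any pattern excludes this section of this page.

    Hypothetical helper: patterns look like "page.html#section-id",
    with glob wildcards allowed on both sides of the "#".
    """
    for pattern in patterns:
        page_pattern, has_fragment, section_pattern = pattern.partition("#")
        if not fnmatchcase(page, page_pattern):
            continue
        # Assumption: a pattern with no "#" ignores the whole page.
        if not has_fragment or fnmatchcase(section_id, section_pattern):
            return True
    return False


patterns = ["api/v1.html#deprecated", "*.html#warning"]
print(is_section_ignored("api/v1.html", "deprecated", patterns))
print(is_section_ignored("api/v1.html", "usage", patterns))
```

Reusing glob matching for the section part would keep the syntax consistent between the page side and the section side of a rule, but that is a design choice open for discussion.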

Additional context

Excluding content using rules should be easy to add, since exclusion is done at indexing time (we just skip over the content!). Adding per-section boosting would be harder, since we do the boosting at search time.
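As a rough illustration of "skipping over the content" at indexing time, the sketch below collects the text of a page while ignoring any element that carries the proposed rtd-exclude-from-search class. It uses only the Python standard library and assumes well-formed HTML; the names IndexableText and extract_indexable_text are hypothetical and do not reflect the parser the indexer actually uses.

```python
from html.parser import HTMLParser

EXCLUDE_CLASS = "rtd-exclude-from-search"  # class name proposed in this issue
# Void elements have no closing tag, so they must not affect nesting depth.
VOID_TAGS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class IndexableText(HTMLParser):
    """Collect text, skipping elements marked with EXCLUDE_CLASS."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside an excluded element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._skip_depth or EXCLUDE_CLASS in classes:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def extract_indexable_text(html):
    parser = IndexableText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

For example, given a page with an install paragraph, a long example wrapped in a div with the rtd-exclude-from-search class, and a closing paragraph, only the two paragraphs outside the marked div would reach the index.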

stsewd avatar May 08 '24 18:05 stsewd