readthedocs.org
Search: allow excluding sections inside a page
What's the problem this feature will solve?
Users can exclude an entire page from our search, and we use some heuristics to remove content that shouldn't be indexed (like footers, navbars, etc). See https://dev.readthedocs.io/en/stable/search-integration.html#irrelevant-content.
But, it may be useful for users to tell us explicitly what they don't want to include in search results from a page, like a long example, a form, etc.
Describe the solution you'd like
We could offer users two ways of excluding content.
- Have an explicit class that users can put in their HTML tags, like a `rtd-exclude-from-search` class.
- Have users tell us about the section they don't want to include by using an identifier (`page.html#section-id`). This would be in the same section of the config file we already use (https://docs.readthedocs.io/en/stable/config-file/v2.html#search-ignore). Some other examples: `api/v1.html#deprecated`, `*.html#warning`, etc.
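To make the second option concrete, here is a minimal sketch of how identifier rules such as `api/v1.html#deprecated` or `*.html#warning` could be matched. It assumes shell-style globbing over the combined `page#section-id` string. The function name `is_section_excluded` and the matching semantics are hypothetical, not an existing Read the Docs API.

```python
from fnmatch import fnmatchcase


def is_section_excluded(page_path, section_id, patterns):
    """Return True if `page_path#section_id` matches any exclusion pattern.

    Patterns use shell-style wildcards (an assumption for this sketch).
    Note that `*` in fnmatch also matches `/`, so `*.html#warning`
    covers pages in subdirectories too.
    """
    target = f"{page_path}#{section_id}"
    return any(fnmatchcase(target, pattern) for pattern in patterns)


# Hypothetical rules, taken from the examples in this issue.
rules = ["api/v1.html#deprecated", "*.html#warning"]

print(is_section_excluded("api/v1.html", "deprecated", rules))
print(is_section_excluded("guides/intro.html", "warning", rules))
print(is_section_excluded("guides/intro.html", "usage", rules))
```

Whether `*` should cross directory boundaries is a design choice; a stricter matcher would be needed if it should not.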
Additional context
Excluding content using rules should be easy to add, since exclusion is done at indexing time (we just skip over the content!). Adding boosting per section would be harder, since we do the boosting at search time.
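As a rough illustration of "skipping over the content" at indexing time, the sketch below extracts text from a page while ignoring any element that carries the proposed `rtd-exclude-from-search` class. It uses only Python's standard `html.parser` and is not the actual indexer code. The names `IndexableTextExtractor` and `extract_indexable_text` are hypothetical, and the sketch assumes well-formed HTML where every non-void element is explicitly closed.

```python
from html.parser import HTMLParser

EXCLUDE_CLASS = "rtd-exclude-from-search"  # class name proposed in this issue

# Void elements never get an end tag, so they must not affect nesting depth.
VOID_TAGS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class IndexableTextExtractor(HTMLParser):
    """Collect text nodes, skipping subtrees marked with EXCLUDE_CLASS."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside an excluded element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._skip_depth or EXCLUDE_CLASS in classes:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def extract_indexable_text(html):
    parser = IndexableTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = (
    "<p>Intro</p>"
    '<div class="note rtd-exclude-from-search"><p>Long <br> example</p></div>'
    "<p>Outro</p>"
)
print(extract_indexable_text(page))  # prints "Intro Outro"
```

Because the excluded text never reaches the index, no change is needed on the search side, which is why this is simpler than per-section boosting.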