search: support searching for (sub)titles

Open marxin opened this issue 3 years ago • 5 comments

Collect the titles from all pages and match the search query against them in full (case-insensitively) on the search page.

Fixes: #10689

marxin avatar Jul 28 '22 09:07 marxin

Does this work for partial matches?

AA-Turner avatar Jul 31 '22 18:07 AA-Turner

Does this work for partial matches?

Yes, I've just added partial match support.

marxin avatar Aug 01 '22 06:08 marxin
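
To make the matching discussed above concrete, here is a minimal sketch of a case-insensitive title lookup that accepts both full and partial matches. It is not the code from this pull request: `build_title_index`, `search_titles`, and the sample `pages` data are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def build_title_index(pages):
    """Map each lowercased title to the documents in which it occurs."""
    index = defaultdict(list)
    for docname, titles in pages.items():
        for title in titles:
            index[title.lower()].append(docname)
    return dict(index)


def search_titles(index, query):
    """Return (docname, title, is_full_match) hits, full matches first."""
    needle = query.strip().lower()
    if not needle:
        return []
    hits = []
    for title, docnames in index.items():
        if needle in title:  # substring test covers partial and full matches
            for docname in docnames:
                hits.append((docname, title, title == needle))
    # Full matches sort first; the remaining fields keep the order stable.
    hits.sort(key=lambda hit: (not hit[2], hit[0], hit[1]))
    return hits


# Hypothetical sample data: document name -> titles found in that document.
pages = {
    "index": ["Demo documentation"],
    "usage": ["HTML output", "Formatting"],
}
index = build_title_index(pages)
```

Lowercasing once at index time means each query only needs a single `lower()` call; a real index would probably also want to preserve the original title casing for display.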

Please add tests and a CHANGES entry.

AA-Turner avatar Aug 08 '22 20:08 AA-Turner

I've just added the CHANGES entry, but a working test would need to inspect the HTML of the rendered search.html page. Do we have such tests?

marxin avatar Aug 09 '22 12:08 marxin

As an improvement, it should be possible to emit a link directly to the title within a page. But I can't find a function that returns the anchor for a nodes.title node:

{'rawsource': 'Demo documentation', 'children': [<#text: 'Demo documentation'>], 'attributes': {'ids': [], 'classes': [], 'names': [], 'dupnames': [], 'backrefs': []}, 'tagname': 'title', 'parent': <section "demo documentation": <title...><compound...><paragraph...><substitution_defin ...>, '_document': <document: <substitution_definition "gol"...><section "demo documen ...>, 'source': '/home/marxin/Programming/texi2rst-generated/sphinx/demo/index.rst', 'line': 2}

Can you please help me?

marxin avatar Aug 10 '22 12:08 marxin

I think you may need to look at the parent section element to find the ids.

jbms avatar Sep 01 '22 20:09 jbms

Note: In https://github.com/jbms/sphinx-immaterial I have implemented something similar entirely client side, but there are a few differences:

  • The sub-sections are parsed from the HTML document itself, while extracting snippets.
  • If the search text is found within the page, the result link is to the nearest containing section.

This is the source code of my implementation, for reference: https://github.com/jbms/sphinx-immaterial/blob/main/src/assets/javascripts/sphinx_search.ts

In general handling this when building the index is probably better, though given that the HTML must be parsed anyway to handle the snippets I'm not sure.

jbms avatar Sep 01 '22 20:09 jbms

In general handling this when building the index is probably better, though given that the HTML must be parsed anyway to handle the snippets I'm not sure.

Yes, I do prefer the server-side implementation, and I'm still curious about the title links mentioned in my previous comment. One should be able to get a link to them.

marxin avatar Sep 07 '22 12:09 marxin

One should be able to get a link to them.

Something like node.parent["names"] should work? The anchor link is on the docutils.nodes.section node as I recall.

AA-Turner avatar Sep 07 '22 13:09 AA-Turner

Something like node.parent["names"] should work? The anchor link is on the docutils.nodes.section node as I recall.

Yep, that almost works:

diff --git a/sphinx/search/__init__.py b/sphinx/search/__init__.py
index bbb28c0b9..7916d26f0 100644
--- a/sphinx/search/__init__.py
+++ b/sphinx/search/__init__.py
@@ -216,6 +216,8 @@ class WordCollector(nodes.NodeVisitor):
         elif isinstance(node, nodes.title):
             title = node.astext()
             self.found_titles.append(title)
+            print('node:', node)
+            print('node.parent[names]:', node.parent['names'])
             self.found_title_words.extend(self.lang.split(title))
         elif isinstance(node, Element) and self.is_meta_keywords(node):
             keywords = node['content']

emits something like:

node: <title>Comparison of GCC docs in Texinfo and Sphinx</title>
node.parent[names]: ['comparison of gcc docs in texinfo and sphinx']
node: <title>HTML output</title>
node.parent[names]: ['html output']
node: <title>Formatting</title>
node.parent[names]: ['formatting']

So the last missing piece is probably escaping the name so that it yields, e.g., comparison-of-gcc-docs-in-texinfo-and-sphinx?

marxin avatar Sep 07 '22 14:09 marxin

Ahh, can you try ["ids"]?

AA-Turner avatar Sep 07 '22 14:09 AA-Turner

Ahh, can you try ["ids"]?

Works for me, added that.

marxin avatar Sep 07 '22 18:09 marxin

@AA-Turner, can you please review the pull request now?

marxin avatar Sep 08 '22 10:09 marxin

Something seems to be wrong:

https://sphinx--10717.org.readthedocs.build/en/10717/search.html?q=More+topics+to+be+covered

https://www.sphinx-doc.org/en/master/search.html?q=More+topics+to+be+covered

The PR only shows 5 results, and doesn't highlight the title, whereas the current master shows the title, albeit as the third result.

AA-Turner avatar Sep 08 '22 13:09 AA-Turner

The PR only shows 5 results, and doesn't highlight the title, whereas the current master shows the title, albeit as the third result.

Yeah, that's a fancy feature of Read the Docs; it must be a plug-in that is used for the Sphinx docs.

Please compare it with another pull request: https://sphinx--10807.org.readthedocs.build/en/10807/search.html?q=More+topics+to+be+covered

marxin avatar Sep 08 '22 13:09 marxin

Rebased.

AA-Turner avatar Sep 09 '22 00:09 AA-Turner

If running an incremental build, searchindex.js is only updated after loading it, so we need to bump the environment version.

AA-Turner avatar Sep 09 '22 01:09 AA-Turner
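
The reason a version bump helps with incremental builds can be sketched generically: cached data carries a version number, and a mismatch makes the loader discard it so that everything is rebuilt. This is a simplified illustration, not Sphinx's actual environment handling; `CACHE_VERSION`, `load_cache`, `save_cache`, and the JSON layout are all hypothetical.

```python
import json
import os
import tempfile

CACHE_VERSION = 2  # hypothetical; bump whenever the cached layout changes


def load_cache(path):
    """Return the cached per-document data, or {} if missing or stale."""
    try:
        with open(path, encoding="utf-8") as f:
            payload = json.load(f)
    except (OSError, ValueError):
        return {}
    if not isinstance(payload, dict) or payload.get("version") != CACHE_VERSION:
        return {}  # stale layout: force a full rebuild
    return payload.get("docs", {})


def save_cache(path, docs):
    """Write the per-document data together with the current version."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"version": CACHE_VERSION, "docs": docs}, f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cache.json")
    # Simulate a cache written by an older version of the tool.
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"version": 1, "docs": {"index": ["Old title"]}}, f)
    stale = load_cache(path)
    save_cache(path, {"index": ["Demo documentation"]})
    fresh = load_cache(path)
```

Without the version check, documents that did not change since the last build would keep their old cached entries and never gain the newly collected title data.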

Thanks @marxin!

AA-Turner avatar Sep 09 '22 01:09 AA-Turner

Thanks for merging that!

marxin avatar Sep 09 '22 05:09 marxin