Search returns fewer than expected results

Open bloovis opened this issue 6 months ago • 11 comments

I have implemented search for my site as documented in the Hextra configuration guide, but I get fewer results than expected for searches.

My blog has 202 posts, and 20 of them contain the word "Firefox". But when I search for "firefox", only two of the posts are displayed. I cannot discern a pattern to this unexpected behavior.

Another strange thing about search is that I can find one of the missing posts by searching for "firefox scrollbar", because that phrase appears in the post. But "firefox" alone will not return that post.

Here are the relevant lines from hugo.yaml:

params:
  search:
    enable: true
    type: flexsearch
    flexsearch:
      index: content

bloovis avatar Jun 30 '25 02:06 bloovis

After some console.log debugging, I see that the problem is related to the hardcoded 5 in these two places in flexsearch.js:

    const pageResults = window.pageIndex.search(query, 5, { enrich: true, suggest: true })[0]?.result || [];
...
      // Show the top 5 results for each page
      const sectionResults = window.sectionIndex.search(query, 5, { enrich: true, suggest: true, tag: { 'pageId': `page_${result.id}` } })[0]?.result || [];

The other problem is that sectionResults is an empty array for most pages, so those pages never get put into the results.

bloovis avatar Jun 30 '25 14:06 bloovis

thanks, will look into this

imfing avatar Jun 30 '25 17:06 imfing

maybe we could make the number of results configurable? here's a quick diff by Claude:

diff --git a/assets/js/flexsearch.js b/assets/js/flexsearch.js
index 33c107b..14e8596 100644
--- a/assets/js/flexsearch.js
+++ b/assets/js/flexsearch.js
@@ -318,7 +318,11 @@ document.addEventListener("DOMContentLoaded", function () {
     }
     resultsElement.classList.remove('hx:hidden');
 
-    const pageResults = window.pageIndex.search(query, 5, { enrich: true, suggest: true })[0]?.result || [];
+    // Configurable search limits with sensible defaults
+    const maxPageResults = parseInt('{{- site.Params.search.flexsearch.maxPageResults | default 20 -}}', 10);
+    const maxSectionResults = parseInt('{{- site.Params.search.flexsearch.maxSectionResults | default 10 -}}', 10);
+
+    const pageResults = window.pageIndex.search(query, maxPageResults, { enrich: true, suggest: true })[0]?.result || [];
 
     const results = [];
     const pageTitleMatches = {};
@@ -327,8 +331,8 @@ document.addEventListener("DOMContentLoaded", function () {
       const result = pageResults[i];
       pageTitleMatches[i] = 0;
 
-      // Show the top 5 results for each page
-      const sectionResults = window.sectionIndex.search(query, 5, { enrich: true, suggest: true, tag: { 'pageId': `page_${result.id}` } })[0]?.result || [];
+      // Show the top results for each page (configurable limit)
+      const sectionResults = window.sectionIndex.search(query, maxSectionResults, { enrich: true, suggest: true, tag: { 'pageId': `page_${result.id}` } })[0]?.result || [];
       let isFirstItemOfPage = true
       const occurred = {}
 
diff --git a/exampleSite/hugo.yaml b/exampleSite/hugo.yaml
index 453dc90..c2c8064 100644
--- a/exampleSite/hugo.yaml
+++ b/exampleSite/hugo.yaml
@@ -163,6 +163,10 @@ params:
       # full | forward | reverse | strict
       # https://github.com/nextapps-de/flexsearch/#tokenizer-prefix-search
       tokenize: forward
+      # Maximum number of pages to search (default: 20)
+      maxPageResults: 20
+      # Maximum number of sections per page to search (default: 10)
+      maxSectionResults: 10
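
The diff above reads each limit with parseInt on a string that Hugo renders at build time. A minimal sketch of a defensive version follows, assuming the template renders to a plain string of digits; `parseLimit` is a hypothetical helper and not part of Hextra:

```javascript
// Hypothetical helper (not part of Hextra): parse a rendered limit value
// and fall back to a default when the value is unusable.
function parseLimit(rendered, fallback) {
  const n = Number.parseInt(rendered, 10);
  // parseInt returns NaN for empty or non-numeric strings;
  // zero and negative limits are rejected as well.
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

// After Hugo renders the template, the string literal holds plain digits.
console.log(parseLimit('20', 20));  // 20
console.log(parseLimit('', 20));    // 20 (fallback)
console.log(parseLimit('abc', 10)); // 10 (fallback)
```

With a helper like this, a mistyped value in hugo.yaml would fall back to the default instead of passing NaN to the search call.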

imfing avatar Jul 03 '25 00:07 imfing

Thanks. That increases the limits, but there is still the problem that for most of the pages containing the word being searched, sectionResults (the result of window.sectionIndex.search) is a zero-length array.

In my example, pageResults now has all of the expected posts that contain the word "firefox", but only a small fraction of those have a non-empty sectionResults. Even for those that do, the results don't include all of the paragraphs that contain "firefox". So something is wrong with the way the sectionIndex is constructed or searched.

bloovis avatar Jul 03 '25 12:07 bloovis

There is still something rather strange about searching sectionIndex. If I set maxPageResults to 50 and maxSectionResults to 3, the search of pageIndex returns 20 results in my "firefox" example, which is good. But the total number of results from the searches of sectionIndex is 3. So maxSectionResults acts not as a per-page limit, but as a limit on all section results across all pages. As a result, only 3 pages appear in the search results, even though 20 pages were found. If I set maxSectionResults to a large number, say 50, this problem doesn't occur.

Below is an edited console log for this example. You can see that there are 20 page results, but only 3 of those pages have non-zero section results: pages 0, 4, and 5.

search for  firefox en.search.js:311:13
No. of pageResults =  20 en.search.js:333:13
pageResult[ 0 ] =  Object { id: 43, doc: {…} } , pageId =  page_43 
pageResult[ 1 ] =  Object { id: 60, doc: {…} } , pageId =  page_60 
Zero sectionResults for Object { id: 60, doc: {…} }
pageResult[ 2 ] = Object { id: 132, doc: {…} } , pageId =  page_132
Zero sectionResults for Object { id: 132, doc: {…} }
pageResult[ 3 ] = Object { id: 148, doc: {…} } , pageId =  page_148
Zero sectionResults for Object { id: 148, doc: {…} }
pageResult[ 4 ] = Object { id: 152, doc: {…} } , pageId =  page_152
pageResult[ 5 ] = Object { id: 175, doc: {…} } , pageId =  page_175
pageResult[ 6 ] = Object { id: 190, doc: {…} } , pageId =  page_190
Zero sectionResults for Object { id: 190, doc: {…} }
pageResult[ 7 ] = Object { id: 227, doc: {…} } , pageId =  page_227
Zero sectionResults for Object { id: 227, doc: {…} }
pageResult[ 8 ] = Object { id: 228, doc: {…} } , pageId =  page_228
Zero sectionResults for Object { id: 228, doc: {…} }
pageResult[ 9 ] = Object { id: 236, doc: {…} } , pageId =  page_236
Zero sectionResults for Object { id: 236, doc: {…} }
pageResult[ 10 ] = Object { id: 160, doc: {…} } , pageId =  page_160
Zero sectionResults for Object { id: 160, doc: {…} }
pageResult[ 11 ] = Object { id: 149, doc: {…} } , pageId =  page_149
Zero sectionResults for Object { id: 149, doc: {…} }
pageResult[ 12 ] = Object { id: 226, doc: {…} } , pageId =  page_226
Zero sectionResults for Object { id: 226, doc: {…} }
pageResult[ 13 ] = Object { id: 85, doc: {…} } , pageId =  page_85
Zero sectionResults for Object { id: 85, doc: {…} }
pageResult[ 14 ] = Object { id: 39, doc: {…} } , pageId =  page_39
Zero sectionResults for Object { id: 39, doc: {…} }
pageResult[ 15 ] = Object { id: 47, doc: {…} } , pageId =  page_47
Zero sectionResults for Object { id: 47, doc: {…} }
pageResult[ 16 ] = Object { id: 49, doc: {…} } , pageId =  page_49
Zero sectionResults for Object { id: 49, doc: {…} }
pageResult[ 17 ] = Object { id: 147, doc: {…} } , pageId =  page_147
Zero sectionResults for Object { id: 147, doc: {…} }
pageResult[ 18 ] = Object { id: 159, doc: {…} } , pageId =  page_159
Zero sectionResults for Object { id: 159, doc: {…} }
pageResult[ 19 ] = Object { id: 69, doc: {…} } , pageId =  page_69
Zero sectionResults for Object { id: 69, doc: {…} }

bloovis avatar Jul 03 '25 19:07 bloovis

This issue in Flexsearch may be the problem, or at least related:

Search using tags may result in less results than expected #459
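
One way to picture the symptom is a toy model in which the limit is applied to all matches before the tag filter runs. This is only an assumption used for illustration; it is not confirmed from the FlexSearch source, and the data and function names below are made up:

```javascript
// Toy data: section documents tagged with the page they belong to.
const sections = [
  { id: 1, pageId: 'page_43', text: 'firefox scrollbar tweaks' },
  { id: 2, pageId: 'page_43', text: 'more firefox settings' },
  { id: 3, pageId: 'page_43', text: 'firefox profiles' },
  { id: 4, pageId: 'page_60', text: 'firefox on linux' },
  { id: 5, pageId: 'page_132', text: 'firefox extensions' },
];

// Naive substring match standing in for the real index lookup.
const matches = (query) => sections.filter((s) => s.text.includes(query));

// Model 1 (assumed faulty order): cut to `limit` first, then filter by tag.
function limitThenFilter(query, limit, pageId) {
  return matches(query).slice(0, limit).filter((s) => s.pageId === pageId);
}

// Model 2 (expected order): filter by tag first, then cut to `limit`.
function filterThenLimit(query, limit, pageId) {
  return matches(query).filter((s) => s.pageId === pageId).slice(0, limit);
}

console.log(limitThenFilter('firefox', 3, 'page_60').length); // 0
console.log(filterThenLimit('firefox', 3, 'page_60').length); // 1
```

In the first model the total number of section results across all pages can never exceed the limit, which would match a log showing 20 page results but only 3 pages with section results.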

As a workaround, I'm using this hack in flexsearch.js: removing the limit from the search of sectionIndex, then applying the limit in the loop over sectionResults:

...
      const sectionResults = window.sectionIndex.search(query, // maxSectionResults,
        { enrich: true, suggest: true, tag: { 'pageId': `page_${result.id}` } })[0]?.result || [];
      let isFirstItemOfPage = true
      const occurred = {}
      const nResults = Math.min(sectionResults.length, maxSectionResults);
      for (let j = 0; j < nResults; j++) {
        const { doc } = sectionResults[j]
...
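
The same idea can be sketched in isolation, without FlexSearch. Here `searchSections` is a hypothetical stand-in for the tag-filtered sectionIndex search called with no limit, and `topSectionsPerPage` is an illustrative name rather than anything in flexsearch.js:

```javascript
// `searchSections(query, pageId)` is a hypothetical stand-in for the
// tag-filtered sectionIndex search, called without a limit argument.
function topSectionsPerPage(searchSections, query, pageIds, maxSectionResults) {
  const byPage = {};
  for (const pageId of pageIds) {
    const all = searchSections(query, pageId) || [];
    // Apply the per-page cap after the search instead of inside it.
    byPage[pageId] = all.slice(0, maxSectionResults);
  }
  return byPage;
}

// Fake search function used only to exercise the helper.
const fakeSearch = (query, pageId) =>
  pageId === 'page_43' ? ['s1', 's2', 's3', 's4'] : ['s5'];

console.log(topSectionsPerPage(fakeSearch, 'firefox', ['page_43', 'page_60'], 3));
```

Capping after the search costs an unbounded section query per page, so it trades some speed for a limit that really is per page.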

bloovis avatar Jul 05 '25 12:07 bloovis

I've run into this problem as well. Is there any way to customize the number of search results and the context length shown for each?

Vonng avatar Sep 08 '25 12:09 Vonng

For completeness, here is my patch for flexsearch.js.

--- ./assets/js/flexsearch.js	2025-08-31 13:21:21.778612466 -0700
*************** document.addEventListener("DOMContentLoa
*** 318,324 ****
      }
      resultsElement.classList.remove('hx:hidden');
  
!     const pageResults = window.pageIndex.search(query, 5, { enrich: true, suggest: true })[0]?.result || [];
  
      const results = [];
      const pageTitleMatches = {};
--- 318,327 ----
      }
      resultsElement.classList.remove('hx:hidden');
  
!     // Configurable search limits with sensible defaults
!     const maxPageResults = parseInt('{{- site.Params.search.flexsearch.maxPageResults | default 20 -}}', 10);
!     const maxSectionResults = parseInt('{{- site.Params.search.flexsearch.maxSectionResults | default 10 -}}', 10);
!     const pageResults = window.pageIndex.search(query, maxPageResults, { enrich: true, suggest: true })[0]?.result || [];
  
      const results = [];
      const pageTitleMatches = {};
*************** document.addEventListener("DOMContentLoa
*** 327,338 ****
        const result = pageResults[i];
        pageTitleMatches[i] = 0;
  
!       // Show the top 5 results for each page
!       const sectionResults = window.sectionIndex.search(query, 5, { enrich: true, suggest: true, tag: { 'pageId': `page_${result.id}` } })[0]?.result || [];
        let isFirstItemOfPage = true
        const occurred = {}
  
!       for (let j = 0; j < sectionResults.length; j++) {
          const { doc } = sectionResults[j]
          const isMatchingTitle = doc.display !== undefined
          if (isMatchingTitle) {
--- 330,342 ----
        const result = pageResults[i];
        pageTitleMatches[i] = 0;
  
!       const sectionResults = window.sectionIndex.search(query,
!         { enrich: true, suggest: true, tag: { 'pageId': `page_${result.id}` } })[0]?.result || [];
        let isFirstItemOfPage = true
        const occurred = {}
  
!       const nResults = Math.min(sectionResults.length, maxSectionResults);
!       for (let j = 0; j < nResults; j++) {
          const { doc } = sectionResults[j]
          const isMatchingTitle = doc.display !== undefined
          if (isMatchingTitle) {

bloovis avatar Sep 08 '25 14:09 bloovis

Would it make sense to expose some parameters in the FlexSearch config to adjust its behavior?

imfing avatar Sep 10 '25 21:09 imfing

I used your suggestion to make parameters for maxPageResults and maxSectionResults. Are you talking about other parameters besides those two?

Here's the relevant section from my hugo.yaml:

params:
  search:
    enable: true
    type: flexsearch

    flexsearch:
      # index page by: content | summary | heading | title
      index: content
      # Maximum number of pages to search (default: 20)
      maxPageResults: 50
      # Maximum number of sections per page to search (default: 10)
      maxSectionResults: 3

bloovis avatar Sep 10 '25 21:09 bloovis

Got it.

I’ll add these config options when I get a chance. In the meantime, feel free to open a PR if you’d like to contribute directly.

imfing avatar Sep 10 '25 21:09 imfing