Search fails to find addDateParsing in the docs

Open darthmall opened this issue 8 months ago • 4 comments

If I search for “addDateParsing”, I get 121 results. The Content Dates page, which contains the actual string “addDateParsing”, does not show up until result 109, and it does not actually match the query “addDateParsing”; it only matches “add”.

If I quote the query so it is limited to exact matches, I get 0 results. Using a site:11ty.dev search on DuckDuckGo returns the correct results (the Content Dates page is the first hit, with addDateParsing highlighted).

Not sure if this is a problem specific to the query “addDateParsing” or if this is symptomatic of a larger issue with search on the site.

darthmall avatar Jul 10 '25 16:07 darthmall

@uncenter has also surfaced a similar issue with setInputDirectory. It may be a Pagefind issue with code blocks, or with how Pagefind parses the tokens inside them. I don’t believe we have anything in place to exclude those things (we use data-pagefind-ignore in a few places, like the navigation menu).

I might do a cheeky ping to @bglw to (very optionally) weigh in with some larger expertise there?

zachleat avatar Jul 14 '25 15:07 zachleat

My naive guess is that it has something to do with what selectors are being 'targeted' (I don't actually really understand how this indexer works...) in https://github.com/Pagefind/pagefind/blob/d7d0b3a0f0eb12661cd2eb894ad02f10687a4ca0/pagefind/src/fossick/parser.rs#L23-L30. I don't see pre, though I do see code. Will a span with text within a pre > code be indexed if pre isn't included?

In my case, for setInputDirectory, the content I'm looking for is in an inline code element within a td element of a table (https://www.11ty.dev/docs/config/#input-directory). This is a little more confusing, since td seems to be in the list of selectors there already. Might be worth noting that the sibling element after the code does have data-pagefind-ignore?

<td><code>eleventyConfig.setInputDirectory()</code> <span data-pagefind-ignore="" eleventy:id-ignore="" class="minilink minilink-addedin" data-uncoerced-version="3.0.0-alpha.6">Added in v3.0.0</span></td>

uncenter avatar Jul 14 '25 17:07 uncenter

Hello! Sorry, I've been in email debt for a little while so I missed this one 😄 But I'm finding some space to look at Pagefind things again.

This rings a bell! It's a lovely wee quirk with compound words. For a fun one, you get much better results searching for add date parsing with spaces :(

eleventyConfig.addDateParsing is one big compound word, which pagefind indexes as:

  • eleventyconfigadddateparsing
  • eleventy
  • config
  • add
  • date
  • parsing

When you search for just addDateParsing, no indexed word starts with that term (Pagefind is a prefix search), so it winds up cutting the search term all the way back to add. At that point it matches the indexed word add, but not addDateParsing, because the word was broken up too much in the index.
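To make that behaviour concrete, here is a toy Python model. It is not Pagefind's actual code: `index_terms` and `prefix_search` are hypothetical names, the regex split only approximates how the compound word is broken up, and trimming the query one character at a time is an assumption about how the term gets cut back.

```python
import re

# Hypothetical splitter: camelCase humps and punctuation act as boundaries.
PIECE = re.compile(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])")

def index_terms(word: str) -> set[str]:
    """Whole word with punctuation stripped, plus each lowercase piece."""
    whole = re.sub(r"[^A-Za-z0-9]", "", word).lower()
    return {whole} | {p.lower() for p in PIECE.findall(word)}

def prefix_search(query: str, index: set[str]) -> tuple[str, list[str]]:
    """Trim the query from the end until it is a prefix of an indexed term."""
    q = query.lower()
    while q:
        hits = sorted(t for t in index if t.startswith(q))
        if hits:
            return q, hits
        q = q[:-1]  # assumption: back off one character at a time
    return "", []

index = index_terms("eleventyConfig.addDateParsing")
print(sorted(index))
print(prefix_search("addDateParsing", index))
```

In this model the query only starts matching once it has been trimmed to add, which lines up with the Content Dates page matching on “add” alone.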

It's a bit of index bloat to cover all the permutations, but it's probably wise to also index eleventyConfig and addDateParsing there, on either side of the period. (You still have an issue where DateParsing doesn't match, though.)
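A rough sketch of that idea in Python, with the same caveat that this is an illustration and not Pagefind's implementation. `index_terms_with_segments` is a hypothetical name, and the regex split is an assumed stand-in for the real word splitting.

```python
import re

PIECE = re.compile(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])")

def index_terms_with_segments(word: str) -> set[str]:
    """Index the whole word, each piece, and each period-separated segment."""
    terms = {p.lower() for p in PIECE.findall(word)}
    terms.add(re.sub(r"[^A-Za-z0-9]", "", word).lower())
    # The extra step: keep each side of the period as its own term.
    terms.update(seg.lower() for seg in word.split(".") if seg)
    return terms

terms = index_terms_with_segments("eleventyConfig.addDateParsing")
print("adddateparsing" in terms)                        # exact term now indexed
print(any(t.startswith("dateparsing") for t in terms))  # still no prefix match
```

The second check shows the remaining gap: no indexed term starts with dateparsing, so a search for DateParsing would still fall back to a shorter prefix.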

The other idea is to split up the search term in the same way, so searching for addDateParsing also somehow rolls in a search for add date parsing. This idea is iffier in a good-idea-for-everyone sense.
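A minimal sketch of the query-side idea, again as an assumption-laden toy rather than anything Pagefind does today. `split_query` and `page_matches` are hypothetical names, and `page_terms` simply copies the indexed words listed earlier in this comment.

```python
import re

PIECE = re.compile(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])")

def split_query(query: str) -> list[str]:
    """Break a camelCase query into lowercase pieces, like the indexer does."""
    return [p.lower() for p in PIECE.findall(query)]

def page_matches(query: str, page_terms: set[str]) -> bool:
    """True if every piece of the query is a prefix of some indexed term."""
    return all(
        any(term.startswith(piece) for term in page_terms)
        for piece in split_query(query)
    )

page_terms = {"eleventyconfigadddateparsing", "eleventy", "config",
              "add", "date", "parsing"}

print(split_query("addDateParsing"))
print(page_matches("addDateParsing", page_terms))
print(page_matches("DateParsing", page_terms))
```

The trade-off is that any page containing add, date, and parsing separately would also match, which is part of why this is the iffier option.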

bglw avatar Jul 26 '25 11:07 bglw

Another one to add to the list: addUrlTransform.

darthmall avatar Aug 12 '25 02:08 darthmall