Issue with indexing and text that contains underscores

Open demetris opened this issue 1 year ago • 2 comments

Hello, @bglw and all the good people at CloudCannon!

I am building a small site for documenting WooCommerce action hooks and filter hooks. The pages are named after the hook they document, and they have titles like this:

  • woocommerce_init
  • woocommerce_loaded
  • woocommerce_ajax_get_endpoint
  • etc.

So, the site has a page titled woocommerce_init as well as a page titled woocommerce_loaded. But when I search for init or loaded, Pagefind finds nothing:

[Screenshot: 20240804-1-pagefind-woocommerce_loaded-annotated]

When I search using the full title of the page, e.g., woocommerce_init or woocommerce_loaded, Pagefind finds the pages. It also finds the pages when I search using the full title without the underscores, e.g., woocommerce init or woocommerce loaded:

[Screenshot: 20240804-2-pagefind-woocommerce_loaded-annotated]

[Screenshot: 20240804-3-pagefind-woocommerce_loaded-annotated]

If I rename the page titles to use hyphens instead of underscores (e.g., rename woocommerce_loaded to woocommerce-loaded) and reindex the site, Pagefind gives me the results I expect:

[Screenshot: 20240804-4-pagefind-woocommerce-loaded-with-hyphen-annotated]
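
To illustrate how the two spellings can differ, here is a minimal Python sketch. It is not Pagefind's tokenizer, and `naive_tokens` is a hypothetical helper; it only assumes a naive word splitter built on the regex `\w+`, in which an underscore counts as a word character and a hyphen does not.

```python
import re


def naive_tokens(text: str) -> list[str]:
    # Hypothetical tokenizer, NOT Pagefind's: \w matches letters, digits,
    # and "_", so an underscore never splits a word, while "-" does.
    return re.findall(r"\w+", text.lower())


underscore_title = naive_tokens("woocommerce_loaded")
hyphen_title = naive_tokens("woocommerce-loaded")

print(underscore_title)  # ['woocommerce_loaded']
print(hyphen_title)      # ['woocommerce', 'loaded']

# With exact token lookup, "loaded" is only found in the hyphenated title.
print("loaded" in underscore_title)  # False
print("loaded" in hyphen_title)      # True
```

Under this assumption the underscore title is indexed as a single token, which would be consistent with a search for loaded finding nothing. It does not account for woocommerce loaded (with a space) matching the underscore title, so it is at best a partial picture of what is happening here.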

Do you know why this happens or if it’s something I can fix on my end?

In case it matters, the site is built with Astro. It is live and the pages in my examples can be accessed here:

Cheers!

demetris • Aug 04 '24 11:08

Hi @demetris 👋

Interesting! Going to your link, I can see the same behavior. I'm unsure why; I'll need to look into that.

Searching for loaded should indeed match woocommerce_loaded — and we have integration tests that ensure that — so there must be some confounding factor with this content in particular. I'll take a look soon 👀

bglw • Aug 15 '24 01:08

Thank you, @bglw.

Looking forward to seeing what you find.

demetris • Aug 21 '24 05:08