Boolean search operators / search syntax documentation

Open chrisspen opened this issue 1 year ago • 5 comments

Is there any formal documentation on the search syntax supported?

For example, is an "AND" operator supported? If I search for "term1 term2", Pagefind seems to treat the terms as ORed, so each result contains at least one of the terms, and maybe the others if I'm lucky.

How would I tell pagefind to only return results that contain all the keywords?

chrisspen, Jun 25 '23 20:06

No formal syntax has been implemented yet — it's something I'm hoping to do before a 1.0 release but I can't guarantee I'll get to it. There's a small conversation about this in #70 but no work has been started.

For some context on the current state:

The current search strategy could be thought of as "best effort". Specifically in your case, term1 term2 will be treated as term1 AND term2 if both words exist in the corpus — so Pagefind will bias to showing only the most specific pages in the case that it recognizes both words.

If one of the two words isn't found anywhere in the search index, then that word will be ignored. So in this case if term2 doesn't exist anywhere on the site being indexed, then Pagefind will execute the search as simply term1. In this sense it's biased toward returning some results, rather than none.

There shouldn't be a case where you see term1 term2 returning ORed results — let me know if this is definitely happening. I can't see a way this would be getting through the current search function, though. The excerpts generated sometimes aren't the best, and won't contain both words, so sometimes the matches might look worse than reality. Another explanation is that Pagefind does search all word extensions, so term1 term2 will also return a page containing term1 and term22.

Hopefully that context helps! In summary:

> How would I tell pagefind to only return results that contain all the keywords?

As long as both keywords exist (and aren't common prefixes) then this is the current behaviour. But I am keen on supporting a more formal search documentation 🙂
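The behaviour described above can be sketched as a toy model. This is only an illustration of the three rules mentioned in this comment (terms are ANDed, terms unknown to the whole corpus are ignored, and a term also matches longer words that start with it); it is not Pagefind's actual implementation, and `tokenize`, `bestEffortSearch`, and the sample pages are hypothetical names made up for the example.

```javascript
// Toy model of the "best effort" strategy; NOT Pagefind's real code.
// `pages` maps a hypothetical page URL to its text content.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

function bestEffortSearch(pages, query) {
  // Build a simple word list per page.
  const index = Object.entries(pages).map(([url, text]) => ({
    url,
    words: tokenize(text),
  }));

  // A term matches a page if any word starts with it ("term2" matches "term22").
  const pageMatches = (page, term) =>
    page.words.some((word) => word.startsWith(term));

  // Ignore terms that match nothing anywhere in the corpus.
  const known = tokenize(query).filter((term) =>
    index.some((page) => pageMatches(page, term))
  );
  if (known.length === 0) return [];

  // Every remaining term must match (AND semantics).
  return index
    .filter((page) => known.every((term) => pageMatches(page, term)))
    .map((page) => page.url);
}

const pages = {
  "/a": "term1 and term2 together",
  "/b": "only term1 here",
  "/c": "term1 with term22",
};
console.log(bestEffortSearch(pages, "term1 term2"));   // [ '/a', '/c' ]
console.log(bestEffortSearch(pages, "term1 missing")); // [ '/a', '/b', '/c' ]
```

Note how `/c` shows up for `term1 term2` only because `term22` extends `term2`, which is the kind of result that can look like an OR match at first glance.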

bglw, Jun 26 '23 07:06

First of all, I would like to thank the authors of Pagefind for this really easy to use search-tool!

I stumbled upon this issue because I also thought that Pagefind does not have an AND condition. This perception is obviously wrong, as illustrated by the above answer from bglw.

What is "missing", though, is to specify word groups, i.e., a sequence of two or more words to search for and require that they be found together. For example, for the famous sentence in Shakespeare's Hamlet:

To be, or not to be, that is the question

it is difficult to find "to" and "be", because each word on its own is very common. It is the combination of those two words that makes them stand out. So what might be needed is searching for something like to+be, or that+is+the+question.

Also see Pagefind: Searching in Static Sites. As stated there, it is not a pressing issue, and mostly not important for technical blogs.

eklausme, Oct 24 '23 14:10

👋 Hey @eklausme!

Yes, that kind of adjacency would be great! Ideally, I would like Pagefind to take that into account by default. Given a plain search for to be, pages where those words are close or adjacent should rank higher than pages where those words are paragraphs apart.

That data does already exist when searching — if you search for "to be" in quotes you'll see only pages with those words adjacent are in the results. To do the better generic ranking, it's just a matter of finding a good algorithm to calculate that ranking, given Pagefind's available data, without blowing out the search performance.

Not something I have had time for yet, but hopefully will one day! 🙂
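To make the proximity idea concrete, here is a minimal sketch of one possible measure: the smallest distance, in words, between two search terms on a page. This is a hypothetical illustration only; Pagefind's ranking does not necessarily work this way, and `wordPositions`, `minDistance`, and the sample texts are invented for the example.

```javascript
// Hypothetical proximity measure; not how Pagefind ranks results.
function wordPositions(text, word) {
  const words = text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  const positions = [];
  words.forEach((w, i) => {
    if (w === word) positions.push(i);
  });
  return positions;
}

// Smallest distance (in words) between any occurrence of `a` and any of `b`.
// Returns Infinity when either word is absent from the text.
function minDistance(text, a, b) {
  let best = Infinity;
  for (const i of wordPositions(text, a)) {
    for (const j of wordPositions(text, b)) {
      best = Math.min(best, Math.abs(i - j));
    }
  }
  return best;
}

const docs = [
  "To get there you must be patient",
  "To be, or not to be, that is the question",
];
const ranked = docs
  .map((text) => ({ text, distance: minDistance(text, "to", "be") }))
  .filter((entry) => Number.isFinite(entry.distance)) // both words required
  .sort((x, y) => x.distance - y.distance);           // closest pair first
console.log(ranked.map((entry) => entry.distance)); // [ 1, 5 ]
```

The nested loop is quadratic in the number of occurrences, which hints at the performance concern raised above: a real implementation would need something cheaper for common words.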

bglw, Oct 25 '23 21:10

I'm using Pagefind to show a list of related articles using the current article's tags. The problem is that it only shows articles that have exactly the same tags as the one being viewed. I've worked around it by reducing the keyword set until Pagefind returns results. Fuzzy matching, or an OR-based search, would be great though.
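A rough sketch of the workaround described above might look like the following. The exact reduction strategy isn't stated here, so dropping keywords from the end of the list is an assumption, and `searchWithFallback` is a hypothetical helper name. The `search` argument is any async function that takes a term string and resolves to an object with a `results` array.

```javascript
// Sketch of "reduce the keyword set until something matches".
// Assumption: keywords are dropped from the end, one per attempt.
async function searchWithFallback(search, keywords) {
  for (let n = keywords.length; n > 0; n--) {
    const used = keywords.slice(0, n);
    // All remaining keywords are sent as one space-separated query.
    const { results } = await search(used.join(" "));
    if (results.length > 0) return { used, results };
  }
  // Nothing matched, even with a single keyword.
  return { used: [], results: [] };
}
```

With Pagefind loaded, this could be called as `searchWithFallback((term) => pagefind.search(term), tags)`. Ordering `tags` from most to least important keeps the most relevant keywords in the query the longest.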

leancept, Nov 22 '23 14:11

@leancept if you're showing a list based on a known set of tags, then filtering sounds like a good path that does support this :)

https://pagefind.app/docs/js-api-filtering/#using-compound-filters

You would be able to do something like:

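// Load the Pagefind JS API (assumed default bundle path; adjust for your site).
const pagefind = await import("/pagefind/pagefind.js");

// A null search term means "filter only": no keywords are required.
const search = await pagefind.search(null, {
    filters: {
        tag: {
            // Match pages that carry any one of these tags (OR).
            any: ["tag one", "tag_two", "tag_three"]
        }
    },
});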

bglw, Nov 23 '23 04:11