pagefind
Boolean search operators / search syntax documentation
Is there any formal documentation on the search syntax supported?
For example, is an "AND" operator supported? If I search for "term1 term2", Pagefind seems to treat the terms as ORed, so a result will contain at least one of the terms, and maybe the others if I'm lucky.
How would I tell pagefind to only return results that contain all the keywords?
No formal syntax has been implemented yet — it's something I'm hoping to do before a 1.0 release but I can't guarantee I'll get to it. There's a small conversation about this in #70 but no work has been started.
For some context on the current state:
The current search strategy could be thought of as "best effort". Specifically in your case, `term1 term2` will be treated as `term1 AND term2` if both words exist in the corpus, so Pagefind will bias toward showing only the most specific pages when it recognizes both words.
If one of the two words isn't found anywhere in the search index, then that word will be ignored. So in this case, if `term2` doesn't exist anywhere on the site being indexed, Pagefind will execute the search as simply `term1`. In this sense it's biased toward returning some results, rather than none.
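As a rough illustration of the behaviour described above, here is a minimal sketch. It is a toy model, not Pagefind's actual code: the `index` map, `toyIndex`, and the `bestEffortAnd` helper are all made up for this example.

```javascript
// Toy model only: `index` maps each indexed word to the set of page IDs
// containing it. This is NOT Pagefind's real data structure or code.
function bestEffortAnd(index, query) {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  // Ignore any term that appears nowhere in the index.
  const known = terms.filter((t) => index.has(t));
  if (known.length === 0) return [];
  // Intersect the page sets of the remaining terms (AND semantics).
  let pages = [...index.get(known[0])];
  for (const t of known.slice(1)) {
    const set = index.get(t);
    pages = pages.filter((p) => set.has(p));
  }
  return pages;
}

const toyIndex = new Map([
  ["term1", new Set(["a", "b"])],
  ["term2", new Set(["b", "c"])],
]);

console.log(bestEffortAnd(toyIndex, "term1 term2"));   // [ 'b' ]
console.log(bestEffortAnd(toyIndex, "term1 missing")); // [ 'a', 'b' ]
```

The second call shows the "some results rather than none" bias: the unknown word is dropped and the search runs on `term1` alone.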
There shouldn't be a case where you see `term1 term2` returning ORed results; let me know if this is definitely happening. I can't see a way this would be getting through the current search function, though. The excerpts generated sometimes aren't the best, and won't contain both words, so sometimes the matches might look worse than reality. Another explanation is that Pagefind does search all word extensions, so `term1 term2` will also return a page containing `term1` and `term22`.
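The word-extension effect can be pictured with a similar toy sketch. Again, this is an assumption-laden illustration and not Pagefind's implementation: `prefixIndex`, `pagesMatchingPrefix`, and `prefixAnd` are hypothetical names, and the sketch leaves out the dropping of unknown words.

```javascript
// Toy model only, not Pagefind's implementation: `index` maps each indexed
// word to the set of page IDs that contain it.
function pagesMatchingPrefix(index, term) {
  const pages = new Set();
  for (const [word, ids] of index) {
    // A query term matches any indexed word that starts with it.
    if (word.startsWith(term)) {
      for (const id of ids) pages.add(id);
    }
  }
  return pages;
}

function prefixAnd(index, query) {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  if (terms.length === 0) return [];
  const sets = terms.map((t) => pagesMatchingPrefix(index, t));
  // Keep only pages matched by every term.
  return [...sets[0]].filter((id) => sets.every((s) => s.has(id)));
}

const prefixIndex = new Map([
  ["term1", new Set(["exact", "extended"])],
  ["term2", new Set(["exact"])],
  ["term22", new Set(["extended"])],
]);

console.log(prefixAnd(prefixIndex, "term1 term2")); // [ 'exact', 'extended' ]
```

The page labelled `extended` never contains the exact word `term2`, yet it still satisfies the AND because `term22` extends `term2`.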
Hopefully that context helps! In summary:
> How would I tell pagefind to only return results that contain all the keywords?
As long as both keywords exist (and aren't common prefixes of other words), this is the current behaviour. But I am keen on supporting a more formal, documented search syntax 🙂
First of all, I would like to thank the authors of Pagefind for this really easy to use search-tool!
I stumbled upon this issue because I also thought that Pagefind does not have an `AND` condition -- this perception is obviously wrong, as illustrated by the answer above from bglw.
What is "missing", though, is a way to specify word groups, i.e., a sequence of two or more words to search for, with the requirement that they be found together. For example, for the famous sentence in Shakespeare's Hamlet:
To be, or not to be, that is the question
it is difficult to find `to` and `be`. It is the combination of those two words that makes them stand out. So what might be needed is searching for something like `to+be`, or `that+is+the+question`.
Also see Pagefind: Searching in Static Sites. As stated there, it is not a pressing issue, and mostly not important for technical blogs.
👋 Hey @eklausme!
Yes, that kind of adjacency would be great! Ideally, I would like Pagefind to take that into account by default. Given a plain search for `to be`, pages where those words are close or adjacent should rank higher than pages where those words are paragraphs apart.
That data does already exist when searching. If you search for `"to be"` in quotes, you'll see that only pages with those words adjacent are in the results. To get better generic ranking, it's just a matter of finding a good algorithm to calculate that ranking, given Pagefind's available data, without blowing out the search performance.
Not something I have had time for yet, but hopefully will one day! 🙂
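To make the ranking idea concrete, one naive possibility is to score each page by the smallest gap between an occurrence of the first word and an occurrence of the second. The sketch below is purely illustrative and is not Pagefind's algorithm; the `positions` data shape and the `minGap` and `rankByProximity` helpers are assumptions invented for this example.

```javascript
// Hypothetical helper: smallest distance between any position in posA and
// any position in posB. Both arrays are assumed to be sorted ascending.
function minGap(posA, posB) {
  let best = Infinity;
  let i = 0;
  let j = 0;
  // Walk both sorted lists in lockstep, always advancing the smaller value.
  while (i < posA.length && j < posB.length) {
    best = Math.min(best, Math.abs(posA[i] - posB[j]));
    if (posA[i] < posB[j]) i++;
    else j++;
  }
  return best;
}

// Orders pages so that those with the two words closest together come first.
// Pages missing either word are dropped.
function rankByProximity(pages, a, b) {
  return pages
    .map((p) => ({ id: p.id, gap: minGap(p.positions[a] ?? [], p.positions[b] ?? []) }))
    .filter((p) => p.gap !== Infinity)
    .sort((x, y) => x.gap - y.gap)
    .map((p) => p.id);
}

// Assumed data shape: word positions per page, counted in words from the start.
const samplePages = [
  { id: "far-apart", positions: { to: [0], be: [120] } },
  { id: "hamlet", positions: { to: [0, 4], be: [1, 5] } },
];

console.log(rankByProximity(samplePages, "to", "be")); // [ 'hamlet', 'far-apart' ]
```

A real ranking would have to fold this signal into the existing relevance score and stay cheap for long queries, which is the hard part mentioned above.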
I'm using Pagefind to show a list of related articles using the current article's tags. Problem is, it only shows articles that have exactly the same tags as the one being viewed. I've worked around it by reducing the keyword set until Pagefind returns results. Fuzzy matching, or a search based on OR, would be great though.
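The workaround described here could look roughly like the following sketch. The `searchWithFallback` helper and the order in which keywords are dropped are assumptions, not the poster's actual code; `searchFn` stands in for a call such as `(q) => pagefind.search(q)`.

```javascript
// Hypothetical helper: retry the search with fewer keywords until something
// matches. `searchFn` is any async function that takes a query string and
// resolves to an object with a `results` array.
async function searchWithFallback(searchFn, keywords) {
  let remaining = [...keywords];
  while (remaining.length > 0) {
    const response = await searchFn(remaining.join(" "));
    if (response.results.length > 0) {
      return { used: remaining, results: response.results };
    }
    // Assumption: keywords are ordered by importance, so drop the last one.
    remaining = remaining.slice(0, -1);
  }
  return { used: [], results: [] };
}
```

Each retry is a separate search, so this costs up to one search per keyword in the worst case.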
@leancept if you're showing a list based on a known set of tags, then filtering sounds like a good path that does support this :)
https://pagefind.app/docs/js-api-filtering/#using-compound-filters
You would be able to do something like:
await pagefind.search(null, {
  filters: {
    tag: {
      any: ["tag one", "tag_two", "tag_three"]
    }
  },
});
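To turn that search into a list of related articles, the results still need to be loaded. A minimal sketch, assuming `pagefind` is the already-imported Pagefind module and that pages are tagged under a filter named `tag`; the `relatedArticles` helper and its `limit` parameter are hypothetical:

```javascript
// Hypothetical helper. `pagefind` is assumed to be the module loaded with
// something like: const pagefind = await import("/pagefind/pagefind.js");
async function relatedArticles(pagefind, tags, limit = 5) {
  // A null search term returns every page that passes the filters.
  const search = await pagefind.search(null, {
    filters: { tag: { any: tags } },
  });
  // Each result loads its page data lazily through an async data() call.
  const pages = await Promise.all(
    search.results.slice(0, limit).map((result) => result.data())
  );
  return pages.map((page) => ({ url: page.url, title: page.meta?.title }));
}
```

The article being viewed carries the same tags, so it will likely appear in its own list; filtering it out by `url` afterwards is one option.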