A CSS selector like `:not(.<class> <anything>)` crashes Pagefind
Description
If you use a selector like --exclude_selectors "body :not(.include_only_this_tag *)", the Pagefind parser crashes.
In fact, any selector with a descendant combinator (a space) inside a :not() makes Pagefind crash.
Reproducibility
Run Pagefind with one of those selectors; the site being indexed is not relevant. The crash looks like this:
thread 'main' panicked at pagefind/src/fossick/parser.rs:486:39:
called `Result::unwrap()` on an `Err` value: UnexpectedToken
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Backtrace
thread 'main' panicked at pagefind/src/fossick/parser.rs:486:39:
called `Result::unwrap()` on an `Err` value: UnexpectedToken
stack backtrace:
0: __rustc::rust_begin_unwind
1: core::panicking::panic_fmt
2: core::result::unwrap_failed
3: pagefind::fossick::parser::DomParser::new
4: <futures_util::stream::futures_ordered::OrderWrapper<T> as core::future::future::Future>::poll
5: <futures_util::stream::futures_unordered::FuturesUnordered<Fut> as futures_core::stream::Stream>::poll_next
6: <futures_util::stream::futures_ordered::FuturesOrdered<Fut> as futures_core::stream::Stream>::poll_next
7: <futures_util::stream::stream::collect::Collect<St,C> as core::future::future::Future>::poll
8: <futures_util::future::join_all::JoinAll<F> as core::future::future::Future>::poll
9: pagefind::SearchState::fossick_many::{{closure}}
10: pagefind::runner::run_indexer::{{closure}}
11: pagefind::main::{{closure}}
12: tokio::runtime::park::CachedParkThread::block_on
13: tokio::runtime::context::runtime::enter_runtime
14: tokio::runtime::runtime::Runtime::block_on
15: pagefind::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Software Version
pagefind 1.4.0
Hi, thanks for the report and the reproduction!
The parser Pagefind uses doesn't support the :not() selector, but this shouldn't panic. We should instead throw a warning to the console here but otherwise continue.
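To make the "warn but continue" idea concrete, here is a minimal sketch, not Pagefind's actual code. `SelectorError`, `parse_selector`, and `usable_selectors` are hypothetical names; `parse_selector` is a toy stand-in for the real selector parser that only mimics the behaviour reported in this thread (it rejects whitespace inside a `:not(...)` argument). The point is only the shape of the error handling: match on the `Result` instead of calling `unwrap()`.

```rust
/// Hypothetical error type standing in for whatever the real selector parser returns.
#[derive(Debug, PartialEq)]
enum SelectorError {
    UnexpectedToken,
}

/// Toy stand-in for the real selector parser (an assumption, not Pagefind's code):
/// it rejects any selector with whitespace inside a `:not(...)` argument.
/// Nested parentheses are not handled.
fn parse_selector(selector: &str) -> Result<String, SelectorError> {
    let mut rest = selector;
    while let Some(start) = rest.find(":not(") {
        let after = &rest[start + ":not(".len()..];
        let end = after.find(')').ok_or(SelectorError::UnexpectedToken)?;
        if after[..end].trim().contains(char::is_whitespace) {
            return Err(SelectorError::UnexpectedToken);
        }
        rest = &after[end + 1..];
    }
    Ok(selector.to_string())
}

/// Keep the selectors that parse; warn on stderr about the rest instead of unwrapping.
fn usable_selectors(selectors: &[&str]) -> Vec<String> {
    selectors
        .iter()
        .filter_map(|raw| match parse_selector(raw) {
            Ok(parsed) => Some(parsed),
            Err(err) => {
                eprintln!("warning: ignoring unsupported selector `{raw}`: {err:?}");
                None
            }
        })
        .collect()
}
```

With this shape, passing "body :not(.include_only_this_tag *)" alongside other selectors would print a warning for the unsupported one and indexing would proceed with the rest.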
Actually, :not(.<class>) does not panic.
But :not(.<class> <anything>) does.
I will also explain why I was using :not.
Basically, I am plugging Pagefind into a static site where I cannot freely modify the generation process.
I wanted to build an index only for content inside elements of a certain class (what data-pagefind-body does).
However, there is no setting for that, so I tried to exclude everything that is not under an element of that class.
Actually, :not(.<class>) does not panic. But :not(.<class> <anything>) does.
Thanks! I'll ensure this is covered in a test case.
Yes, Pagefind is a little tricky if you can't modify the generation. At some point we'll expose more configuration to index a site with selectors, but currently only the root selector can be configured this way.
At some point we'll expose more configuration to index a site with selectors.
I think that would be quite useful and maybe easy to do. Having a full CSS parser for the exclude_selectors setting would also be powerful enough, but it is obviously harder.