A CSS selector like `:not(.<class> <anything>)` crashes Pagefind

Open ramcasa opened this issue 7 months ago • 4 comments

Description

If you use a selector like `--exclude_selectors "body :not(.include_only_this_tag *)"`, the Pagefind parser crashes. In fact, any selector that has a descendant combinator (a space) inside `:not()` makes Pagefind crash.
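
The failing pattern can be recognized before a selector ever reaches the parser. The following is a minimal Rust sketch that assumes only the rule stated above: whitespace acting as a descendant combinator inside the argument of `:not()`. `not_has_descendant_combinator` is a hypothetical helper, not part of Pagefind, and it deliberately ignores escapes, quoted strings, and attribute selectors that contain spaces.

```rust
/// Hypothetical helper (not part of Pagefind): returns true if the argument
/// of any `:not(...)` in `selector` contains a descendant combinator.
/// Simplified: ignores escapes, quoted strings and attribute selectors.
fn not_has_descendant_combinator(selector: &str) -> bool {
    let mut rest = selector;
    while let Some(start) = rest.find(":not(") {
        let after = &rest[start + ":not(".len()..];
        // Locate the matching `)` by tracking parenthesis depth.
        let mut depth = 1;
        let mut end = after.len();
        for (i, c) in after.char_indices() {
            match c {
                '(' => depth += 1,
                ')' => {
                    depth -= 1;
                    if depth == 0 {
                        end = i;
                        break;
                    }
                }
                _ => {}
            }
        }
        if has_descendant_space(after[..end].trim()) {
            return true;
        }
        rest = &after[end..];
    }
    false
}

/// True if `arg` contains whitespace between two compound selectors,
/// i.e. whitespace that is not next to `>`, `+`, `~` or `,`.
fn has_descendant_space(arg: &str) -> bool {
    let chars: Vec<char> = arg.chars().collect();
    let is_sep = |c: char| matches!(c, '>' | '+' | '~' | ',');
    chars.iter().enumerate().any(|(i, c)| {
        if !c.is_whitespace() {
            return false;
        }
        // Nearest non-whitespace characters on either side of this space.
        let prev = chars[..i].iter().rev().find(|p| !p.is_whitespace());
        let next = chars[i + 1..].iter().find(|n| !n.is_whitespace());
        match (prev, next) {
            (Some(&p), Some(&n)) => !is_sep(p) && !is_sep(n),
            _ => false,
        }
    })
}
```

With this check, `body :not(.include_only_this_tag *)` is flagged, while `body :not(.include_only_this_tag)` (which, per the follow-up below, does not panic) passes.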

Reproducibility

Run Pagefind with one of those selectors; the site being indexed does not matter. The resulting crash is:

```
thread 'main' panicked at pagefind/src/fossick/parser.rs:486:39:
called `Result::unwrap()` on an `Err` value: UnexpectedToken
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```

Backtrace

```
thread 'main' panicked at pagefind/src/fossick/parser.rs:486:39:
called `Result::unwrap()` on an `Err` value: UnexpectedToken
stack backtrace:
   0: __rustc::rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::result::unwrap_failed
   3: pagefind::fossick::parser::DomParser::new
   4: <futures_util::stream::futures_ordered::OrderWrapper<T> as core::future::future::Future>::poll
   5: <futures_util::stream::futures_unordered::FuturesUnordered<Fut> as futures_core::stream::Stream>::poll_next
   6: <futures_util::stream::futures_ordered::FuturesOrdered<Fut> as futures_core::stream::Stream>::poll_next
   7: <futures_util::stream::stream::collect::Collect<St,C> as core::future::future::Future>::poll
   8: <futures_util::future::join_all::JoinAll<F> as core::future::future::Future>::poll
   9: pagefind::SearchState::fossick_many::{{closure}}
  10: pagefind::runner::run_indexer::{{closure}}
  11: pagefind::main::{{closure}}
  12: tokio::runtime::park::CachedParkThread::block_on
  13: tokio::runtime::context::runtime::enter_runtime
  14: tokio::runtime::runtime::Runtime::block_on
  15: pagefind::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
```

Software Version

pagefind 1.4.0

ramcasa • Sep 04 '25 16:09

Hi, thanks for the report and the reproduction!

The parser Pagefind uses doesn't support the `:not()` selector, but this shouldn't panic. Instead, we should print a warning to the console and otherwise continue.
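
A minimal Rust sketch of that warn-and-continue behaviour: rather than calling `unwrap()` on each parse result, selectors that fail to parse are reported on stderr and dropped, and indexing carries on with the rest. The parser is passed in as a closure because Pagefind's actual selector-parsing API is not shown in this thread; `compile_exclude_selectors` is a hypothetical name, not Pagefind code.

```rust
use std::fmt::Debug;

/// Hypothetical helper: parse every exclude selector with `parse`, keeping
/// the ones that succeed. Failures are printed as warnings on stderr
/// instead of panicking via `unwrap()`.
fn compile_exclude_selectors<T, E, F>(selectors: &[&str], parse: F) -> Vec<T>
where
    E: Debug,
    F: Fn(&str) -> Result<T, E>,
{
    selectors
        .iter()
        .filter_map(|&sel| match parse(sel) {
            Ok(compiled) => Some(compiled),
            Err(err) => {
                eprintln!("Warning: skipping unsupported selector {sel:?}: {err:?}");
                None
            }
        })
        .collect()
}
```

Taking the parser as a generic closure keeps the sketch independent of any particular selector library; the only requirement is that its error type implements `Debug` so it can appear in the warning.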

bglw • Sep 08 '25 02:09

Actually, `:not(.<class>)` does not panic. But `:not(.<class> <anything>)` does.

I should also explain why I was using `:not`. I am plugging Pagefind into a static site whose generation process I cannot freely modify. I wanted to build an index only from content inside elements with a certain class (which is what `data-pagefind-body` does). However, there is no setting for that, so I tried to exclude everything that is not inside an element with that class.
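
The inclusion rule described here can be stated without `:not()` at all: keep a piece of text only if one of its ancestor elements carries the class. A minimal Rust sketch on a toy tree, assuming a hypothetical `Node` type that is not Pagefind's DOM representation and ignoring every other Pagefind indexing rule:

```rust
/// Toy DOM node used only for illustration (hypothetical, not Pagefind's).
enum Node {
    Element { classes: Vec<String>, children: Vec<Node> },
    Text(String),
}

/// Collect text that sits inside at least one element carrying `class`.
/// `inside` records whether an ancestor already matched.
fn text_inside_class(node: &Node, class: &str, inside: bool, out: &mut Vec<String>) {
    match node {
        Node::Text(t) => {
            if inside {
                out.push(t.clone());
            }
        }
        Node::Element { classes, children } => {
            // Once an ancestor has the class, every descendant is included.
            let now_inside = inside || classes.iter().any(|c| c.as_str() == class);
            for child in children {
                text_inside_class(child, class, now_inside, out);
            }
        }
    }
}
```

Expressed this way, the rule is a positive "is inside" test rather than the complement of an exclusion.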

ramcasa • Sep 09 '25 09:09

> Actually, `:not(.<class>)` does not panic. But `:not(.<class> <anything>)` does.

Thanks! I'll ensure this is covered in a test case.

Yes, Pagefind is a little tricky to use if you can't modify how the site is generated. At some point we'll expose more configuration for indexing a site via selectors, but currently only the root selector can be configured this way.

bglw • Sep 09 '25 09:09

> At some point we'll expose more configuration to index a site with selectors.

I think that would be quite useful, and maybe easy to do. Full CSS selector support in the `exclude_selectors` setting would also be powerful enough, but that is obviously harder.

ramcasa • Sep 09 '25 10:09