readability.rs

Does this work?

Open Anonyfox opened this issue 7 years ago • 1 comments

Hey, I just stumbled upon this repo, and it seems that you have ported the famous readability algorithm to Rust, using kuchiki and therefore html5ever. First: truly great!

But the algorithm seems to crash when used on real-world HTML pages. I get panics like:

   1:        0x10a9bc24c - std::sys::imp::backtrace::tracing::imp::write::hf587afb8e94ad165
   2:        0x10a9be23e - std::panicking::default_hook::{{closure}}::haf3443cb412055ce
   3:        0x10a9bdde3 - std::panicking::default_hook::h742f925bfab3bbfa
   4:        0x10a9be6f7 - std::panicking::rust_panic_with_hook::h6f06ff8d28a94df6
   5:        0x10a9be5a4 - std::panicking::begin_panic::h7b9167ba3324cfae
   6:        0x10a9be4c2 - std::panicking::begin_panic_fmt::hb5f8f1fe0fe23e28
   7:        0x10a9be427 - rust_begin_unwind
   8:        0x10a9e5e60 - core::panicking::panic_fmt::he6eb92dab4407c61
   9:        0x10a9e5eed - core::option::expect_failed::hf8bba00a6e833438
  10:        0x10a70f373 - <core::option::Option<T>>::expect::hba43ec4f65591df2
  11:        0x10a6cf697 - <std::collections::hash::map::HashMap<K, V, S> as core::ops::Index<&'a Q>>::index::he1febf3b2b851612
  12:        0x10a782795 - readability::Readability::add_info::h3257b725054a9642
  13:        0x10a782026 - readability::Readability::readify::h110ae48756961de8
  14:        0x10a781a7a - readability::Readability::parse::h69c7871f90548046
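Reading the trace from the bottom up, `parse` calls `readify`, which calls `add_info`, which indexes a `HashMap` with a key that is not present. Frames 9 to 11 show the `expect` inside the map's `Index` implementation failing. I have not looked at what `add_info` stores in that map, so the following is only a minimal sketch of the general failure mode and a non-panicking alternative. The `scores` table, the `u32` node ids, and `score_or_default` are hypothetical names for illustration, not the crate's actual code.

```rust
use std::collections::HashMap;

/// Hypothetical per-node score lookup; the crate's real data structure may differ.
fn score_or_default(scores: &HashMap<u32, f32>, node_id: u32) -> f32 {
    // `scores[&node_id]` would panic on a missing key, because the `Index`
    // impl for `HashMap` calls `expect` on the lookup result.
    // `get` returns an `Option`, so a missing key can be handled explicitly.
    scores.get(&node_id).copied().unwrap_or(0.0)
}

fn main() {
    let mut scores: HashMap<u32, f32> = HashMap::new();
    scores.insert(1, 2.5);

    // Key 1 is present, key 7 is not; neither call panics.
    println!("{}", score_or_default(&scores, 1));
    println!("{}", score_or_default(&scores, 7));

    // The entry API inserts a default first when the key is absent.
    *scores.entry(7).or_insert(0.0) += 1.0;
    // Indexing is safe here only because key 7 was just inserted.
    println!("{}", scores[&7]);
}
```

Whether a default value is the right behavior for a node that was never recorded depends on the algorithm, so this only shows how to avoid the panic, not what the correct fix is.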

Maybe this repo also needs some small polish, like publishing on crates.io and a README with a short "how to use". I just figured out that

readability::new().parse(&html_string).text_contents()

works more or less to get started, but I had tinkered with kuchiki before. Do you want some help? I might not be of much use on the algorithmic side in Rust yet, but once you have a working state of this crate, I'd like to write some docs for you in exchange. What do you think?

Anonyfox, Feb 25 '17 18:02

Hi, thank you for your interest! I plan to go back to the project next month (I need it for my degree work). I will need to port Mozilla's tests and some heuristics to improve precision. It would also be good to abstract the library over any DOM, not only kuchiki.

... actual HTML websites, I get panics like

Can you share a webpage that you used when you got this error?

loyd, Feb 28 '17 12:02