HTML text extraction

Open mntn-xyz opened this issue 1 year ago • 3 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Feature description

Some RSS feeds only include a small snippet of the article, or sometimes nothing at all. I've used other RSS readers that automatically extract the text of articles, usually on a feed-by-feed basis. It would be great to see this in russ as it really helps for offline use.

I'd suggest rust-html2text: it's written in Rust, is actively developed, and uses html5ever, the HTML parser from the Servo project, which is itself under active development again.

mntn-xyz avatar Jan 28 '24 17:01 mntn-xyz

Upon further reflection, this could probably just be done as a scripted post-processing step, which leads me to wonder if (as an alternative) russ could just include a way to run a post-processing command for a given feed. I will open a different issue for that.
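To make the post-processing idea concrete, here is a minimal sketch of what a per-feed hook could look like. It is not part of russ: the function name `post_process` and the idea of passing the entry's HTML on stdin and reading the processed text from stdout are assumptions made for illustration only. It uses nothing beyond the Rust standard library.

```rust
use std::io::{self, Write};
use std::process::{Command, Stdio};

/// Hypothetical helper (not a russ API): pipe an entry's HTML through an
/// external command and return whatever the command prints on stdout.
fn post_process(program: &str, args: &[&str], html: &str) -> io::Result<String> {
    let mut child = Command::new(program)
        .args(args)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    // Write the input on a separate thread so a large entry cannot deadlock
    // against a child that is already filling its stdout pipe.
    let mut stdin = child.stdin.take().expect("stdin was configured as piped");
    let input = html.to_owned();
    let writer = std::thread::spawn(move || stdin.write_all(input.as_bytes()));

    // Reads stdout to the end and waits for the child to exit.
    let output = child.wait_with_output()?;
    writer.join().expect("writer thread panicked")?;

    if !output.status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("{program} exited with {}", output.status),
        ));
    }

    // Assumption: the command emits text; invalid UTF-8 is replaced, not rejected.
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}
```

With something like this, a feed could be configured with any command that reads HTML and writes text, for example `post_process("cat", &[], html)` as a no-op, or an HTML-to-text tool of the user's choice in place of `cat`.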

mntn-xyz avatar Jan 28 '24 17:01 mntn-xyz

@mntn-xyz Is your use case similar to the work done in this PR? https://github.com/ckampfe/russ/pull/34

ckampfe avatar Jun 01 '24 22:06 ckampfe

Yes, it looks like this PR would suffice. I still think rust-html2text might be a better option as it offers more configuration, but anything that provides scraping would meet the use case.

mntn-xyz avatar Aug 30 '24 02:08 mntn-xyz