
Per-domain scrape rule

Open yegle opened this issue 6 years ago • 6 comments

A Google News RSS feed contains links to many different sites, and each site needs its own scrape rule.

Example of such an RSS feed: https://news.google.com/news/rss/headlines/section/geo/SanFrancisco

yegle avatar Feb 10 '19 19:02 yegle

I guess what I actually want is https://github.com/miniflux/miniflux/blob/master/reader/scraper/rules.go, but as a flag or something that doesn't require upstreaming the changes first and waiting for the next release.

yegle avatar Feb 10 '19 19:02 yegle

I guess we can have an environment variable, like SCRAPER_RULES="path/to/rules.json", to make the rules configurable by users.

qjebbs avatar Feb 12 '19 01:02 qjebbs

You can still define scraper rules for each feed via the user interface (edit feed page).

fguillot avatar Feb 12 '19 06:02 fguillot

Yes, you can define a scrape rule per feed, but not per domain. If you check the RSS feed in the original post, it contains posts from different domains.
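To make the difference concrete, here is a minimal sketch of a per-domain lookup keyed on each entry's own URL rather than on the feed. `ruleForEntry` and the rule map are hypothetical names for illustration, not Miniflux's API; the fallback from `www.example.org` to `example.org` is also an assumption about how matching might work.

```go
package main

import (
	"net/url"
	"strings"
)

// ruleForEntry is a hypothetical helper. It picks a CSS selector based on the
// host of the entry's URL, trying the full host first and then each parent
// domain (www.example.org, then example.org, then org).
func ruleForEntry(rules map[string]string, entryURL string) (string, bool) {
	u, err := url.Parse(entryURL)
	if err != nil || u.Hostname() == "" {
		return "", false
	}

	host := strings.ToLower(u.Hostname())
	for host != "" {
		if selector, ok := rules[host]; ok {
			return selector, true
		}
		// Drop the leftmost label and retry with the parent domain.
		i := strings.IndexByte(host, '.')
		if i < 0 {
			break
		}
		host = host[i+1:]
	}
	return "", false
}
```

With rules for `example.org` and `news.example.com`, two entries from the same feed would each get their own selector. If the feed links are redirects, the host would have to be taken from the final URL instead.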

yegle avatar Feb 12 '19 07:02 yegle

Ok, I see.

fguillot avatar Feb 13 '19 00:02 fguillot

Not sure, but this may be better suited to something like RSS-Bridge. This gets hairy fast if you try to correctly parse the entire Internet.

https://github.com/RSS-Bridge/rss-bridge

somini avatar Jan 30 '20 02:01 somini