newsbeuter
Dealing with dead feeds
Sometimes feeds die, i.e. URLs that used to work start producing HTTP errors (most commonly "404 Not Found"). Newsbeuter should probably provide some way of clearing those from the urls file or something.
The tricky part is distinguishing between dead feeds and ones that just "flickered", i.e. errored for a short period of time.
Maybe keep a per-feed counter of consecutive errors and add something like "(Note: feed seems to be dead)" to the update message once some threshold is passed?
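To make the counter idea concrete, here is a minimal sketch in Python (not Newsbeuter's own code). The names `FeedStatus`, `record_reload`, `update_message` and the threshold value are all hypothetical; the thread does not settle on any of them.

```python
from dataclasses import dataclass

# Hypothetical threshold; the thread does not propose a specific value.
DEAD_THRESHOLD = 5


@dataclass
class FeedStatus:
    """Per-feed bookkeeping for consecutive reload failures (illustrative only)."""

    url: str
    consecutive_errors: int = 0

    def record_reload(self, ok: bool) -> None:
        """Reset the counter on success, increment it on failure."""
        self.consecutive_errors = 0 if ok else self.consecutive_errors + 1

    def update_message(self) -> str:
        """Build the status line shown after a reload attempt."""
        if self.consecutive_errors == 0:
            return f"{self.url}: ok"
        msg = f"{self.url}: error"
        # Only hint at a dead feed once the failures have piled up,
        # so a short "flicker" does not trigger the note.
        if self.consecutive_errors >= DEAD_THRESHOLD:
            msg += " (Note: feed seems to be dead)"
        return msg
```

A single successful reload resets the counter, so only an unbroken run of failures produces the note.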
Judging by the comment in the documentation, error-log was meant to be a destination for such messages, but looking at the code, it seems errorlog is for errors caused by users. Confusing.
In #484, @heavyimage asks:
maybe a way of filtering to find feeds with no articles?
This will probably be helpful only on the first reload. Once we've got at least one article from a feed, it will count as alive under this rule. Or am I misunderstanding? If so, please expand on the idea.
I definitely don't think feeds that were working but then stopped (eg: there are still articles from that feed that haven't been deleted) should be cleared or tracked automatically; this could be 'flicking' or it could be that a user wants to hold onto an update from a long-dead feed.
I think the two situations you'd want to account for (eg: handle as 'dead feeds') are:
- A feed has never worked (the situation I'm in now, starting to use newsbeuter with an ancient list of RSS feeds, most of which are dead). In this case, the url list is loaded in and many feeds are set to 0/0 with no history of the feed's articles in the DB. These seem like easy candidates for pruning / deleting.
- A feed has stopped working and no articles remain in the read/unread state (eg, everything has been deleted). In this case, you might have to rely on some sort of counter to track the number of failures to respond, and after some threshold ask the user if they want to remove the feed.
I'll confess I don't know that much about newsbeuter's internals (yet) -- just giving my $0.02. Curious what everyone thinks! Jesse
Just displaying the title of a failing feed in red or something like that would be a nice and welcome improvement.
@heavyimage Your point No.1 is troublesome because at that stage, we can't know if the feed is really dead or simply happened to "flicker" just then.
No.2 is pretty much what I'm thinking of right now, but I don't see why I should take the state of articles in the feed into account. To be clear: "dead" feeds won't be removed automatically, they'll just be marked (like @hjassa suggests) and/or logged so that the user can take action.
I'm not sure it makes sense to actually ask users if they want to remove a dead feed:
- it's distracting: the user was updating their feeds just now and Newsbeuter tries to divert them into managing their feeds instead;
- dead feeds produce no content, so keeping them around is cheap.
Newbie here, so I'm happy to admit that I don't fully understand all of the internal workings or know the development history of newsbeuter.
What if we added two fields (for this example, "error" and "dead") to the rss_feed table in the cache db? On the first failure of a feed, the current time would be entered into "error". On each subsequent reload, if that feed continues to error out, we don't touch "error"; if the feed successfully reloads at some point in the future, "error" gets set back to 0. The maximum failure period (in days or weeks) before the user is notified would be set in the config file. If a feed hits this threshold during a reload, its "dead" field would be set to 1. We could then avoid feed "flicker" by giving ourselves an acceptable window of failure. The reason I'm suggesting the time of the first error instead of the time of the last successful reload is that it cuts down on db writes: a healthy feed needs no write on reload. I may be over-optimizing for large collections of feeds, though.
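A rough sketch of that bookkeeping, written in Python with an in-memory SQLite database. The table here is a simplified stand-in for the cache's rss_feed table, not its real schema, and `MAX_FAILURE_SECONDS`, `open_cache` and `record_reload` are hypothetical names. Clearing "dead" again on a successful reload is an assumption; the proposal only mentions resetting "error".

```python
import sqlite3

# Hypothetical grace period; in the proposal it would come from the config file.
MAX_FAILURE_SECONDS = 7 * 24 * 3600


def open_cache() -> sqlite3.Connection:
    """Create a simplified stand-in for the cache db (not the real schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE rss_feed ("
        " rssurl TEXT PRIMARY KEY,"
        " error INTEGER NOT NULL DEFAULT 0,"  # time of first failure, 0 = healthy
        " dead INTEGER NOT NULL DEFAULT 0)"   # 1 once the grace period is exceeded
    )
    return db


def record_reload(db: sqlite3.Connection, rssurl: str, ok: bool, now: int) -> None:
    """Update "error" and "dead" after one reload attempt at Unix time `now`."""
    if ok:
        # The WHERE clause skips healthy feeds, so they cost no write.
        # Resetting "dead" here is an assumption, not part of the proposal.
        db.execute(
            "UPDATE rss_feed SET error = 0, dead = 0"
            " WHERE rssurl = ? AND (error != 0 OR dead != 0)",
            (rssurl,),
        )
        return
    # First failure in a row: remember when the trouble started.
    db.execute(
        "UPDATE rss_feed SET error = ? WHERE rssurl = ? AND error = 0",
        (now, rssurl),
    )
    # Still failing after the grace period: flag the feed as dead.
    db.execute(
        "UPDATE rss_feed SET dead = 1"
        " WHERE rssurl = ? AND dead = 0 AND ? - error >= ?",
        (rssurl, now, MAX_FAILURE_SECONDS),
    )
```

Because only the time of the first failure is stored, repeated failures inside the grace period leave the row untouched, which is the write-saving property the proposal is after.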
In terms of user action on deleting feeds, what if the header was modified to "Your Feeds (# unread, # total, # dead)"? By default, users would use a filter on the "dead" field to retrieve feed info and then modify the url file themselves. For those who wish to have dead feed pruning handled automatically, we would add a config option, "auto-delete-dead-feeds", which would require the "max number of failures" option to be set. Once that threshold has been met, the feed is wiped from the url file automatically. This way users have the choice of letting newsbeuter fully manage feeds or keeping manual control of their feed list.
It's not the most elegant solution, but it would hit all of the criteria that have been discussed already.
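For illustration, the header and the optional pruning step might look roughly like this in Python. `feedlist_title` and `prune_urls` are hypothetical helpers, and the sketch assumes a url file where each non-comment line starts with the feed URL, optionally followed by tags.

```python
from typing import Iterable, List, Set


def feedlist_title(unread: int, total: int, dead: int) -> str:
    """Format the proposed feed list header."""
    return f"Your Feeds ({unread} unread, {total} total, {dead} dead)"


def prune_urls(lines: Iterable[str], dead_urls: Set[str]) -> List[str]:
    """Return the url file lines with entries for dead feeds dropped.

    Assumes the first whitespace-separated field of a line is the feed URL;
    comment lines (starting with '#') and blank lines are always kept.
    """
    kept = []
    for line in lines:
        fields = line.split()
        is_comment = line.lstrip().startswith("#")
        if fields and not is_comment and fields[0] in dead_urls:
            continue  # what "auto-delete-dead-feeds" would remove
        kept.append(line)
    return kept
```

Keeping the pruning as a pure function over lines makes it easy to show the user what would be removed before anything is written back to disk.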
All good points, @dd217; thanks!
I think using the time since the last successful fetch instead of the number of errors is an especially good idea: it's independent of the refresh rate, and it's a more natural way to express the idea of a "grace period".