Using ignore-article to filter out articles from specific feeds using tags
newsboat 2.10.2
System: Linux 4.14.13-1-ARCH (x86_64)
Compiler: g++ 7.2.1 20171215
ncurses: ncurses 6.0.20170902 (compiled with 6.0)
libcurl: libcurl/7.57.0 OpenSSL/1.1.0g zlib/1.2.11 libidn2/2.0.4 libpsl/0.19.1 (+libidn2/2.0.4) libssh2/1.8.0 nghttp2/1.29.0 (compiled with 7.57.0)
SQLite: 3.21.0 (compiled with 3.21.0)
libxml2: compiled with 2.9.7
I would like to be able to apply ignore-article to articles from feeds with a specific tag. I subscribe to 52 RSS feeds which are all tagged "tag1" and want to filter out articles from "tag1" feeds that contain certain words in their titles. I also subscribe to lots of other feeds that I do not necessarily want to filter in the same way.
It seems natural to specify "tag1" in ignore-article to make sure I am only filtering articles from these "tag1" feeds. However, using the tags attribute (from Table 5 of the newsboat docs) in an ignore-article filter expression results in all articles in all feeds being filtered out. Presumably this is because tags is not a valid attribute for ignore-article: the same thing happens if I use a random string as the attribute name instead of tags. It appears that none of the attributes in Table 5 of the newsboat docs whose context is "feed, article" (the bottom 9) are valid attributes for ignore-article.
To put it simply, the only way I can find to apply ignore-article to a subset of my feeds is to create an ignore-article entry for each feed individually. In my opinion, it would make a lot more sense if users could specify a tag in one of two ways:
- ignore-article "tag1" "title =~ \"filtered-phrase\""
- ignore-article "*" "tags # \"tag1\" and title =~ \"filtered-phrase\""
Steps to reproduce the issue:
- Add the following line to the config file:
  ignore-article "*" "tags # \"tag1\" and (title =~ \"filtered-phrase\")"
- Start newsboat.
Expected: articles from feeds tagged "tag1" that contain "filtered-phrase" in the title are filtered out.
Actual: all articles in all feeds are filtered out.
Loving newsboat, thanks for your work!
Hi! Thanks for taking the time to report this.
I reproduced this with current master (a2844a747). Debug log shows that you're right and ignore fails because of a missing tags attribute. I'll try to look deeper into it over the weekend.
JFTR: ignore-article "*" "tags # \"tag1\"" is enough to reproduce the issue.
Okay, so here's what I got so far. The problem turned out to be twofold:
1. When we load an RSS item from the database and call rss_ignores::matches on it, the item doesn't yet have a pointer to its parent feed. Thus the item can't delegate attribute lookup, and as a result, has no "tags" attribute associated with it.
   This is very easy to fix by adding a call to rss_item::set_feedptr somewhere in rssitem_callback. (But this fix leads to a segfault in the test suite, which I need to investigate.)
2. Even if the item delegates attribute lookup, the "tags" attribute will still be empty, since the feed we're operating on doesn't contain anything. In fact, cache knows nothing of tags and never bothers with them; it's controller who sets them time and again after cache has done its work.
   This is trickier. I'd rather not pass tags to cache::internalize_rssfeed, because it shifts too much logic into the cache class. Perhaps I need a separate class that will consult both cache and urlreader to produce the final rss_feed?
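To make item 1 more concrete, here is a minimal sketch of attribute delegation under heavily simplified assumptions. Feed and Item are hypothetical stand-ins, not Newsboat's actual rss_feed and rss_item classes, and joining tags with spaces is chosen only for illustration. The point is that an item resolves an attribute it doesn't know itself (such as "tags") through its parent-feed pointer. Before something like set_feedptr is called, the lookup has nothing to delegate to and fails.

```cpp
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a feed.
struct Feed {
    std::vector<std::string> tags;

    // Feed-level attributes that items may delegate to.
    std::optional<std::string> attribute(const std::string& name) const {
        if (name == "tags") {
            std::string joined;  // tags joined with single spaces (illustrative)
            for (const auto& t : tags) {
                if (!joined.empty()) joined += ' ';
                joined += t;
            }
            return joined;
        }
        return std::nullopt;
    }
};

// Hypothetical, simplified stand-in for an item.
struct Item {
    std::string title;
    std::shared_ptr<Feed> feed;  // null until set_feedptr is called

    void set_feedptr(std::shared_ptr<Feed> f) { feed = std::move(f); }

    std::optional<std::string> attribute(const std::string& name) const {
        if (name == "title") return title;
        // Not an item attribute: delegate to the parent feed, if there is one.
        if (feed) return feed->attribute(name);
        return std::nullopt;  // no parent pointer, so "tags" can't be resolved
    }
};
```

As item 2 points out, this only moves the problem. If the parent feed's tag list is empty, delegation succeeds but yields an empty value.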
I need to sleep on it, and I also don't know how much free time I'll have during the workweek, so maybe this will have to wait until the weekend again. Thank you for the patience!
Thanks for looking into this @Minoru. Were you able to make progress on a solution? Solution 1 seems like the logical way to proceed in my opinion, but the segfault is obviously problematic.
I dropped the ball, sorry. Items 1 and 2 aren't alternatives; they deal with different parts of the problem. If I deploy only item 1 (provided I solve the segfault problem), nothing will change for users: items will have pointers to their parent feed, but the parent feed doesn't have any tags either, so tag lookup will continue to fail.
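The following sketch shows why both items are needed. Like the previous illustration, it uses hypothetical simplified types rather than Newsboat's real classes. Setting the parent pointer alone (item 1) still leaves the tag check failing for a feed loaded from the cache, because that feed carries no tags. Merging in the tags from a url-reader-like source (item 2) is what makes the check succeed. The free function assemble_feed is a made-up stand-in for the "separate class that consults both cache and urlreader" idea and is not an existing Newsboat API.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

struct Feed;

// Hypothetical simplified item: a title plus a raw parent pointer (item 1).
struct Item {
    std::string title;
    const Feed* feed = nullptr;
};

// Hypothetical simplified feed; `tags` is empty when loaded from the cache (item 2).
struct Feed {
    std::string url;
    std::vector<std::string> tags;
    std::vector<Item> items;
};

// Stand-in for the cache: knows feeds and items, but nothing about tags.
using Cache = std::map<std::string, Feed>;
// Stand-in for the url reader: maps each feed URL to its tags.
using UrlReader = std::map<std::string, std::vector<std::string>>;

// Combine what the cache and the url reader know into the final feed before
// any filtering runs. Throws std::out_of_range if the cache lacks `url`.
Feed assemble_feed(const std::string& url, const Cache& cache, const UrlReader& urls) {
    Feed feed = cache.at(url);
    auto it = urls.find(url);
    if (it != urls.end()) feed.tags = it->second;
    return feed;
}

// Point every item at its parent. Must be called after the Feed has reached
// its final location: copying or moving the Feed leaves these pointers stale.
void link_items(Feed& feed) {
    for (auto& item : feed.items) item.feed = &feed;
}

// Rough equivalent of `tags # "tag"` for one item: true only if the item can
// reach its parent feed AND that feed actually carries the tag.
bool item_has_tag(const Item& item, const std::string& tag) {
    if (item.feed == nullptr) return false;
    const auto& tags = item.feed->tags;
    return std::find(tags.begin(), tags.end(), tag) != tags.end();
}
```

Raw parent pointers keep the sketch short, but they are fragile, as the comment on link_items notes. A real implementation would need to choose its ownership model deliberately.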
I hesitate to promise any new deadline, but I'll definitely keep this issue updated with my progress.
So, any progress?
@Jorengarenar, nope. I'm busy with other parts of the codebase at the moment, and have no idea when I'll return to this issue.
Do you know of any post-fetch solutions?
@Jorengarenar I didn't test this, but I believe you can set ignore-mode "display" and add one ignore-article rule per feed.
No luck with that
If this gets fixed, we should update the documentation to mention the feature in section "5.6. Killfiles". It would be particularly useful if we could support the expression syntax; even just being able to specify multiple feeds or tags would be a welcome improvement.
This is trickier. I'd rather not pass tags to cache::internalize_rssfeed, because it shifts too much logic into the cache class. Perhaps I need a separate class that will consult both cache and urlreader to produce the final rss_feed?
What about moving the item-filtering code into Controller?
@the-blank-x Controller is a large class, so I'd prefer not to add to it.
Any progress? I'm in the same situation here.
If this gets fixed we should update the documentation to mention the feature in section "5.6. Killfiles".
Indeed. Filtering by tags and by title are probably the most common cases, so they should be illustrated with examples in the documentation.
@tapyu Nothing has been done about this yet. PRs welcome ;)
@Minoru I would love to contribute, but the barrier is huge, since news aggregators and internet protocols are not my area of expertise at all. Is there a newbie-friendly community where I can learn more about them and, who knows, contribute?
@tapyu If there are such communities, I'm not aware of them. What I did myself is I started working on bugs that I understood how to reproduce, and read up on relevant standards when I ran into protocol-related bugs.
This particular bug is all about Newsboat's internals though, I don't think there's anything one needs to read except for Newsboat's code. Take a look at https://github.com/newsboat/newsboat/issues/101#issuecomment-359278782; the first item there seems doable by a newcomer, and would make it possible to work on the second item.