Using ignore-article to filter out articles from specific feeds using tags

Open shanewstone opened this issue 8 years ago • 18 comments

newsboat 2.10.2
System: Linux 4.14.13-1-ARCH (x86_64)
Compiler: g++ 7.2.1 20171215
ncurses: ncurses 6.0.20170902 (compiled with 6.0)
libcurl: libcurl/7.57.0 OpenSSL/1.1.0g zlib/1.2.11 libidn2/2.0.4 libpsl/0.19.1 (+libidn2/2.0.4) libssh2/1.8.0 nghttp2/1.29.0 (compiled with 7.57.0)
SQLite: 3.21.0 (compiled with 3.21.0)
libxml2: compiled with 2.9.7


I would like to be able to apply ignore-article to articles from feeds with a specific tag. I subscribe to 52 RSS feeds which are all tagged "tag1" and want to filter out articles from "tag1" feeds that contain certain words in their titles. I also subscribe to lots of other feeds that I do not necessarily want to filter in the same way.

It seems natural to specify "tag1" in ignore-article to make sure I am only filtering articles from these "tag1" feeds. However, using the tags attribute (from Table 5 of the newsboat docs) in an ignore-article filter phrase results in all articles in all feeds being filtered out, presumably because tags is not a valid attribute for ignore-article: this also happens if, instead of tags, I enter a random string as an attribute. It appears that all of the attributes in Table 5 of the newsboat docs which have the context feed, article (the bottom 9) are not valid attributes for ignore-article.

To put it simply, the only way I can find to apply ignore-article to a subset of my feeds is to create an ignore-article entry for each feed individually. In my opinion, it would make a lot more sense if users could specify a tag in one of two ways:

  • ignore-article "tag1" "title =~ \"filtered-phrase\""
  • ignore-article "*" "tags # \"tag1\" and title =~ \"filtered-phrase\""
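
For reference, the per-feed workaround described above can be generated rather than typed by hand. The following is a minimal sketch, not part of Newsboat: the function names `feeds_with_tag` and `ignore_rules` are made up for illustration, and it assumes a urls file in which each non-comment line is a feed URL followed by space-separated tags, with double quotes around tags that contain spaces. It also assumes the phrase contains no quotes or backslashes.

```python
import shlex


def feeds_with_tag(urls_text, tag):
    """Return the URLs of feeds whose urls-file line carries `tag`."""
    urls = []
    for line in urls_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        url, *tags = shlex.split(line)  # quoted tags may contain spaces
        if tag in tags:
            urls.append(url)
    return urls


def ignore_rules(urls_text, tag, phrase):
    """Build one ignore-article config line per feed tagged `tag`."""
    if '"' in phrase or "\\" in phrase:
        # Assumption: keep escaping trivial by rejecting awkward phrases.
        raise ValueError("phrase must not contain quotes or backslashes")
    expr = f'title =~ \\"{phrase}\\"'  # inner quotes are backslash-escaped
    return [
        f'ignore-article "{url}" "{expr}"'
        for url in feeds_with_tag(urls_text, tag)
    ]


if __name__ == "__main__":
    sample = """
    https://example.com/a.xml tag1 "other tag"
    https://example.com/b.xml tag2
    """
    print("\n".join(ignore_rules(sample, "tag1", "filtered-phrase")))
```

The printed lines can be pasted into the config file. They have to be regenerated whenever a feed gains or loses the tag, which is exactly the maintenance burden this issue asks to remove.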

Steps to reproduce the issue:

  1. Add ignore-article "*" "tags # \"tag1\" and (title =~ \"filtered-phrase\")" to the config file.
  2. Start newsboat.

Expected: articles from feeds tagged "tag1" which contain "filtered-phrase" in the title are filtered out.

Actual: all articles in all feeds are filtered out.


Loving newsboat, thanks for your work!

shanewstone avatar Jan 18 '18 21:01 shanewstone

Hi! Thanks for taking the time to report this.

I reproduced this with current master (a2844a747). Debug log shows that you're right and ignore fails because of a missing tags attribute. I'll try to look deeper into it over the weekend.

JFTR: ignore-article "*" "tags # \"tag1\"" is enough to reproduce the issue.

Minoru avatar Jan 19 '18 21:01 Minoru

Okay, so here's what I got so far. The problem turned out to be twofold:

  1. when we load an RSS item from the database and call rss_ignores::matches on it, the item doesn't yet have a pointer to its parent feed. Thus the item can't delegate attribute lookup, and as a result, has no "tags" attribute associated with it.

    This is very easy to fix by adding a call to rss_item::set_feedptr somewhere in rssitem_callback. (But this fix leads to a segfault in a test suite—need to investigate that);

  2. even if the item delegates attribute lookup, the "tags" attribute will still be empty, since the feed we're operating on doesn't contain any tags. In fact, the cache knows nothing of tags and never bothers with them; it's the controller that sets them time and again after the cache has done its work.

    This is trickier. I'd rather not pass tags to cache::internalize_rssfeed, because it shifts too much logic into the cache class. Perhaps I need a separate class that will consult both cache and urlreader to produce the final rss_feed?
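
The two parts can be illustrated with a toy model. This is a sketch in Python, not Newsboat's actual C++ code: the `Feed` and `Item` classes and the `has_tag` helper are hypothetical stand-ins, and only the idea of an item delegating attribute lookup to its parent feed is taken from the description above.

```python
class Feed:
    def __init__(self, tags=None):
        self.tags = list(tags or [])

    def attribute(self, name):
        # Feed-level attributes; only "tags" is modelled here.
        if name == "tags":
            return self.tags
        return None


class Item:
    def __init__(self, title, feed=None):
        self.title = title
        self.feed = feed  # stays None until the parent pointer is set

    def attribute(self, name):
        if name == "title":
            return self.title
        if self.feed is not None:
            return self.feed.attribute(name)  # delegate to the parent feed
        return None  # part 1: no parent, so "tags" cannot be resolved


def has_tag(item, tag):
    tags = item.attribute("tags")
    if tags is None:
        raise LookupError("attribute 'tags' is not available")
    return tag in tags  # part 2: an empty tag list never matches
```

An item without a parent fails the lookup outright. An item whose parent is a feed freshly built from the cache resolves the attribute but never matches, because that feed carries no tags at that point.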

I need to sleep on it, and I also don't know how much free time I'll have during the workweek, so maybe this will have to wait until the weekend again. Thank you for the patience!

Minoru avatar Jan 21 '18 20:01 Minoru

Thanks for looking into this @Minoru. Were you able to make progress on a solution? Solution 1 seems like the logical way to proceed in my opinion, but the segfault is obviously problematic.

shanewstone avatar May 04 '18 20:05 shanewstone

I dropped the ball—sorry. Items 1 and 2 aren't alternatives, they just deal with different parts of the problem. If I deploy only item 1 (provided I solve the segfault problem), nothing will change for the users—items will have pointers to parent feed, but parent feed doesn't have any tags either, so tag lookup will continue to fail.
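
To show how the two items fit together, here is a rough Python sketch of the "separate class" idea floated earlier in this thread. Everything in it is hypothetical (`FakeCache`, `FeedAssembler`, `ignore_tag1_phrase`), it is not how Newsboat is structured, and the rule uses a plain substring test where a real `=~` filter would use a regular expression.

```python
class Feed:
    def __init__(self, url, tags):
        self.url = url
        self.tags = list(tags)
        self.items = []


class Item:
    def __init__(self, title, feed):
        self.title = title
        self.feed = feed  # item 1: every item knows its parent feed


class FakeCache:
    """Stand-in for the cache: stores titles per feed URL, knows no tags."""

    def __init__(self, titles_by_url):
        self.titles_by_url = titles_by_url

    def load_titles(self, url):
        return self.titles_by_url.get(url, [])


class FeedAssembler:
    """Combines cached items with tags that come from the urls file."""

    def __init__(self, cache, tags_by_url):
        self.cache = cache
        self.tags_by_url = tags_by_url

    def load(self, url, ignore):
        # item 2: tags are attached before any ignore rule is evaluated
        feed = Feed(url, self.tags_by_url.get(url, []))
        for title in self.cache.load_titles(url):
            item = Item(title, feed)
            if not ignore(item):
                feed.items.append(item)
        return feed


def ignore_tag1_phrase(item):
    # Simplified rule: tagged "tag1" and title contains the phrase.
    return "tag1" in item.feed.tags and "filtered-phrase" in item.title
```

With only the parent pointer in place and an empty tag list, `ignore_tag1_phrase` would never match, which is the "nothing will change for the users" outcome described above.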

I hesitate to promise any new deadline, but I'll definitely keep this issue updated with my progress.

Minoru avatar May 06 '18 11:05 Minoru

So, any progress?

Jorenar avatar Feb 22 '20 13:02 Jorenar

@Jorengarenar, nope. I'm busy with other parts of the codebase at the moment, and have no idea when I'll return to this issue.

Minoru avatar Feb 22 '20 13:02 Minoru

Do you know of any post-fetch solutions?

Jorenar avatar Feb 23 '20 03:02 Jorenar

@Jorengarenar I didn't test this, but I believe you can set ignore-mode "display" and add one ignore-article rule per feed.

Minoru avatar Feb 23 '20 13:02 Minoru

No luck with that

Jorenar avatar Feb 23 '20 18:02 Jorenar

If this gets fixed, we should update the documentation to mention the feature in section "5.6. Killfiles". It would be particularly useful if we could support the expression syntax; even just being able to specify multiple feeds or tags would be a welcome improvement.

allanwind avatar Dec 04 '21 00:12 allanwind

> This is trickier. I'd rather not pass tags to cache::internalize_rssfeed, because it shifts too much logic into the cache class. Perhaps I need a separate class that will consult both cache and urlreader to produce the final rss_feed?

What about moving the item-filtering code into Controller?

the-blank-x avatar Mar 18 '22 16:03 the-blank-x

@the-blank-x Controller is a large class, so I'd prefer not to add to it.

Minoru avatar Mar 23 '22 20:03 Minoru

Any progress? I've run into the same situation here.

tapyu avatar Oct 02 '22 14:10 tapyu

> If this gets fixed we should update the documentation to mention the feature in section "5.6. Killfiles".

Indeed. Filtering by tag and by title are probably the most common cases. They should be illustrated with examples in the documentation.

tapyu avatar Oct 02 '22 14:10 tapyu

@tapyu Nothing has been done about this yet. PRs welcome ;)

Minoru avatar Oct 03 '22 20:10 Minoru

@Minoru I would love to contribute, but there is a huge barrier, since news aggregators and internet protocols are not my area of expertise at all. Is there a newbie-friendly community where I can learn more about it and, who knows, contribute?

tapyu avatar Oct 03 '22 22:10 tapyu

@tapyu If there are such communities, I'm not aware of them. What I did myself is I started working on bugs that I understood how to reproduce, and read up on relevant standards when I ran into protocol-related bugs.

This particular bug is all about Newsboat's internals, though; I don't think there's anything one needs to read except for Newsboat's code. Take a look at https://github.com/newsboat/newsboat/issues/101#issuecomment-359278782; the first item there seems doable by a newcomer, and would make it possible to work on the second item.

Minoru avatar Oct 08 '22 11:10 Minoru