rss2email icon indicating copy to clipboard operation
rss2email copied to clipboard

Error while processing a feed

Open StayPirate opened this issue 3 years ago • 12 comments

Disclaimer: I'm not 100% sure if the following is an actual bug.

While fetching the darknetdiaries feed I get the following error:

[...]
rss2email-rss2email-1  | Fetching feed: https://feeds.megaphone.fm/darknetdiaries
rss2email-rss2email-1  |        Feed contains 121 entries
rss2email-rss2email-1  |                New entry in feed: Ep 49: Elliot
rss2email-rss2email-1  |
rss2email-rss2email-1  | Fetching feed: https://guerredirete.substack.com/feed
rss2email-rss2email-1  |        Feed contains 20 entries
rss2email-rss2email-1  |        20 entries already seen
rss2email-rss2email-1  |        00 entries not seen before
[...]
rss2email-rss2email-1  | 2022-07-13.13:24:46: sleeping for 15 minutes.
rss2email-rss2email-1  | error processing https://feeds.megaphone.fm/darknetdiaries - key required

What I find weird is that I can access the same feed from the browser without being asked about any required key, and all the items are available.

Moreover, the New entry in feed: Ep 49: Elliot is fetched and sent as email notification at each rss2email iteration (every 15m in my case).

StayPirate avatar Jul 13 '22 13:07 StayPirate

I suspect the issue here is that the feed-fetching is failing to return something that can be parsed.

You say that fetching form the browser works, but I'd guess that fetching via wget or curl would fail. In the past we've seen that with remote sites caring about the user-agent or similar things. You can see a hint of that in #83.

You might try copying and pasting a decent user-agent into the config-file, something like this:

https://feeds.megaphone.fm/darknetdiaries
  - user-agent: Mozilla....

And seeing if that changes things?

skx avatar Jul 18 '22 07:07 skx

I tested the following without any success

https://feeds.megaphone.fm/darknetdiaries
- template: /templates/default.tmpl
- user-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0

StayPirate avatar Jul 19 '22 15:07 StayPirate

I'm sorry to hear, I tried adding the feed to my own list and it works just fine.

I tested with curl, with wget, and with/without cookies. For me each time I try it things work as expected.

The only thing I can think of now is to add some debugging. Edit httpfetch.go and change this code:

	// Parse it
	fp := gofeed.NewParser()
	feed, err2 := fp.ParseString(h.content)
	if err2 != nil {
		return nil, fmt.Errorf("error parsing %s contents: %s", h.url, err2.Error())
	}

To read:

	// Parse it
	fp := gofeed.NewParser()
	feed, err2 := fp.ParseString(h.content)
	if err2 != nil {
		return nil, fmt.Errorf("error parsing %s contents: %s %s", h.url, err2.Error(), h.content)
	}

(i.e. add %s to the end of the first argument to fmt.Errorf and then add h.content to the argument list)

The intent there is:

  • Once fetching the contents over HTTP
  • We try to parse the feed.
  • If that fails then report not just the failure, but the actual content that was returned.
    • It might be the remote host transiently returns "HTTP 500", or a page "You're fetching too fast", or something else I've not yet seen.

skx avatar Jul 21 '22 05:07 skx

I finally found some time to test your suggestion at https://github.com/StayPirate/rss2email/commit/00a778512a45ea5f806077fc7285d8cc36e2ba3f, better late than never.

Unfortunatelly nothing changed from the rss2email output, I still get the following:

rss2email_security-feed  | Fetching feed: https://darknetdiaries.com/feedfree.xml                                                                                                                                                                                                                                                                                                    [65/1762]
rss2email_security-feed  |      Feed contains 128 entries
rss2email_security-feed  |              New entry in feed: Ep 128: Gollumfun Part 1
rss2email_security-feed  |
rss2email_security-feed  | Fetching feed: https://guerredirete.substack.com/feed
rss2email_security-feed  |      Feed contains 20 entries
rss2email_security-feed  |      20 entries already seen
rss2email_security-feed  |      00 entries not seen before
[...]
rss2email_security-feed  | 2022-11-15.19:03:49: sleeping for 15 minutes.
rss2email_security-feed  | error processing https://darknetdiaries.com/feedfree.xml - key required

As you can see I also tried to use a different URL for the same feed: https://darknetdiaries.com/feedfree.xml instead of https://feeds.megaphone.fm/darknetdiaries.

The only extra information I can provide is that these feeds are supposed to be privates, but as you already tested they can be fetched without any authentication. I don't even know how the authentication is handled. For other private feeds I saw the form https://www.patreon.com/rss/darknetdiaries?auth=<token>, but that's not the case for the ones I'm testing.

The podcast author also tried to take down some leaked private addresses from https://github.com/gpodder/mygpo/issues/308#issuecomment-626996691.

StayPirate avatar Nov 15 '22 19:11 StayPirate

I noticed that a bucket named https://darknetdiaries.com/feedfree.xml is created in the state.db file, but no entry is added.

Attachment: state.db.zip

StayPirate avatar Nov 15 '22 20:11 StayPirate

I never managed to fix this problem and now there is a new feed which behave the same way. Could you please try it and let me know if you can reproduce it? The feed is https://feeds.simplecast.com/gnxGfXH0.

Here is what I see from the rss2email output:

[...]
rss2email_security-feed  | error processing https://darknetdiaries.com/feedfree.xml - key required
rss2email_security-feed  | error processing https://feeds.simplecast.com/gnxGfXH0 - key required
rss2email_security-feed  | 2023-01-19.07:43:08: sleeping for 15 minutes.

From the MUA PoV, I get spammed by the last available item of that feed every time rss2email ran. image

I tested the same feed with another feed-reader and it works fine.

StayPirate avatar Jan 19 '23 08:01 StayPirate

As you can see both feeds which "require a key" are actually publicly accessible. I'm not sure about for what a key is required. Can you help me to understand if this problem is part of rss2email or mmcdole/gofeed?

In any case, I think that the experienced bad-behavior of receiving a email for the newer item of the feed every 15m or (more precisely) every time rss2email runs, can be solved in rss2email by adding the item to the state.db before handling whatever turns into the error processing ... - key required.

After all rss2email sent the email, so for consistency it also should add that processed item to the state.db, and only after reacts to any potential error. I mean, either it not send the email and not add the item to the state.db, or if it sends the email then it should also add the processed item to state.db.

StayPirate avatar Jan 24 '23 09:01 StayPirate

I did some code-review and I think that this is a bug related to rss2email.

First of all, as shown here

https://github.com/skx/rss2email/blob/faaf1358f61127e27fa26d3021f6db971c0213ba/processor/processor.go#L300-L302

when a new item is found, both the item.Title and the item.Link are printed. This can be confirmed from the output of a succesfully processed feed, like:

rss2email_security-feed  | Fetching feed: https://developer.apple.com/news/releases/rss/releases.rss
rss2email_security-feed  |      Feed contains 33 entries
rss2email_security-feed  |              New entry in feed: iOS 16.3 (20D47)
rss2email_security-feed  |                      https://developer.apple.com/news/releases/?id=01232023f
rss2email_security-feed  |              New entry in feed: iPadOS 16.3 (20D47)
rss2email_security-feed  |                      https://developer.apple.com/news/releases/?id=01232023e

Anyway, this is not true for the two feeds for which I'm experiencing the issue, as you can see from the following output item.Link is missing. Maybe that's the reason why the state.db file cannot be update accordingly (set the item as read)?

rss2email_security-feed  | Fetching feed: https://decoded.avast.io/feed/
rss2email_security-feed  |      Feed contains 10 entries
rss2email_security-feed  |      10 entries already seen
rss2email_security-feed  |      00 entries not seen before
rss2email_security-feed  | Fetching feed: https://feeds.simplecast.com/gnxGfXH0
rss2email_security-feed  |      Feed contains 78 entries
rss2email_security-feed  |              New entry in feed: Tod and Jen and Jennifer on Season 5 of Security Nation
rss2email_security-feed  |
rss2email_security-feed  | Fetching feed: https://googleprojectzero.blogspot.com/feeds/posts/default
rss2email_security-feed  |      Feed contains 25 entries
rss2email_security-feed  |      25 entries already seen
rss2email_security-feed  |      00 entries not seen before
rss2email_security-feed  | Fetching feed: https://blog.torproject.org/rss.xml
rss2email_security-feed  |      Feed contains 10 entries
rss2email_security-feed  |      10 entries already seen
rss2email_security-feed  |      00 entries not seen before
rss2email_security-feed  | Fetching feed: https://darknetdiaries.com/feedfree.xml
rss2email_security-feed  |      Feed contains 131 entries
rss2email_security-feed  |              New entry in feed: Ep 131: Welcome To Video
rss2email_security-feed  |
rss2email_security-feed  | Fetching feed: https://guerredirete.substack.com/feed
rss2email_security-feed  |      Feed contains 20 entries
rss2email_security-feed  |      20 entries already seen
rss2email_security-feed  |      00 entries not seen before

Not having item.Link set may create problems later in the code. For instance, it is used to mark the item as seen in the related feed-bucket:

https://github.com/skx/rss2email/blob/faaf1358f61127e27fa26d3021f6db971c0213ba/processor/processor.go#L343-L351

Maybe the key required in the error message error processing https://darknetdiaries.com/feedfree.xml - key required is coming from the bbolt database, as it has a key=value approach? And I mistakenly considered the key a token to access private rss feeds? @skx what do you think?

StayPirate avatar Jan 24 '23 10:01 StayPirate

gentle ping

StayPirate avatar Feb 10 '23 15:02 StayPirate

Sorry I kinda dropped the ball on this one - but I guess we should come back to it if it's still happening.

skx avatar Jun 05 '24 05:06 skx

Sure, I'll keep on eye on my logs about that. On that regards, what is the best way to stop logging WARN messages, but still keep error logs?

StayPirate avatar Jun 05 '24 09:06 StayPirate

Warn is hardcoded as the default level, sadly:

        // Setup our default logging level
        //
        loggerLevel = &slog.LevelVar{}
        loggerLevel.Set(slog.LevelWarn)

        //
        // Allow showing "all the logs"
        //
        if os.Getenv("LOG_ALL") != "" {
                loggerLevel.Set(slog.LevelDebug)
        }

I guess we'd need a new "LOG_NONE" or "LOG_FATAL" to only show Error notices. I guess I assumed warnings were usually worth seeing. I'll add a change to that once I've merged the frequency-PR (probably do that tomorrow. I think I've addressed all your comments and it's been running well for me for the past few days.)

skx avatar Jun 08 '24 19:06 skx