
Duplicate RSS feed entries

Open Mikaela opened this issue 6 years ago • 15 comments

  • https://www.vihrealanka.fi/rss

This is the main newspaper of the Finnish Greens party. Its latest post is about them being on holiday, yet its latest entries were still posted to a channel a few times during the night.

Mikaela, Jul 08 '19 11:07

  • https://blog.cloudflare.com/rss/

also appears to be a good example of a feed with recurring entries.

Mikaela, Jul 18 '19 20:07

The only reason I can think of for this happening (assuming valid RSS) is that old entries disappear from the list on one request and then reappear on a later request. We hold a list of "seen" IDs until those IDs are no longer present in a response.

jesopo, Jul 23 '19 17:07
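To make that failure mode concrete, here is a minimal sketch of the "seen IDs" scheme as described above, assuming IDs are plain strings and that the seen set is simply replaced by each response's IDs. `SeenTracker` and `poll` are hypothetical names for illustration, not bitbot's actual code.

```python
from typing import Iterable, List, Set


class SeenTracker:
    """Hypothetical model: an ID stays "seen" only while it keeps appearing."""

    def __init__(self) -> None:
        self.seen: Set[str] = set()

    def poll(self, current_ids: Iterable[str]) -> List[str]:
        current = list(current_ids)
        # Anything not in the seen set is treated as new (i.e. announced).
        new = [entry_id for entry_id in current if entry_id not in self.seen]
        # Prune: forget every ID that is missing from this response.
        self.seen = set(current)
        return new


tracker = SeenTracker()
print(tracker.poll(["a", "b", "c"]))  # first fetch: ['a', 'b', 'c']
print(tracker.poll(["a", "c"]))       # 'b' briefly missing: []
print(tracker.poll(["a", "b", "c"]))  # 'b' is back and re-announced: ['b']
```

The third call shows the problem: once `b` drops out of a single response it is forgotten, so its return looks exactly like a new entry.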

The Cloudflare blog feed seems to be getting a bit out of hand. I wonder if it could have anything to do with UTF-8 characters in different languages.

Mikaela, Jul 31 '19 08:07

And I have given up trying to follow https://blog.cloudflare.com/rss/

Mikaela, Aug 01 '19 11:08

I think this might be relevant: https://github.com/ProgVal/Limnoria/commit/15fb16a8ae923a745dd242bafa73f39cd2232125

I'm going to ask Val why that was done and implement something similar if it sounds like the issue we're seeing here.

jesopo, Aug 12 '19 12:08

The explanation is: if an RSS feed shows the n latest entries and some of those entries are then deleted server-side, older ones slide into the "most recent" window. I think that's likely what's happening here.

jesopo, Aug 12 '19 13:08
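One possible mitigation for both failure modes, sketched here under stated assumptions and not taken from bitbot's or Limnoria's actual fix, combines two ideas: keep seen IDs in a bounded history instead of forgetting them as soon as they vanish, and skip unseen entries published before the newest entry already seen (the "older ones slide in" case). `Entry`, `BoundedSeenTracker` and `max_history` are hypothetical names.

```python
from collections import OrderedDict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Entry:
    id: str
    published: datetime


class BoundedSeenTracker:
    """Hypothetical mitigation sketch; not bitbot's or Limnoria's actual code."""

    def __init__(self, max_history: int = 1000) -> None:
        self.max_history = max_history
        # Insertion-ordered so the least recently seen IDs are evicted first.
        self.history: "OrderedDict[str, None]" = OrderedDict()
        # Newest publication time seen on any earlier poll.
        self.newest: Optional[datetime] = None

    def poll(self, entries: List[Entry]) -> List[Entry]:
        cutoff = self.newest
        new = [
            e
            for e in entries
            if e.id not in self.history
            # Unseen but published before the newest known entry: most likely
            # an old item sliding back into the feed window, so skip it.
            and (cutoff is None or e.published >= cutoff)
        ]
        for e in entries:
            # Remember every ID in the response, announced or not.
            self.history[e.id] = None
            self.history.move_to_end(e.id)
        while len(self.history) > self.max_history:
            self.history.popitem(last=False)
        if entries:
            latest = max(e.published for e in entries)
            if cutoff is None or latest > cutoff:
                self.newest = latest
        return new


# Usage: a 10-item window over posts 10..19 (higher number = newer).
t0 = datetime(2019, 8, 1)
feed = [Entry(f"post-{i}", t0 + timedelta(days=i)) for i in range(10, 20)]
tracker = BoundedSeenTracker()
tracker.poll(feed)  # first fetch: everything counts as new

# post-19 is deleted server-side, so the older post-9 slides into the window.
window = feed[:-1] + [Entry("post-9", t0 + timedelta(days=9))]
print([e.id for e in tracker.poll(window)])  # []

# post-19 comes back and a genuinely new post-20 appears.
window = feed + [Entry("post-20", t0 + timedelta(days=20))]
print([e.id for e in tracker.poll(window)])  # ['post-20']
```

The date check assumes the feed publishes trustworthy publication dates: a genuinely new but backdated post would be skipped. An ID evicted from the bounded history could also be re-announced, so `max_history` should comfortably exceed the feed's window size.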

Hopefully resolved as of jesopo/bitbot@1db3929d

jesopo, Aug 12 '19 14:08

I am still seeing this with fd0ad283 and the vihrealanka feed.

Edit: as requested on IRC, I have restored the Cloudflare feed to see if it will start repeating.

Mikaela, Aug 30 '19 15:08

@Mikaela have you been seeing this issue much anymore?

jesopo, Oct 14 '19 14:10

Apparently my bot isn't even connected to the network where I see this. I will need to try updating it and adding the Cloudflare feed later today, as I am a bit busy at the moment.

Mikaela, Oct 15 '19 08:10

2019-10-15T13:55:13+0300 <@Mikaela> !rss add https://blog.cloudflare.com/rss/
2019-10-15T13:55:14+0300 bittibotti (Notice) [RSS] That URL is already being watched

So I guess it may be resolved, but I will reopen the issue if that turns out not to be the case once I have observed the Vihreä Lanka feed more.

Mikaela, Oct 15 '19 10:10

After upgrading Tornado as per https://github.com/jesopo/bitbot/issues/184#issuecomment-546301526 this morning (after a week of brokenness), I have received over 200 messages in one group. Scrolling through them, Vihreä Lanka seems to be repeating: I saw some of its titles three times.

However, the Greens have decided to discontinue Vihreä Lanka ( https://www.vihrealanka.fi/juttu/vihre%C3%A4t-lakkauttaa-vihre%C3%A4-lanka-lehden ), so I am not sure this specific case is worth investigating. Still, if the Vihreä Lanka feed repeats itself, I imagine someone will eventually hit this issue with another feed too.

Mikaela, Oct 31 '19 09:10

Was it repeating frequently, or over a long period of time?

jesopo, Oct 31 '19 10:10

Since I reopened this issue, https://www.vihrealanka.fi/juttu/miten-tavallinen-ihminen-voi-vaikuttaa-esimerkiksi-l%C3%A4htem%C3%A4ll%C3%A4-ehdolle-vaaleihin-sanovat (and several entries after it; this is just the first of the recurring entries) has been posted at the following times (UTC+2):

  • 12.14
  • 12.34
  • 13.29
  • 14.34

I observed these times through a Telegram relay bot, as it had the best read marker.

Mikaela, Oct 31 '19 12:10

Thanks! I think I'll set up something to monitor this specific feed for a while and see if I can find anything that would cause this.

jesopo, Oct 31 '19 12:10
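A monitor along those lines could record, for every fetch, which IDs vanished and which came back after an absence. The sketch below assumes an RSS 2.0 feed whose `<item>` elements carry a `<guid>` (falling back to `<link>`) without namespaces; `entry_ids`, `FeedMonitor` and `fetch_and_record` are hypothetical helpers that use only the standard library.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Set, Union
from urllib.request import urlopen


def entry_ids(rss_xml: Union[str, bytes]) -> List[str]:
    """Return item IDs from an RSS 2.0 document, preferring <guid> over <link>."""
    root = ET.fromstring(rss_xml)
    ids = []
    for item in root.iter("item"):
        item_id = item.findtext("guid") or item.findtext("link")
        if item_id:
            ids.append(item_id.strip())
    return ids


class FeedMonitor:
    """Log, per fetch, which IDs vanished and which returned after an absence."""

    def __init__(self) -> None:
        self.previous: Set[str] = set()
        self.ever_seen: Set[str] = set()
        self.log: List[Dict[str, List[str]]] = []

    def record(self, ids: List[str]) -> Dict[str, List[str]]:
        current = set(ids)
        event = {
            "vanished": sorted(self.previous - current),
            # Present now, absent last time, but seen at some earlier point.
            "returned": sorted((current - self.previous) & self.ever_seen),
        }
        self.previous = current
        self.ever_seen |= current
        self.log.append(event)
        return event


def fetch_and_record(monitor: FeedMonitor, url: str) -> Dict[str, List[str]]:
    """Fetch the feed once over HTTP and record its IDs."""
    with urlopen(url, timeout=30) as response:
        return monitor.record(entry_ids(response.read()))


# Offline usage with two hand-written feed snapshots.
full = "<rss><channel><item><guid>1</guid></item><item><guid>2</guid></item></channel></rss>"
partial = "<rss><channel><item><guid>1</guid></item></channel></rss>"
monitor = FeedMonitor()
monitor.record(entry_ids(full))
print(monitor.record(entry_ids(partial)))  # {'vanished': ['2'], 'returned': []}
print(monitor.record(entry_ids(full)))     # {'vanished': [], 'returned': ['2']}
```

Calling `fetch_and_record(monitor, "https://www.vihrealanka.fi/rss")` on a timer would build up `monitor.log`. A "returned" event just before a duplicate announcement would support the disappear-and-reappear theory, while a duplicate with no such event would point more towards an old entry sliding into the window under an ID never seen before.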