bitbot
Duplicate RSS feed entries
- https://www.vihrealanka.fi/rss
This is the main newspaper of the Finnish Greens party. Its latest topic is about them being on holiday, but its latest topics were nevertheless printed to a channel several times during the night.
- https://blog.cloudflare.com/rss/
also appears to be a good example of a recurring RSS feed
The only reason I can think that this would happen (if we assume valid rss) is if old entries disappear from the list on one request and then reappear on a later request. We hold a list of "seen" IDs until those IDs are not present in a response.
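To make the failure mode above concrete, here is a minimal sketch of "seen" tracking that forgets an ID as soon as it is missing from a response. This is not bitbot's actual code: the function name `poll` and the entry IDs are hypothetical, chosen only to illustrate the behaviour described.

```python
def poll(seen, response_ids):
    """Return (new_ids, updated_seen) for one feed response.

    Hypothetical helper: IDs absent from the current response are forgotten.
    """
    new_ids = [entry_id for entry_id in response_ids if entry_id not in seen]
    # Only the IDs present in this response are remembered for next time.
    updated_seen = set(response_ids)
    return new_ids, updated_seen


seen = set()
announced = []
# Entry "a" disappears in the second response and reappears in the third.
for response in (["a", "b", "c"], ["b", "c"], ["a", "b", "c"]):
    new_ids, seen = poll(seen, response)
    announced.extend(new_ids)

print(announced)  # ['a', 'b', 'c', 'a']
```

Under these assumptions, "a" is announced twice because it was dropped from the "seen" set while it was absent.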
The Cloudflare blog feed seems to be getting a bit out of hand. I wonder if it can have anything to do with UTF-8 characters in different languages?
And I gave up trying to follow https://blog.cloudflare.com/rss/
think this might be relevant https://github.com/ProgVal/Limnoria/commit/15fb16a8ae923a745dd242bafa73f39cd2232125
going to ask Val about why that was done and implement something similar if it sounds like the issue we're seeing here.
the explanation is: if an RSS feed shows the n latest entries and some of those entries are then deleted server-side, older ones slide into the "most recent" window. I think that's likely what's happening here.
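One way to guard against entries sliding back into the window is to remember IDs for longer than they stay in the feed. The sketch below assumes a bounded history is acceptable. The class `SeenHistory`, its `filter_new` method and the `max_size` parameter are hypothetical names for illustration, and this is not necessarily what the Limnoria commit or bitbot does.

```python
from collections import OrderedDict


class SeenHistory:
    """Bounded memory of entry IDs (hypothetical sketch)."""

    def __init__(self, max_size=1000):
        self._ids = OrderedDict()
        self._max_size = max_size

    def filter_new(self, response_ids):
        """Return the IDs from this response that have not been seen before."""
        new_ids = []
        for entry_id in response_ids:
            if entry_id in self._ids:
                # Refresh known IDs so that entries still in the feed are evicted last.
                self._ids.move_to_end(entry_id)
            else:
                new_ids.append(entry_id)
                self._ids[entry_id] = True
        # Evict the least recently seen IDs once the history is too large.
        while len(self._ids) > self._max_size:
            self._ids.popitem(last=False)
        return new_ids


history = SeenHistory(max_size=100)
announced = []
# A feed showing its 3 latest entries; "e4" is deleted before the last poll,
# so the older "e2" slides back into the window.
responses = (
    ["e3", "e2", "e1"],
    ["e4", "e3", "e2"],
    ["e5", "e4", "e3"],
    ["e5", "e3", "e2"],
)
for response in responses:
    announced.extend(history.filter_new(response))

print(announced)  # ['e3', 'e2', 'e1', 'e4', 'e5']
```

Here "e2" is not announced again when it returns, because the history still holds it. The trade-off is that `max_size` has to be comfortably larger than the feed's window.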
hopefully resolved as of jesopo/bitbot@1db3929d
I am still seeing this with fd0ad283 and the vihrealanka feed.
Edit: as per IRC request I have restored Cloudflare feed to see if it will start repeating.
@Mikaela have you been seeing this issue much any more?
Apparently my bot isn't even connected to the network where I see this, I will need to try updating and adding the Cloudflare feed later today as I am a bit busy at the moment.
2019-10-15T13:55:13+0300 <@Mikaela> !rss add https://blog.cloudflare.com/rss/
2019-10-15T13:55:14+0300 bittibotti (Notice) [RSS] That URL is already being watched
So I guess it may be resolved, but I will reopen if that turns out not to be the case, once I have observed Vihreä Lanka more.
After upgrading Tornado to https://github.com/jesopo/bitbot/issues/184#issuecomment-546301526 (after a week of brokenness) this morning, I have gotten over 200 messages in one group and, scrolling through them, Vihreä Lanka seems to be repeating, as I saw some of their titles three times.
However the Greens party has decided to discontinue Vihreä Lanka ( https://www.vihrealanka.fi/juttu/vihre%C3%A4t-lakkauttaa-vihre%C3%A4-lanka-lehden ), so I am not sure if this specific case is worth investigating, but if Vihreä Lanka feed repeats itself, I imagine it's likely that someone sometime is going to hit this issue with another feed.
was it repeating frequently or over a long period of time?
Since I reopened this issue, https://www.vihrealanka.fi/juttu/miten-tavallinen-ihminen-voi-vaikuttaa-esimerkiksi-l%C3%A4htem%C3%A4ll%C3%A4-ehdolle-vaaleihin-sanovat (and some others after it, this entry is just the first of several recurring entries) has been posted at the following times (UTC+2):
- 12.14
- 12.34
- 13.29
- 14.34
I have observed these times through a Telegram relay bot as it had the best read marker.
thanks! I think I'll set up something to monitor this specific feed for a while and see if I can find anything that would cause this.