[BUG] FreshRSS duplicating entries in very rare, mysterious circumstances
**Describe the bug**
I previously opened a bug report about this problem (#4831), thinking it was related to my upgrade to PgSQL 15. Since then, though, I've experienced the same issue a couple more times without making any changes to my database server.
What happens is, at some point, one individual article in a feed becomes stuck as unread. An error is thrown when I attempt to mark it as read:
[error] --- SQL error markRead 1: ERROR: duplicate key value violates unique constraint "frss_[my username]_entry_id_feed_guid_key"
DETAIL: Key (id_feed, guid)=(12, http://www.dslreports.com/shownews/Dish-Network-Takes-30M-Cyberattack-Hit-more-notable-news-143411) already exists.
It occurs very infrequently, and honestly, with PgAdmin it only takes me about 10 minutes to remember how I solved the problem last time and fix it manually in the affected database. I'm just curious why this problem is happening in the first place, and whether there's a simple fix to prevent it in the future.
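For anyone hitting the same thing, the manual cleanup amounts to something like the sketch below. This is only a sketch: `frss_myuser_entry` is a placeholder for your own per-user entry table, and the `id_feed`/`guid` values come straight from the constraint-violation message above.

```sql
-- Sketch of a manual cleanup, assuming the fix is simply to drop the
-- extra row for the offending (id_feed, guid) pair.
-- "frss_myuser_entry" is a placeholder: use your own per-user entry table,
-- and take id_feed/guid from the error message.

-- 1. Inspect the duplicate rows first.
SELECT id, is_read, is_favorite, id_feed
FROM frss_myuser_entry
WHERE id_feed = 12
  AND guid = 'http://www.dslreports.com/shownews/Dish-Network-Takes-30M-Cyberattack-Hit-more-notable-news-143411'
ORDER BY id;

-- 2. Keep the oldest row (lowest id, the one already marked read)
--    and delete the newer duplicate(s).
DELETE FROM frss_myuser_entry
WHERE id_feed = 12
  AND guid = 'http://www.dslreports.com/shownews/Dish-Network-Takes-30M-Cyberattack-Hit-more-notable-news-143411'
  AND id <> (
    SELECT MIN(id)
    FROM frss_myuser_entry
    WHERE id_feed = 12
      AND guid = 'http://www.dslreports.com/shownews/Dish-Network-Takes-30M-Cyberattack-Hit-more-notable-news-143411'
  );
```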
**To Reproduce**
There's no way I know of to reliably reproduce it. So far, I've only seen it occur in the DSLreports feed.
**Expected behavior**
Hopefully there are some simple steps that can be taken to prevent this error from happening.
**Environment information** (please complete the following information):
- Device: Any (server-side error)
- OS: Docker
- FreshRSS version: 1.21.0
- Database version: PostgreSQL 15.0 (Debian 15.0-1.pgdg110+1)
**Additional info**
When I run the following query, I see the results below.
SELECT * FROM public.frss_[user]_entry
WHERE guid LIKE '%143411'
Results

| id | guid | title | author | content | link | date | lastSeen | hash | is_read | is_favorite | id_feed | tags | attributes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1683637463772928 | http://www.dslreports.com/shownews/Dish-Network-Takes-30M-Cyberattack-Hit-more-notable-news-143411 | Dish Network Takes $30M Cyberattack Hit; + more notable news - | | (truncated) | http://www.dslreports.com/shownews/Dish-Network-Takes-30M-Cyberattack-Hit-more-notable-news-143411 | 1683632220 | 1684242266 | binary data | 1 | 0 | 12 | | {"enclosures":[]} |
| 1684155865369868 | http://www.dslreports.com/shownews/Dish-Network-Takes-30M-Cyberattack-Hit-more-notable-news-143411 | Dish Network Takes $30M Cyberattack Hit; + more notable news - | | (truncated) | http://www.dslreports.com/shownews/Dish-Network-Takes-30M-Cyberattack-Hit-more-notable-news-143411 | 1683632220 | 1684155862 | binary data | 0 | 0 | 12 | | {"enclosures":[]} |
So they have the same guid (URL) and date, but different id and lastSeen values; one has the is_read flag set and the other doesn't, and the tags are the same. Perhaps the article was updated and their feed republished it? If so, perhaps there's a way to account for that.
This just happened again with the exact same source & article. Here's the data:

| id | guid | title | author | content | link | date | lastSeen | hash | is_read | is_favorite | id_feed | tags | attributes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1684328661379775 | http://www.dslreports.com/shownews/Dish-Network-Takes-30M-Cyberattack-Hit-more-notable-news-143411 | Dish Network Takes $30M Cyberattack Hit; + more notable news - | | (truncated) | http://www.dslreports.com/shownews/Dish-Network-Takes-30M-Cyberattack-Hit-more-notable-news-143411 | 1683632220 | 1684506869 | binary data | 1 | 0 | 12 | | {"enclosures":[]} |
| 1684415064468015 | http://www.dslreports.com/shownews/Dish-Network-Takes-30M-Cyberattack-Hit-more-notable-news-143411 | Dish Network Takes $30M Cyberattack Hit; + more notable news - | | (truncated) | http://www.dslreports.com/shownews/Dish-Network-Takes-30M-Cyberattack-Hit-more-notable-news-143411 | 1683632220 | 1684415063 | binary data | 0 | 0 | 12 | | {"enclosures":[]} |
Looks like the same situation as last time. I think FreshRSS is creating a second entry when it shouldn't, but I'm not sure what's triggering it.
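If anyone else wants to check how widespread this is on their instance, a query along these lines should list every (id_feed, guid) pair that exists more than once. Again, `frss_myuser_entry` is a placeholder for the real per-user entry table.

```sql
-- List every (id_feed, guid) pair that appears more than once,
-- i.e. every duplicated article in this user's entry table.
-- "frss_myuser_entry" is a placeholder for the real per-user table.
SELECT id_feed,
       guid,
       COUNT(*) AS copies,
       MIN(id)  AS oldest_id,
       MAX(id)  AS newest_id
FROM frss_myuser_entry
GROUP BY id_feed, guid
HAVING COUNT(*) > 1
ORDER BY copies DESC, id_feed;
```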
I can confirm it happens for me as well, almost daily for a few feeds. It's not always tied to feed updates, and many times the duplicate has the exact same date/timestamp as the original. I'll try to keep track and capture a concrete example next time.
I've seen what I think might be the same issue too. I've only seen it on the 9to5mac.com feed. I ran their feed through the W3C validator and it found:
> This feed is valid, but interoperability with the widest range of feed readers could be improved by implementing the following recommendations. [line 276](https://validator.w3.org/feed/check.cgi?url=https%3A%2F%2F9to5mac.com%2Ffeed#l276), column 28: guid should not be blank (2 occurrences) [[help](https://validator.w3.org/feed/docs/warning/NotBlank.html)] `<guid isPermaLink="false"></guid>`
I don't suppose the issue could be related to bad permalinks in some feeds?
Reproduced this error today after moving my postgres container from postgres:15 to postgres:15-alpine.
[error] --- SQL error markRead 1: ERROR: duplicate key value violates unique constraint "<USER>_entry_id_feed_guid_key"
DETAIL: Key (id_feed, guid)=(<ID>, <URL>) already exists.
Rolled back the image to postgres:15 and the error went away.
Edit to clarify: the behaviour is the same as the OP's. Mostly everything seemed to work fine, except that one entry would get stuck unread in the feed. In my case, it was an item I had marked as read a day earlier, before moving postgres versions.
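For what it's worth, postgres:15 is built against glibc while postgres:15-alpine uses musl, so the two images can collate/compare text differently. I don't know whether that's the cause here, but PostgreSQL 15 can at least report whether the database's recorded collation version still matches what the OS libc currently provides (assuming the database is named `freshrss`):

```sql
-- Compare the collation version recorded when the database was created
-- with the version currently provided by the container's libc.
-- A mismatch would normally also show up as warnings in the server log.
-- Assumes the database is named "freshrss".
SELECT datname,
       datcollate,
       datcollversion,
       pg_database_collation_actual_version(oid) AS actual_version
FROM pg_database
WHERE datname = 'freshrss';
```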
If some of you are still around, could you please try the following commands for a bit of debugging?
SELECT datname, datcollate, datctype FROM pg_database WHERE datname = 'freshrss';
SELECT table_catalog, table_name, column_name, collation_name FROM information_schema.columns WHERE table_catalog = 'freshrss' AND column_name = 'guid';
SELECT conname, contype from pg_constraint WHERE conname LIKE '%guid%';
Not sure if my situation is helpful, but happy to run them. Let me know if there are any other queries I can run for you.
`bob` was the user who received the duplicate entries and, at the time, was the only user on the instance.
SELECT datname, datcollate, datctype FROM pg_database WHERE datname = 'freshrss';
| datname | datcollate | datctype |
|---|---|---|
| freshrss | en_US.utf8 | en_US.utf8 |
SELECT table_catalog, table_name, column_name, collation_name FROM information_schema.columns WHERE table_catalog = 'freshrss' AND column_name = 'guid';
| table_catalog | table_name | column_name | collation_name |
|---|---|---|---|
| freshrss | alice_entrytmp | guid | null |
| freshrss | alice_entry | guid | null |
| freshrss | bob_entrytmp | guid | null |
| freshrss | bob_entry | guid | null |
SELECT conname, contype from pg_constraint WHERE conname LIKE '%guid%';
| conname | contype |
|---|---|
| bob_entry_id_feed_guid_key | u |
| bob_entrytmp_id_feed_guid_key | u |
| alice_entry_id_feed_guid_key | u |
| alice_entrytmp_id_feed_guid_key | u |
Any news here?