nitter
GUID in RSS feed differs between public instances, causing duplicates
To avoid duplicates when switching between public instances (for example when using the 3rd party load balancing services in the wiki) it would perhaps be better if they all had the same GUID for the same tweet. ~Using the original twitter.com URL would be a reasonable choice.~
> Using the original twitter.com URL would be a reasonable choice.
I like this idea and thought of almost exactly the same approach, because every tweet is guaranteed to have a unique ID made up of a fairly long string of digits. I'm thinking that GUIDs should be calculated from that ID alone, which is located in the URL between `status/` and any tracking parameters that may follow it, beginning with a question mark (`?`). For example, the ID of this tweet is 1443946032350613506.
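To make the extraction described above concrete, here is a minimal sketch in Python. The function name `extract_tweet_id` and the example URLs are made up for illustration; this is not Nitter's own code, only one way the ID between `status/` and the `?` could be pulled out.

```python
import re
from typing import Optional

# Matches the run of digits that follows "status/" in a status URL.
_STATUS_ID = re.compile(r"/status/(\d+)")


def extract_tweet_id(url: str) -> Optional[str]:
    """Return the numeric tweet ID from a status URL, or None if there is none.

    Hypothetical helper for illustration; not part of Nitter.
    """
    # Drop tracking parameters (after "?") and any fragment (after "#") first.
    path = url.split("?", 1)[0].split("#", 1)[0]
    match = _STATUS_ID.search(path)
    return match.group(1) if match else None


# The same tweet reached through two different hosts yields the same ID.
id_a = extract_tweet_id("https://twitter.com/someuser/status/1443946032350613506?s=20")
id_b = extract_tweet_id("https://nitter.example/someuser/status/1443946032350613506#m")
```

The ID is kept as a string here because it is only used as an identifier, so no numeric precision concerns arise.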
Contrary to what I suggested earlier, I now also suggest using only the tweet ID for the GUID. The main reason for not using the full original twitter.com URL is to avoid duplicates when a user changes their account name. `https://twitter.com/i/status/[tweet_id]` would work consistently even if the account name changes.
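As a rough sketch of that idea, the GUID could be built from the tweet ID alone, so the account name never enters into it. The function name `canonical_guid` is hypothetical; only the `https://twitter.com/i/status/[tweet_id]` URL form comes from the comment above.

```python
def canonical_guid(tweet_id: str) -> str:
    """Build an account-independent GUID from a tweet ID.

    Hypothetical helper for illustration; not part of Nitter.
    """
    # Reject anything that is not a plain ASCII digit string.
    if not (tweet_id.isascii() and tweet_id.isdigit()):
        raise ValueError(f"not a numeric tweet ID: {tweet_id!r}")
    return f"https://twitter.com/i/status/{tweet_id}"


guid = canonical_guid("1443946032350613506")
```

Because the account name is not part of the result, renaming the account cannot change the GUID of an existing tweet.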
This is easy, but the main problem is that any change to the GUID format will cause everyone's RSS clients to suddenly think there are 20 new tweets. I can't think of any solution to this other than applying it only to tweets newer than a certain timestamp for a transition period, but I don't know if that's even a good idea. Any feedback is very welcome; I know people complained a lot last time I changed the GUIDs.
Looking into tweet IDs, it looks like they're roughly sortable. So another option would be to determine a tweet ID a significant period in the future, and switch over to just using the tweet ID for tweet IDs greater than that.
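A sketch of how such a cutoff ID might be computed, assuming the commonly documented Snowflake layout for tweet IDs: a millisecond timestamp in the bits above the low 22, counted from the epoch 1288834974657 ms. That layout is an assumption here, and the function names and the 2030 cutoff date are invented for illustration.

```python
from datetime import datetime, timezone

# Assumed Snowflake layout: timestamp (ms since this epoch) sits above the low 22 bits.
TWITTER_EPOCH_MS = 1288834974657
TIMESTAMP_SHIFT = 22


def tweet_id_to_datetime(tweet_id: int) -> datetime:
    """Approximate creation time encoded in a Snowflake-style tweet ID."""
    ms = (tweet_id >> TIMESTAMP_SHIFT) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def cutoff_id_for(moment: datetime) -> int:
    """Smallest Snowflake-style ID whose timestamp is at `moment`."""
    ms = round(moment.timestamp() * 1000) - TWITTER_EPOCH_MS
    if ms < 0:
        raise ValueError("moment predates the assumed Snowflake epoch")
    return ms << TIMESTAMP_SHIFT


def use_id_only_guid(tweet_id: int, cutoff_id: int) -> bool:
    """True if this tweet should get the new ID-only GUID format."""
    return tweet_id >= cutoff_id


# Hypothetical switch-over date, chosen only for the example.
cutoff = cutoff_id_for(datetime(2030, 1, 1, tzinfo=timezone.utc))
old_tweet_uses_new_format = use_id_only_guid(1443946032350613506, cutoff)
```

Comparing IDs rather than parsed dates means the check works even when an instance only has the ID at hand, and every instance using the same cutoff would switch at the same tweet.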
~~Or just make it a configuration option, and let instances decide what representation they want, I guess~~ of course this won't solve the issue of inconsistent GUIDs between instances...
> So another option would be to determine a tweet ID a significant period in the future, and switch over to just using the tweet ID for tweet IDs greater than that.
A soft transition maybe? Start doing this, then at some point in the future switch to doing Twitter URLs/IDs exclusively.
@zedeus here's how you could switch over without disrupting existing feeds:
- If a param `?global=1` is appended to the RSS URL, it will use a unified GUID (like the original Twitter link).
- Update the RSS link in the Nitter software to add this param to the default displayed at the top of the UI.
- Retain the original behaviour at the existing link.
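The opt-in behaviour in the list above could look roughly like this. It is only a sketch: `guid_for_item` and the `nitter.example` URLs are hypothetical, and `global=1` is the parameter proposed in this comment, not an existing Nitter option.

```python
from urllib.parse import parse_qs, urlsplit


def guid_for_item(feed_url: str, instance_item_url: str, tweet_id: str) -> str:
    """Pick the GUID for one feed item based on how the feed was requested.

    Hypothetical helper for illustration; not part of Nitter.
    """
    query = parse_qs(urlsplit(feed_url).query)
    if "1" in query.get("global", []):
        # Opted in: unified GUID that is identical on every instance.
        return f"https://twitter.com/i/status/{tweet_id}"
    # Existing link: keep the instance-specific GUID so old feeds see no change.
    return instance_item_url


item_url = "https://nitter.example/someuser/status/123#m"
unified = guid_for_item("https://nitter.example/someuser/rss?global=1", item_url, "123")
legacy = guid_for_item("https://nitter.example/someuser/rss", item_url, "123")
```

Existing subscribers keep their current GUIDs, while anyone copying the updated RSS link from the UI would get the unified ones.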
With nitter.net becoming increasingly popular and apparently struggling under heavy load, solving this issue seems important to me to make switching to other instances as easy as possible.
If we switch the GUID to the native Twitter ID, I think there would be little reason to change the GUID format again, which would maybe justify the inconvenience of users getting duplicate entries in their feeds again.
Just came across this issue, but in case it helps anyone: I wrote nitter-rss-proxy a while back to round-robin between public instances and rewrite GUIDs.
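For readers wondering what rewriting GUIDs might involve, here is a minimal sketch using only the Python standard library. It is not taken from nitter-rss-proxy; the function name and the sample feeds are invented, and a real feed with XML namespaces would need more careful handling than this.

```python
import re
import xml.etree.ElementTree as ET

# Matches the run of digits that follows "status/" in an item GUID.
_STATUS_ID = re.compile(r"/status/(\d+)")


def rewrite_guids(rss_xml: str) -> str:
    """Replace instance-specific <guid> values with the bare tweet ID.

    Illustrative sketch only; not the proxy's actual implementation.
    """
    root = ET.fromstring(rss_xml)
    for guid in root.iter("guid"):
        match = _STATUS_ID.search(guid.text or "")
        if match:
            guid.text = match.group(1)
            # A bare ID is not a URL, so mark it as a non-permalink.
            guid.set("isPermaLink", "false")
    return ET.tostring(root, encoding="unicode")


feed_a = ('<rss version="2.0"><channel><item>'
          '<guid>https://a.example/u/status/123#m</guid>'
          '</item></channel></rss>')
feed_b = feed_a.replace("a.example", "b.example")
same_output = rewrite_guids(feed_a) == rewrite_guids(feed_b)
```

After rewriting, the same tweet fetched from two different instances carries the same GUID, so an RSS client should treat it as one entry.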
This is awesome. Will you support a Docker image? Thanks.
My Docker knowledge is rusty/nonexistent at this point, but if you file an issue at https://github.com/derat/nitter-rss-proxy/issues, I'll try to look into it at some point. The proxy requires some runtime configuration (e.g. the list of public instances to talk to).