Some RSS feeds block access based on user-agent
IMPORTANT
Read and tick the following checkbox after you have created the issue or place an x inside the brackets ;)
- [x] I have read the CONTRIBUTING.md and followed the provided tips
- [x] I accept that the issue will be closed without comment if I do not check here
- [x] I accept that the issue will be closed without comment if I do not fill out all items in the issue template.
Explain the Problem
Certain RSS feeds fail to load through Nextcloud News, but work perfectly well from the console or a browser. Example: https://nationalpost.com/category/news/feed.xml returns error 403 Forbidden from Nextcloud News, but loads correctly using wget, Firefox, etc.
Steps to Reproduce
- Open Nextcloud News
- Click "+ Subscribe"
- Add feed URL such as https://nationalpost.com/category/news/feed.xml
- Press "Subscribe"
System Information
These details are not relevant since the specific issue has been detected, see below.
Issue details
I have determined that the feed in question is actually blocking access based on the User-Agent being used to make the query.
$ wget --user-agent="NextCloud-News/1.0" https://nationalpost.com/category/news/feed.xml
--2023-01-05 09:37:26-- https://nationalpost.com/category/news/feed.xml
Resolving nationalpost.com (nationalpost.com)... 34.111.249.109
Connecting to nationalpost.com (nationalpost.com)|34.111.249.109|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2023-01-05 09:37:26 ERROR 403: Forbidden.
$ wget https://nationalpost.com/category/news/feed.xml
--2023-01-05 09:39:03-- https://nationalpost.com/category/news/feed.xml
Resolving nationalpost.com (nationalpost.com)... 34.111.249.109
Connecting to nationalpost.com (nationalpost.com)|34.111.249.109|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8408 (8.2K) [application/rss+xml]
Saving to: ‘feed.xml’
feed.xml 100%[===================>] 8.21K --.-KB/s in 0.001s
2023-01-05 09:39:03 (9.29 MB/s) - ‘feed.xml’ saved [8408/8408]
The only difference between the failure and success is the changed User-Agent. Similarly, changing all the strings "NextCloud-News/1.0" in the source code to "NotCloud-News/1.0" solves the problem and allows the feed to be retrieved.
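The same comparison can be scripted. The following is a minimal Python sketch, not part of Nextcloud News: the helper names `build_request` and `probe` are made up for illustration, and the result depends on how the feed's server behaves at the time it is run.

```python
import urllib.error
import urllib.request

FEED_URL = "https://nationalpost.com/category/news/feed.xml"


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build a GET request that sends the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def probe(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the server sends for this User-Agent."""
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is what we compare.
        return err.code


if __name__ == "__main__":
    # Same URL, two different User-Agent strings, as in the wget test above.
    for agent in ("NextCloud-News/1.0", "NotCloud-News/1.0"):
        print(agent, probe(FEED_URL, agent))
```

If the block is keyed on the User-Agent, the two lines printed should show different status codes for the same URL.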
The rationale for the feed source to work in this manner is quite simple: 10,000 servers with that user-agent pulling the feed may appear to be a DDoS attack.
SOLUTION
NextCloud News needs to be able to use alternative user-agents. Either automatically pick something that would be unique, like the server's domain name, or add a field to settings to set a custom user agent.
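As a rough illustration of the two options, here is a Python sketch of how a user agent could be chosen. It is an assumption-laden example, not the Nextcloud News code (which is PHP): the function `resolve_user_agent`, its parameters, and the `(+https://host)` comment format are all hypothetical.

```python
from typing import Optional

# The string currently sent, per the wget test in this issue.
DEFAULT_PRODUCT = "NextCloud-News/1.0"


def resolve_user_agent(
    custom: Optional[str] = None,
    instance_host: Optional[str] = None,
    product: str = DEFAULT_PRODUCT,
) -> str:
    """Pick the User-Agent to send (hypothetical helper).

    Priority: an admin-supplied custom string, then the product token with
    the instance's host name appended as a comment, then the bare product.
    """
    if custom and custom.strip():
        return custom.strip()
    if instance_host and instance_host.strip():
        # Product token followed by a parenthesised comment.
        return f"{product} (+https://{instance_host.strip()})"
    return product


if __name__ == "__main__":
    print(resolve_user_agent(instance_host="cloud.example.org"))
    print(resolve_user_agent(custom="MyReader/2.3"))
```

With a host name only, the product token is kept and the instance is added as a comment; a custom setting replaces the string entirely.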
No, if RSS feeds feel the need to block news they have a reason for that. I'm not starting an arms race with RSS authors.
Because they don't realize that these are self-hosted instances of Nextcloud. It's not about a war or arms race. It's about clearly differentiating each instance of Nextcloud News from all the others.
But that's not what user agents are. User agents are meant to identify the software doing the request. Everyone with the same Google Chrome version has the same user agent and that doesn't cause any problems.
You're obviously right about that, but that doesn't mean that these media companies are SMART.
So educate them about why they should not block this user agent.
That is an impossibility, which you are well aware of. The only viable option here is to alter the program, and since there is no justification to NOT adjust the program, that is where the change should be.
The justification is this: Nextcloud news should not try and circumvent restrictions by misrepresenting the user agent. It's the wrong use of a user agent and your suggestion would allow for extensive fingerprinting of users. All for a temporary benefit until the authors of these feeds find a better regex to block Nextcloud.
The user/administrator should always have control over things like user agent.
I've looked through the documentation online and they all seem to agree that whatever you make your connection with should set it. Do you have some documentation to support your claim that the user should always control this?
I'm having the same problem. For me, it presents when I try to add many WordPress-generated feeds. If I rinse the feeds through a service like RSS-proxy they import.
I thought about this problem for a long time. @SMillerDev (as a developer) and @lbdroid (as a user) are both right. There are probably maniacal admins of RSS feeds, but you want to read them! In any case, the final word remains with the developer. But I would like to be able to change the User-Agent in the settings.