Feed discovery does not work with relative URLs in links
Explain the Problem
When trying to add a blog to the News reader, I was unable to do so; News repeatedly claimed that the hostname was not found.
The (first) problem I found is that, during the discovery phase, the href attributes of <link> elements are used as written, which does not work for relative URLs (which the spec allows).
Steps to Reproduce
- Try to add a new feed: https://k47.cz/
- An error appears: cURL error 6: Could not resolve host: rss.xml (see https://curl.haxx.se/libcurl/c/libcurl-errors.html)
The problem is that the k47.cz page links to its feed using a relative URL, <link rel=alternate type=application/rss+xml href=rss.xml title="RSS zdroj">, which is never resolved against the page URL, so News just attempts to fetch a “URL” of http://rss.xml.
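To illustrate the expected behavior outside of PHP, here is a minimal Python sketch of feed discovery that resolves each href against the page URL. The names `FeedLinkFinder`, `discover_feeds` and `FEED_TYPES` are made up for this example and are not part of News or feed-io; the HTML snippet is the `<link>` element quoted above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: only these two MIME types are treated as feeds in this sketch.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collects the href values of <link rel=alternate> feed elements, as written."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if (
            tag == "link"
            and attributes.get("rel") == "alternate"
            and attributes.get("type") in FEED_TYPES
            and attributes.get("href")
        ):
            self.hrefs.append(attributes["href"])


def discover_feeds(page_url, html):
    """Returns the feed URLs found in html, resolved against page_url."""
    finder = FeedLinkFinder()
    finder.feed(html)
    # urljoin implements RFC 3986 reference resolution; without this step
    # the caller would receive the bare string "rss.xml".
    return [urljoin(page_url, href) for href in finder.hrefs]


PAGE_URL = "https://k47.cz/"
PAGE_HTML = '<link rel=alternate type=application/rss+xml href=rss.xml title="RSS zdroj">'

print(discover_feeds(PAGE_URL, PAGE_HTML))
```

With the resolution step in place, the discovered feed is https://k47.cz/rss.xml rather than the bare rss.xml that ends up in the cURL error.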
It might be argued that this is an upstream bug, and that feed-io’s Explorer should resolve the relative URIs itself. It is hard to tell, since there is no specification of its expected behavior, AFAICT.
I was able to fix the problem by resolving relative URLs after discovery:
Patch fixing the problem
--- FeedServiceV2.php.bak 2021-05-28 07:48:45.524385111 +0000
+++ FeedServiceV2.php 2021-05-28 07:58:19.287691101 +0000
@@ -16,6 +16,7 @@
 use FeedIo\Explorer;
 use FeedIo\Reader\ReadErrorException;
 use HTMLPurifier;
+use Net_URL2;
 use OCA\News\Db\FeedMapperV2;
 use OCA\News\Fetcher\FeedFetcher;
@@ -199,7 +200,13 @@
         if ($full_discover) {
             $feeds = $this->explorer->discover($feedUrl);
             if ($feeds !== []) {
-                $feedUrl = array_shift($feeds);
+                $discoveredUrl = array_shift($feeds);
+                $url2 = new Net_URL2($discoveredUrl);
+                if ($url2->isAbsolute()) {
+                    $feedUrl = $discoveredUrl;
+                } else {
+                    $feedUrl = strval((new Net_URL2($feedUrl))->resolve($discoveredUrl));
+                }
             }
         }
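For readers who do not have Net_URL2 at hand, the branch in the patch can be sketched in Python roughly as follows. This is only an illustration of the same logic, not the News or feed-io code; `resolve_discovered` is a hypothetical helper name, and "has a scheme" is used here as the test for an absolute URL.

```python
from urllib.parse import urljoin, urlsplit


def resolve_discovered(page_url, discovered_url):
    """Mirrors the patch: keep absolute URLs, resolve everything else against the page."""
    if urlsplit(discovered_url).scheme:
        # The discovered URL carries its own scheme, so use it as-is.
        return discovered_url
    # Relative reference: resolve it against the URL the user entered.
    return urljoin(page_url, discovered_url)


print(resolve_discovered("https://k47.cz/", "rss.xml"))
print(resolve_discovered("https://k47.cz/", "https://example.org/feed.xml"))
```

The first call yields https://k47.cz/rss.xml and the second returns its argument unchanged. Under RFC 3986, resolving a reference that already has a scheme gives back that reference, so the explicit check is mostly there for clarity.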
System Information
- News app version: 15.4.5
- Nextcloud version: 20.0.9
- Cron type: Cron running on systemd timer
- PHP version: 7.4.18
- Database and version: mysql 10.5.10
- Browser and version: Firefox 88.0
- OS and version: Arch Linux/4.14.232
Contents of nextcloud/data/nextcloud.log
{"reqId":"kqjrWpgGULCAqRwS7NEa","level":0,"time":"2021-05-28T07:38:23+00:00","remoteAddr":"192.0.0.123","user":"Mormegil","app":"news","method":"POST","url":"/apps/news/feeds","message":"new parser added : FeedIo\\Standard\\Json","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0","version":"20.0.9.1"}
{"reqId":"kqjrWpgGULCAqRwS7NEa","level":0,"time":"2021-05-28T07:38:23+00:00","remoteAddr":"192.0.0.123","user":"Mormegil","app":"news","method":"POST","url":"/apps/news/feeds","message":"new parser added : FeedIo\\Standard\\Atom","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0","version":"20.0.9.1"}
{"reqId":"kqjrWpgGULCAqRwS7NEa","level":0,"time":"2021-05-28T07:38:23+00:00","remoteAddr":"192.0.0.123","user":"Mormegil","app":"news","method":"POST","url":"/apps/news/feeds","message":"new parser added : FeedIo\\Standard\\Rss","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0","version":"20.0.9.1"}
{"reqId":"kqjrWpgGULCAqRwS7NEa","level":0,"time":"2021-05-28T07:38:23+00:00","remoteAddr":"192.0.0.123","user":"Mormegil","app":"news","method":"POST","url":"/apps/news/feeds","message":"new parser added : FeedIo\\Standard\\Rdf","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0","version":"20.0.9.1"}
{"reqId":"kqjrWpgGULCAqRwS7NEa","level":1,"time":"2021-05-28T07:38:23+00:00","remoteAddr":"192.0.0.123","user":"Mormegil","app":"news","method":"POST","url":"/apps/news/feeds","message":"discover feeds from https://k47.cz/","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0","version":"20.0.9.1"}
{"reqId":"kqjrWpgGULCAqRwS7NEa","level":0,"time":"2021-05-28T07:38:23+00:00","remoteAddr":"192.0.0.123","user":"Mormegil","app":"news","method":"POST","url":"/apps/news/feeds","message":"read access : rss.xml into a feed instance (feed class : FeedIo\\Feed)","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0","version":"20.0.9.1"}
{"reqId":"kqjrWpgGULCAqRwS7NEa","level":0,"time":"2021-05-28T07:38:23+00:00","remoteAddr":"192.0.0.123","user":"Mormegil","app":"news","method":"POST","url":"/apps/news/feeds","message":"start reading rss.xml","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0","version":"20.0.9.1"}
{"reqId":"kqjrWpgGULCAqRwS7NEa","level":1,"time":"2021-05-28T07:38:23+00:00","remoteAddr":"192.0.0.123","user":"Mormegil","app":"news","method":"POST","url":"/apps/news/feeds","message":"no 'modifiedSince' parameter given, setting it to 01/01/1970","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0","version":"20.0.9.1"}
{"reqId":"kqjrWpgGULCAqRwS7NEa","level":1,"time":"2021-05-28T07:38:23+00:00","remoteAddr":"192.0.0.123","user":"Mormegil","app":"news","method":"POST","url":"/apps/news/feeds","message":"hitting rss.xml","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0","version":"20.0.9.1"}
{"reqId":"kqjrWpgGULCAqRwS7NEa","level":2,"time":"2021-05-28T07:38:23+00:00","remoteAddr":"192.0.0.123","user":"Mormegil","app":"news","method":"POST","url":"/apps/news/feeds","message":"rss.xml read error : cURL error 6: Could not resolve host: rss.xml (see https://curl.haxx.se/libcurl/c/libcurl-errors.html)","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0","version":"20.0.9.1"}
{"reqId":"kqjrWpgGULCAqRwS7NEa","level":0,"time":"2021-05-28T07:38:23+00:00","remoteAddr":"192.0.0.123","user":"Mormegil","app":"news","method":"POST","url":"/apps/news/feeds","message":"cURL error 6: Could not resolve host: rss.xml (see https://curl.haxx.se/libcurl/c/libcurl-errors.html)","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0","version":"20.0.9.1"}
According to https://validator.w3.org/feed/check.cgi?url=https%3A%2F%2Fk47.cz%2Frss.xml, even the self-reference is broken. I'd just recommend alerting the author to this issue.
Yes, that was the other problem I hit; I have already contacted the author about that. However, this issue is not caused by the broken self-link in the feed.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
What does it mean “no recent activity”? Should I keep commenting that yes, this is still broken?
It means that nobody has time or motivation to do something about it. So at some point it'll be closed automatically unless someone fixes it before then.
Solved here: https://github.com/alexdebril/feed-io/pull/422
I made another PR fix, but it hasn't been approved yet: https://github.com/alexdebril/feed-io/pull/431
Yeah, I saw that, thanks. It seems like the maintenance activity on feed-io has gone down quite a bit, unfortunately...
The same is true for absolute paths, for example on https://notapplicable.dev.
Yeah, an “absolute path” (/somewhere/something) is still a “relative URI” (resolved relative to the base URI). :-)
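As a quick illustration of that point, here is a small Python sketch showing how the different kinds of relative reference resolve against one base URI. The base URL and the feed paths are invented for the example (they are not taken from the sites mentioned in this thread), and `resolve_all` is just a helper name for this sketch.

```python
from urllib.parse import urljoin

# Assumed page URL, for illustration only.
BASE = "https://example.org/blog/post.html"

# Four forms of relative reference defined by RFC 3986.
REFERENCES = [
    "feed.xml",                    # relative path
    "/feed.xml",                   # absolute path, still a relative reference
    "../feed.xml",                 # relative path with a dot segment
    "//cdn.example.org/feed.xml",  # network-path (scheme-relative) reference
]


def resolve_all(base, references):
    """Maps each reference to its resolved form against base."""
    return {ref: urljoin(base, ref) for ref in references}


for ref, resolved in resolve_all(BASE, REFERENCES).items():
    print(f"{ref:28} -> {resolved}")
```

None of these four strings can be fetched as written; each one only becomes a usable URL once it has been combined with the base URI.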
[Also, note that this was (automatically) closed as either “stale” or “completed” [!], but it is very much not so, still the same problem as always, and “You do not have permissions to reopen this issue”, even though that would not help anyway, it would get closed automatically as stale again, I guess. Ceterum censeo https://nostalebots.xyz/ etc.]
Well, I have the power to reopen it. Despite there being an upstream PR, it has not been merged, and the problem is not fixed in the News app itself either.
So, regarding this issue: unless someone here takes over the maintenance of feed-io, I see no way to fix this.
Taking over maintenance would mean forking the repository, checking all the open PRs, and merging them if needed.
It would also mean setting up the release and publishing procedure, because News would still want to pull this via Composer, and other projects probably would too.
I have been thinking about doing this myself but didn't have the motivation or time to do that.
Being the maintainer of a project does not mean that you have to do everything yourself, such as the coding; it means responding to questions where possible and reviewing PRs.
Before going as far as forking the project, I've created an issue at the feed-io project to ask what the current maintenance status is. Maybe the lack of activity is a temporary thing?
Which version of feed-io (https://github.com/php-feed-io/feed-io) is planned for use in News: 5, or is a transition to 6 still planned?
At some point, version 6+; it's just a matter of finding the time and effort.
This seems to be fixed in the new beta release 26.1.0-beta.1, since it ships a new version of feed-io (5.3) that includes the fixes for relative links.
Feedback welcome 😊