dilbert-rss
Use https for guids, links and image sources
The dilbert.com site now uses https and redirects http requests. The comic link used to determine the URL of today's strip is a full https URL, whereas the nav-left links used to find strips from previous days are relative.
The output therefore contains https guids and links for today's strip, but http guids and links for previous days (see the example below). Because a strip's guid changes from https to http once it is no longer today's strip, an RSS reader sees it as a new item, which results in duplicates.
<?xml version="1.0" encoding="iso-8859-1"?>
<rss version="2.0">
<channel>
<title>Dilbert Daily Strip</title>
<link>http://dilbert.com</link>
<description>An unofficial RSS feed for dilbert.com.</description>
<lastBuildDate>Mon, 21 Oct 2019 20:27:40 GMT</lastBuildDate>
<generator>PyRSS2Gen-1.1.0</generator>
<docs>http://blogs.law.harvard.edu/tech/rss</docs>
<item>
<title>Dilbert comic for October 21, 2019</title>
<link>https://dilbert.com/strip/2019-10-21</link>
<description><a href='https://dilbert.com/strip/2019-10-21'><img src='//assets.amuniversal.com/84f1bdb0c7460137c2df005056a9545d' /></a></description>
<guid isPermaLink="true">https://dilbert.com/strip/2019-10-21</guid>
<pubDate>Mon, 21 Oct 2019 00:00:00 GMT</pubDate>
</item>
<item>
<title>Dilbert comic for October 20, 2019</title>
<link>http://dilbert.com/strip/2019-10-20</link>
<description><a href='http://dilbert.com/strip/2019-10-20'><img src='//assets.amuniversal.com/bf2ac5c0b0d30137ba71005056a9545d' /></a></description>
<guid isPermaLink="true">http://dilbert.com/strip/2019-10-20</guid>
<pubDate>Sun, 20 Oct 2019 00:00:00 GMT</pubDate>
</item>
</channel>
</rss>
This pull request changes the script to convert all URLs found to full https URLs (including the protocol-relative URLs used for the images, resolving #7).
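As a rough illustration of the conversion described above, the following sketch normalizes the three kinds of URL seen in the feed (relative, protocol-relative and full http) to full https URLs. It is not the code from this pull request: the `to_https` helper and the `BASE_URL` constant are hypothetical names chosen for the example, and it assumes Python 3's `urllib.parse`.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

# Assumed base URL for resolving relative links (hypothetical constant).
BASE_URL = "https://dilbert.com/"


def to_https(url, base=BASE_URL):
    """Return url as an absolute https URL (hypothetical helper)."""
    # urljoin resolves relative paths against the base and gives
    # protocol-relative URLs ("//host/path") the base's scheme.
    absolute = urljoin(base, url)
    parts = urlsplit(absolute)
    # URLs that were already absolute keep their own scheme, so
    # upgrade plain http explicitly.
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


if __name__ == "__main__":
    for raw in (
        "/strip/2019-10-20",
        "//assets.amuniversal.com/bf2ac5c0b0d30137ba71005056a9545d",
        "http://dilbert.com/strip/2019-10-20",
    ):
        print(to_https(raw))
```

Running every guid, link and image source through one helper like this keeps the scheme consistent regardless of which page element the URL was scraped from.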