
Podcast enclosure URLs are unencoded before being downloaded

Open ribbons opened this issue 7 years ago • 1 comments

Now that #226 is fixed, another URL encoding issue has been discovered by @cjpcjpindre: podcast enclosure URLs have their URL-encoded characters replaced by literal ones, which causes a problem if the server is expecting URL-encoded characters.

An example original enclosure URL from the feed https://anchor.fm/s/7368c04/podcast/rss is:

https://anchor.fm/s/7368c04/podcast/play/1722642/https%3A%2F%2Fd3ctxlq1ktw2nl.cloudfront.net%2Fstaging%2F2018-10-13%2FJohn-Kearns-782f1f393cd0f.m4a

ribbons avatar Nov 19 '18 20:11 ribbons

I'm really struggling with this one. The unescaping happens when the URL is passed to the .NET Framework Uri class (which can't be avoided when using WebClient for downloads). This means that the URL above will be changed into the following:

https://anchor.fm/s/7368c04/podcast/play/1722642/https://d3ctxlq1ktw2nl.cloudfront.net/staging/2018-10-13/John-Kearns-782f1f393cd0f.m4a
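For anyone wanting to reproduce the transformation outside the application, here is a minimal sketch in Python. It uses the standard library's `urllib.parse.unquote` purely as a stand-in for the unescaping step; it is not the Uri class's actual algorithm, and the variable names are illustrative only.

```python
from urllib.parse import unquote

# Enclosure URL exactly as it appears in the feed
original = (
    "https://anchor.fm/s/7368c04/podcast/play/1722642/"
    "https%3A%2F%2Fd3ctxlq1ktw2nl.cloudfront.net%2Fstaging%2F"
    "2018-10-13%2FJohn-Kearns-782f1f393cd0f.m4a"
)

# Decoding every percent-escape mimics the effect described above:
# %3A becomes ":" and %2F becomes "/"
decoded = unquote(original)

print(decoded)
print(decoded == original)  # the two URLs are no longer the same string
```

The decoded string has extra path separators, so the server sees a different path from the one the feed advertised.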

After some digging, it looks like this behaviour is partially fixed in .NET Framework 4.5, and the same fix can be enabled in .NET 2.0 via some slightly nasty reflection (courtesy of the code at https://mikehadlow.blogspot.com/2011/08/how-to-stop-systemuri-un-escaping.html). However, this doesn't prevent the colon from being unescaped, so the URL ends up as:

https://anchor.fm/s/7368c04/podcast/play/1722642/https:%2F%2Fd3ctxlq1ktw2nl.cloudfront.net%2Fstaging%2F2018-10-13%2FJohn-Kearns-782f1f393cd0f.m4a
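The partially fixed result can be imitated with a selective decoder. This is a rough Python sketch under the assumption that only the escaped colon is being decoded; `unescape_only` is a hypothetical helper written for illustration and does not reflect how the Uri class is implemented.

```python
import re

def unescape_only(url, chars):
    """Decode percent-escapes only when they represent a character in `chars`."""
    def replace(match):
        ch = chr(int(match.group(1), 16))
        # Leave the escape untouched unless it is one we chose to decode
        return ch if ch in chars else match.group(0)
    return re.sub(r"%([0-9A-Fa-f]{2})", replace, url)

original = (
    "https://anchor.fm/s/7368c04/podcast/play/1722642/"
    "https%3A%2F%2Fd3ctxlq1ktw2nl.cloudfront.net%2Fstaging%2F"
    "2018-10-13%2FJohn-Kearns-782f1f393cd0f.m4a"
)

# Only %3A is decoded; every %2F is preserved
partial = unescape_only(original, ":")
print(partial)
```

Even with a single escape decoded, the string no longer matches the feed's URL byte for byte.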

This unfortunately still causes a 404 error to be returned from anchor.fm.

Suggestions appreciated!

ribbons avatar Nov 27 '18 20:11 ribbons