URLs with # in them get prematurely cut off - Mistaken as Fragments
Describe the bug
As mentioned in https://github.com/KoalaBear84/OpenDirectoryDownloader/pull/114#issuecomment-1157464715, URLs that contain a # in the file name get cut off at the #.
I traced this issue to DirectoryParser.cs, specifically the CleanFragments function.
CleanFragments treats everything after the # as a URI fragment rather than as part of a legitimate file name, so the URL is cut off at that point. This is most likely caused by some URL decoding or manipulation further up, before the URL reaches CleanFragments. The fix would probably be to make sure that any # which is not a fragment delimiter is encoded as %23 before it reaches this function.
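To illustrate the suspected cause, here is a minimal sketch of how System.Uri handles a raw # compared with an encoded %23. It does not use any OpenDirectoryDownloader code: the example.com URL and the EncodeFileName helper are made up for illustration, and where exactly the encoding should happen in the real code path is still an open question.

```csharp
using System;

public static class FragmentDemo
{
    // Hypothetical helper (not part of OpenDirectoryDownloader): percent-encodes
    // a raw file name and appends it to a directory URL, so a literal '#'
    // becomes %23 instead of starting a fragment.
    public static string EncodeFileName(string directoryUrl, string rawFileName)
    {
        return directoryUrl.TrimEnd('/') + "/" + Uri.EscapeDataString(rawFileName);
    }

    public static void Main()
    {
        // Raw '#': System.Uri parses everything after it as the fragment.
        var raw = new Uri("http://example.com/Clips/He-Man - #30.m4v");
        Console.WriteLine(raw.AbsolutePath); // /Clips/He-Man%20-%20
        Console.WriteLine(raw.Fragment);     // #30.m4v

        // Encoded '#': the file name stays in the path and there is no fragment.
        var encoded = new Uri(EncodeFileName("http://example.com/Clips/", "He-Man - #30.m4v"));
        Console.WriteLine(encoded.AbsolutePath); // /Clips/He-Man%20-%20%2330.m4v
        Console.WriteLine(encoded.Fragment);     // (empty string)
    }
}
```

If the file name is encoded like this before the Uri is built, stripping the fragment afterwards should leave the path intact, since there is no fragment left to strip.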
Workaround for anyone else hit by this issue: if you are unlikely to encounter real URI fragments when scraping, you can comment out the CleanFragments call in the CheckParsedResults function in DirectoryParser.cs:
if (webDirectory.Uri.Scheme != Constants.UriScheme.Ftp && webDirectory.Uri.Scheme != Constants.UriScheme.Ftps)
{
    // Workaround: call disabled because it truncates URLs containing a literal '#'
    //CleanFragments(webDirectory);
}
To Reproduce
Steps to reproduce the behavior: see the examples at https://github.com/KoalaBear84/OpenDirectoryDownloader/pull/114#issuecomment-1157464715
Expected behavior
Instead of the truncated URL
http://mrclancy.ca/Film%20and%20TV/Movies/MST%20Clips/He-Man%20and%20the%20Masters%20of%20the%20Universe%20-%20
the result should be the full URL, with the # encoded as %23:
https://mrclancy.ca/Film%20and%20TV/Movies/MST%20Clips/He-Man%20and%20the%20Masters%20of%20the%20Universe%20-%20%2330.m4v
Desktop (please complete the following information):
- OS: macOS
- Version: Latest Master Build (v3.5.0.0 + 2 commits)