HTTrack "Get non-HTML files related to a link" bug

Hello,

There is a problem with this option.

Indeed, if you try to download a website that contains a link to a Dropbox file, WinHTTrack will try to download the whole dropbox.com website. The same problem arises with en.wikipedia.org.

Super easy way to reproduce the bug: put this page online (or try to capture http://www.crealya.fr/winhttrack/) and try to capture it with the "Get non-HTML files related to a link" option enabled.

<html>
	<head></head>
	<body>
		<a href="https://www.dropbox.com/s/zgjdj1kgk24fx88/distorted.jpg">Bug WinHTTrack</a>
	</body>
</html>

Feb 14 '17 18:02 tristanleboss

The only reliable approach is to exclude everything and then include only the domains that you want to mirror: -* +*domaintomirror.tld*
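
To show why this pair of rules keeps the crawl on one domain, here is a rough Python sketch of the matching logic. It is only a simplified model, not HTTrack's actual code: `is_allowed` is a hypothetical helper, `fnmatch` stands in for HTTrack's wildcard matching, URLs are compared without their scheme, and it assumes that when several rules match, the one written last takes priority.

```python
from fnmatch import fnmatchcase


def is_allowed(url, rules, default=True):
    """Simplified model of scan rules: the last matching rule decides."""
    verdict = default
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if fnmatchcase(url, pattern):
            # A later matching rule overrides any earlier one.
            verdict = (sign == "+")
    return verdict


# Exclude everything, then re-include the domain to mirror.
rules = ["-*", "+*crealya.fr*"]

print(is_allowed("www.crealya.fr/winhttrack/index.html", rules))
print(is_allowed("www.dropbox.com/s/zgjdj1kgk24fx88/distorted.jpg", rules))
```

In this model the page on crealya.fr is accepted and the Dropbox URL is rejected. Note that the linked file itself is rejected too, so this workaround gives up the external file in exchange for stopping the runaway crawl.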

Mar 03 '17 06:03 gj12

@gj12 Unfortunately, you don't know before starting the capture whether the site you are capturing contains links to "buggy" websites like dropbox.com, wikipedia.com, ... There has to be a bug in WinHTTrack, because it should not do that.

Feb 20 '18 01:02 tristanleboss