
"Get non-HTML files related to a link" bug

Open tristanleboss opened this issue 8 years ago • 2 comments

Hello,

There is a problem with this option.

Indeed, if you try to download a website that links to a Dropbox file, WinHTTrack will try to download the whole dropbox.com website. The same problem arises with en.wikipedia.org.

An easy way to reproduce the bug: put this page online (or use http://www.crealya.fr/winhttrack/) and capture it with the "Get non-HTML files related to a link" option enabled.

<html>
    <head></head>
    <body>
        <a href="https://www.dropbox.com/s/zgjdj1kgk24fx88/distorted.jpg">Bug WinHTTrack</a>
    </body>
</html>

tristanleboss avatar Feb 14 '17 18:02 tristanleboss

The only reliable approach is to exclude everything and then include only the domains that you want to mirror: -* +*domaintomirror.tld*
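
For illustration, here is a minimal Python sketch of how that filter pair could be put on a command line for the reproduction page from the first post. It only builds the argument list and does not run anything. The helper name `build_httrack_args` and the `./mirror` output directory are made up for this example. It assumes the command-line `httrack` binary, that `--near` is the command-line counterpart of the "Get non-HTML files related to a link" option, and that `-O` sets the output path. Whether the filters fully stop the runaway download has not been verified here.

```python
import shlex
from urllib.parse import urlsplit


def build_httrack_args(start_url, output_dir="./mirror"):
    """Build an httrack argument list restricted to the start URL's domain.

    Hypothetical helper for illustration; it does not invoke httrack.
    """
    host = urlsplit(start_url).hostname
    if host is None:
        raise ValueError(f"no hostname in URL: {start_url!r}")
    # Drop a leading "www." so the include filter also matches the bare domain.
    domain = host[4:] if host.startswith("www.") else host
    return [
        "httrack",
        start_url,
        "-O", output_dir,   # assumed: output path option
        "--near",           # assumed: "get non-HTML files near a link"
        "-*",               # exclude everything first...
        f"+*{domain}*",     # ...then re-include only the target domain
    ]


if __name__ == "__main__":
    args = build_httrack_args("http://www.crealya.fr/winhttrack/")
    # Quote each argument so the shell does not expand the "*" wildcards.
    print(" ".join(shlex.quote(a) for a in args))
```

The exclude filter comes before the include filter on purpose, on the assumption that when several filters match a URL, the later one wins.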

gj12 avatar Mar 03 '17 06:03 gj12

@gj12 Unfortunately, you don't know before doing the capture whether the site you are capturing contains links to "buggy" websites like dropbox.com, wikipedia.com, ... There has to be a bug in WinHTTrack, because it should not do that.

tristanleboss avatar Feb 20 '18 01:02 tristanleboss