URLs that redirect get ignored by htmlLinkParser

Open tkjaergaard opened this issue 5 years ago • 0 comments

When specifying a hostname to restrict the crawl to, like "www.acme.com", and a path like "www.acme.com/foo" returns a 301, the redirect location is added to the queue without validating that it has the correct hostname.

Maybe a "hook" should be implemented here: https://github.com/brendonboshell/supercrawler/blob/master/lib/Crawler.js#L192

That would allow the htmlLinkParser to intercept and ignore the upsert.
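
To make the idea concrete, here is a rough sketch of the kind of check such a hook could run before the redirect target is upserted. The function name `isAllowedRedirect` and its parameters are hypothetical and not part of supercrawler; it only assumes the WHATWG `URL` class that ships with Node.js.

```javascript
// Hypothetical helper, NOT part of supercrawler's API: decides whether a
// redirect target may be queued, given the hostnames the crawl is limited to.
function isAllowedRedirect(requestUrl, locationHeader, allowedHostnames) {
  let target;
  try {
    // The Location header may be relative, so resolve it against the
    // URL that was originally requested.
    target = new URL(locationHeader, requestUrl);
  } catch (err) {
    // Unparseable request URL or Location value: do not queue it.
    return false;
  }
  // Only accept targets whose hostname is in the allowed list.
  return allowedHostnames.includes(target.hostname);
}

// Example: a 301 from www.acme.com/foo to another host would be rejected,
// while a relative redirect on the same host would be accepted.
const allowed = ["www.acme.com"];
const offHost = isAllowedRedirect("http://www.acme.com/foo", "http://other.example/bar", allowed);
const sameHost = isAllowedRedirect("http://www.acme.com/foo", "/foo/", allowed);
```

If something like this ran at the point where the redirect location is queued, the crawler could skip the upsert whenever the check returns `false`. Where exactly the hook would be wired in is left open here.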

Apr 08 '20 18:04 tkjaergaard