polipus

Edit the regular expression in charge of removing anchors: simply add a colon

Open • ABrisset opened this issue 9 years ago • 0 comments

I found that URLs containing anchors like "#sku:123" (i.e. anchors with a colon) were not cleaned up when passed to the to_absolute method. As a consequence, they were escaped and added to the crawler's queue, which led to 404 errors. This kind of bug is related to the issue I described here.

To fix it, this commit adds a colon to the regular expression used to remove anchors from URLs in the to_absolute method.
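
To illustrate the idea, here is a minimal sketch of the behaviour before and after the change. The two patterns and the `strip_anchor` helper are hypothetical stand-ins written for this example; the exact expression used inside to_absolute may differ.

```ruby
# Hypothetical patterns for illustration only; not copied from Polipus.
# Without ':' in the character class, an anchor such as "#sku:123" is not matched.
ANCHOR_WITHOUT_COLON = /#[a-zA-Z0-9_\-]*$/
# With ':' added, the whole anchor is matched up to the end of the string.
ANCHOR_WITH_COLON = /#[a-zA-Z0-9_:\-]*$/

# Hypothetical helper: removes a trailing anchor matched by the given pattern.
def strip_anchor(link, pattern)
  link.to_s.sub(pattern, '')
end

url = 'http://example.com/product#sku:123'

before_fix = strip_anchor(url, ANCHOR_WITHOUT_COLON)
after_fix  = strip_anchor(url, ANCHOR_WITH_COLON)

puts before_fix # anchor is left in place
puts after_fix  # anchor is removed
```

With the first pattern the anchor survives, so the URL would still be escaped and queued; with the second, only the page URL remains.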

ABrisset • Jul 20 '15 16:07