js-crawler
How to deal with shortened URLs
Hi,
Is there a way to retrieve the landing URL of a shortened URL like goo.gl/89234fIASVHAS? Right now the crawler will pass the shortened URL into the callback, which messes up all relative links on the crawled pages... Thanks!
From a quick look, it seems that bit.ly uses status 301 "Moved Permanently" and goo.gl uses 307 (reported as "Internal Redirect"; the standard name is "Temporary Redirect"). I will need to investigate the case of URL shorteners a bit more.
Thanks for your reply. Do you have any advice on how to work around it for now?
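One possible workaround, sketched here under assumptions rather than taken from js-crawler itself, is to resolve the short URL to its landing URL before handing it to the crawler. The helpers `nextUrl` and `resolveFinalUrl` below are hypothetical names. The sketch assumes the start URL includes a scheme (for example `http://`) and that the shortener answers HEAD requests with a `Location` header, which may not hold for every service.

```javascript
const http = require('http');
const https = require('https');

// Status codes that indicate a redirect carrying a Location header.
const REDIRECT_STATUSES = new Set([301, 302, 303, 307, 308]);

// Pure helper (hypothetical): given the current URL, a response status and its
// Location header, return the next URL to visit, or null if not a redirect.
function nextUrl(currentUrl, status, location) {
  if (!REDIRECT_STATUSES.has(status) || !location) {
    return null;
  }
  // Location may be relative, so resolve it against the current URL.
  return new URL(location, currentUrl).href;
}

// Follow redirects with HEAD requests until a non-redirect response arrives.
// Hypothetical helper, not part of js-crawler.
function resolveFinalUrl(startUrl, maxHops = 10) {
  return new Promise((resolve, reject) => {
    const visit = (url, hopsLeft) => {
      const client = url.startsWith('https:') ? https : http;
      const req = client.request(url, { method: 'HEAD' }, (res) => {
        res.resume(); // discard any body so the socket is released
        const next = nextUrl(url, res.statusCode, res.headers.location);
        if (next === null) {
          resolve(url); // no further redirect: this is the landing URL
        } else if (hopsLeft === 0) {
          reject(new Error('Too many redirects'));
        } else {
          visit(next, hopsLeft - 1);
        }
      });
      req.on('error', reject);
      req.end();
    };
    visit(startUrl, maxHops);
  });
}
```

The promise returned by `resolveFinalUrl` resolves to the landing URL, which could then be passed to the crawler in place of the short one, so that relative links are resolved against the real page address.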
> Right now the crawler will pass the shortened URL into the callback
I fixed this part, added a unit test, and published a new version of the crawler, 0.3.19.
However, I could not reproduce the original issue, in which passing the first URL into the onSuccess callback would cause problems with relative URLs:
> which messes up all relative links on the crawled pages...
Please let me know whether the problem has been fixed with the recent changes.
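For anyone trying to reproduce the relative-link problem, the following minimal sketch shows how the base URL changes the result of resolving a relative link. It uses the standard `URL` constructor available in Node.js; the helper `resolveLink` and both example URLs are hypothetical and only serve as illustration.

```javascript
// Resolve a relative link found on a page against the URL the page was loaded from.
function resolveLink(href, pageUrl) {
  return new URL(href, pageUrl).href;
}

// Hypothetical URLs for illustration only.
const shortUrl = 'http://short.example/abc123';
const landingUrl = 'http://www.example.com/articles/post.html';

// Resolved against the short URL, the link points at the shortener's host.
console.log(resolveLink('images/logo.png', shortUrl));
// prints "http://short.example/images/logo.png"

// Resolved against the landing URL, the link points at the real site.
console.log(resolveLink('images/logo.png', landingUrl));
// prints "http://www.example.com/articles/images/logo.png"
```

If the URL passed to the callback is the landing URL, the second form is what a link extractor would be expected to produce.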