js-crawler
Web crawler for Node.JS
Hi, I am trying to set up a basic crawler script, but I am getting an error: `new Crawler().configure({depth: 3}) ^ TypeError: Crawler is not a function at Object.<anonymous> (/var/www/user/test2.js:3:1)`...
Is it possible to force the crawler to stop crawling? I have a condition that only 500 pages should be crawled; when that condition is met, I want to stop this...
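One way to approach a page cap like this is to count admitted URLs in a small filter function and reject everything once the limit is reached. The sketch below is independent of js-crawler: `createPageLimiter` and `shouldVisit` are hypothetical names, and whether the crawler exposes a URL filter hook to plug this into is an assumption to check against its documentation.

```javascript
// Hypothetical helper, not part of js-crawler: admits at most `maxPages`
// distinct URLs and rejects every new URL after that.
function createPageLimiter(maxPages) {
  const admitted = new Set();
  return function shouldVisit(url) {
    if (admitted.has(url)) return true;          // already counted earlier
    if (admitted.size >= maxPages) return false; // limit reached, reject new URLs
    admitted.add(url);
    return true;
  };
}

// Usage: build one limiter per crawl and consult it before each request.
const shouldVisit = createPageLimiter(500);
```

Note that this only stops new URLs from being admitted; requests already in flight would still complete.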
If we try to crawl websites that are WP, the crawler gets stuck after crawling a few links and nothing happens after that; it just stalls. Can you suggest me...
I found that when crawling a site with the depth set to 2, it finishes and console.log(crawledUrls) runs correctly. But when using a higher depth like 4 or 6 (which of...
Dear developers, I am crafting a tool that lets me automatically crawl a few sites. However, they are protected by a username and password (that I have). Which is the...
It stops working on some URLs for no reason, even without any non-standard configuration. Domains where it stops: paraleloiluminacao.com.br, tcengenhariaeletrica.com.br, kplojista.com.br, bsgrafo.com.br
It would be awesome to be able to save the already visited URLs, so that you can restart a crawl later without revisiting links and pick up where you left off.
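As a rough illustration of what saving visited URLs could look like, here is a minimal sketch that stores the set as a JSON file using only Node's `fs` module. The helper names `loadVisited` and `saveVisited` and the file layout are assumptions made for this example; nothing here is an existing js-crawler feature.

```javascript
const fs = require("fs");

// Hypothetical helper: read the visited-URL set from disk.
// Returns an empty set when the file does not exist yet.
function loadVisited(file) {
  try {
    return new Set(JSON.parse(fs.readFileSync(file, "utf8")));
  } catch (err) {
    if (err.code === "ENOENT") return new Set();
    throw err; // unreadable or corrupt file: surface the problem
  }
}

// Hypothetical helper: write the visited-URL set to disk as a JSON array.
function saveVisited(file, visited) {
  fs.writeFileSync(file, JSON.stringify([...visited]));
}
```

A crawl could call `loadVisited` on startup, skip any URL already in the set, and call `saveVisited` periodically or on exit.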
I am trying to crawl a big website (arezzo.com.br); however, after ~1700 URLs were crawled, it simply stopped. No errors were printed, and the `finished` callback wasn't called either. Can someone...
I found that the response content is decoded with the wrong charset for a non-UTF-8 web page. Here's a URL for example: http://www.cartoomad.com/comic/276400012051002.html
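For context, a common way to handle non-UTF-8 pages is to keep the response body as raw bytes and decode it with the charset named in the `Content-Type` header. The sketch below uses Node's built-in `TextDecoder`; `charsetFromContentType` and `decodeBody` are hypothetical names, and it assumes a Node build with full ICU data (legacy encodings are not available otherwise). It does not look at `<meta charset>` tags inside the HTML.

```javascript
// Hypothetical helper: pull the charset label out of a Content-Type value,
// e.g. "text/html; charset=Big5" gives "big5". Returns null when absent.
function charsetFromContentType(contentType) {
  const match = /charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType || "");
  return match ? match[1].toLowerCase() : null;
}

// Hypothetical helper: decode raw response bytes using the declared charset,
// falling back to UTF-8 when none is declared or the label is unsupported.
function decodeBody(buffer, contentType) {
  const charset = charsetFromContentType(contentType) || "utf-8";
  try {
    return new TextDecoder(charset).decode(buffer);
  } catch (err) {
    // TextDecoder rejects unknown labels; fall back rather than fail
    return new TextDecoder("utf-8").decode(buffer);
  }
}
```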
I was wondering whether it is possible to crawl the web for just images using this package. If so, how?
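Independent of what the package itself offers, one possible approach is to collect `<img>` sources from each page's HTML once it has been fetched. The following is a minimal sketch under that assumption: `extractImageUrls` is a hypothetical helper, and the regular expression is a simplification that handles only quoted `src` attributes (a real HTML parser would be more robust).

```javascript
// Hypothetical helper: collect absolute image URLs from an HTML string.
// Relative paths are resolved against `baseUrl`; duplicates are dropped.
function extractImageUrls(html, baseUrl) {
  const urls = new Set();
  // Simplified pattern: quoted src attributes on <img> tags only.
  // It will also match attributes ending in "-src", such as data-src.
  const imgSrc = /<img\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']/gi;
  let match;
  while ((match = imgSrc.exec(html)) !== null) {
    try {
      urls.add(new URL(match[1], baseUrl).href);
    } catch (err) {
      // skip values that cannot be resolved to a URL
    }
  }
  return [...urls];
}
```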