headless-chrome-crawler
Distributed crawler powered by Headless Chrome
**What is the current behavior?** No information about the current URL in customCrawl() **What is the motivation / use case for changing the behavior?** I want to skip the request, but add...
Hello, Puppeteer supports proxies, but Headless Chrome Crawler doesn't seem to work with one. ``` const HCCrawler = require('headless-chrome-crawler'); (async () => { const crawler = await HCCrawler.launch({ args: ['--ignore-certificate-errors', '--proxy-server=127.0.0.1:8080', '--no-sandbox' ],...
I'm crawling a small site with maxDepth === 2, and things crawl fine. As soon as I up it to 3 or more, the crawler hangs. I don't see onError...
**What is the current behavior?** When you crawl a page that returns a 403 (Forbidden) error, the crawler just hangs and stays there indefinitely. It ignores all timeouts and doesn't...
**What is the current behavior?** Using a Redis cache for the queue and a cluster of processes crawling, the crawler is repeating requests. **If the current behavior is a bug,...
**What is the current behavior?** Duplicate URLs are not skipped. The same URL is crawled twice. **If the current behavior is a bug, please provide the steps to reproduce** ```...
I want to make my customCrawl click on elements. They don't have an href, but a JS onclick event. Is this possible, and how and where in the code can...
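One possible approach, as a minimal sketch: it assumes `customCrawl` receives `(page, crawl)` and must return the extended result, as the project README describes, and that `page` is a Puppeteer page. The helper name `makeClickingCrawl`, the `selector` argument, and the `contentAfterClick` field are hypothetical names introduced only for illustration.

```javascript
// Hypothetical helper: builds a customCrawl-style function that clicks one
// element (matched by `selector`) after the page has been crawled.
function makeClickingCrawl(selector) {
  return async (page, crawl) => {
    // Run the normal crawl first so the page is loaded.
    const result = await crawl();
    // page.$ resolves to null when nothing matches the selector.
    const target = await page.$(selector);
    if (target !== null) {
      await target.click();
      // Hypothetical field: capture the HTML as it looks after the click.
      result.contentAfterClick = await page.content();
    }
    return result;
  };
}
```

The returned function could then be passed as the `customCrawl` option, for example `customCrawl: makeClickingCrawl('#load-more')`, where `#load-more` is a placeholder selector. If the click triggers a navigation or an asynchronous update, an explicit wait would probably be needed before reading the content.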
**What is the current behavior?** No documented way of scrolling **What is the expected behavior?** Being able to scroll **What is the motivation / use case for changing the behavior?**...
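Until scrolling is documented, a workaround might look like the sketch below. It assumes the same `customCrawl(page, crawl)` contract described in the README and uses Puppeteer's `page.evaluate` to scroll inside the browser. `makeScrollingCrawl`, `steps`, and `pauseMs` are hypothetical names, and the fixed pause is a simplification rather than a reliable way to detect lazy-loaded content.

```javascript
// Hypothetical helper: builds a customCrawl-style function that scrolls the
// page down `steps` times, pausing `pauseMs` milliseconds between scrolls.
function makeScrollingCrawl(steps, pauseMs) {
  return async (page, crawl) => {
    const result = await crawl();
    for (let i = 0; i < steps; i += 1) {
      // Runs in the browser context: scroll down by one viewport height.
      await page.evaluate(() => window.scrollBy(0, window.innerHeight));
      // Give lazy-loaded content a moment to appear.
      await new Promise((resolve) => setTimeout(resolve, pauseMs));
    }
    // Re-read the HTML so the result reflects the scrolled page.
    result.content = await page.content();
    return result;
  };
}
```

It could be wired in as `customCrawl: makeScrollingCrawl(5, 500)`; the numbers here are placeholders to tune per site.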
**What is the current behavior?** The `page.$$()` method just returns a "**JSHandle@node**" string instead of an **ElementHandle** object. **If the current behavior is a bug, please provide the steps to reproduce**...
For the domain "test.domain.com", result.response.url includes URLs from "domain.com", too. I tried it with the subdomain name and with a regexp. I don't understand why: shouldn't the "allowedDomains" parameter prevent scanning from URLs...
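As a possible workaround, independent of how `allowedDomains` matches internally, an exact hostname check could be applied before each request. The sketch below assumes the `preRequest(options)` hook behaves as the README describes (returning `false` skips the request) and that `options.url` holds the target URL. `makeExactHostFilter` is a hypothetical helper name.

```javascript
// Hypothetical helper: builds a preRequest-style function that accepts a
// request only when its hostname exactly equals one of `allowedHosts`.
function makeExactHostFilter(allowedHosts) {
  const allowed = new Set(allowedHosts.map((host) => host.toLowerCase()));
  return (options) => {
    try {
      // URL parsing lowercases the hostname, so "domain.com" and
      // "test.domain.com" are compared as distinct, whole hostnames.
      return allowed.has(new URL(options.url).hostname);
    } catch (err) {
      // Unparseable URLs are skipped.
      return false;
    }
  };
}
```

Used as `preRequest: makeExactHostFilter(['test.domain.com'])`, this would let requests to `test.domain.com` through and skip those to `domain.com`. Note that it filters requests by their initial URL only, so a redirect to another host would still need separate handling.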