headless-chrome-crawler
How to handle the timeout error?
What is the current behavior?
Getting an unhandled exception:
{ Error: Navigation Timeout Exceeded: 30000ms exceeded
    at Promise.then (/.../node_modules/puppeteer/lib/NavigatorWatcher.js:73:21)
  options:
   { maxDepth: 1,
     priority: 0,
     delay: 0,
     retryCount: 1,
     retryDelay: 10000,
     timeout: 30000,
     jQuery: true,
     browserCache: false,
     skipDuplicates: true,
     depthPriority: true,
     obeyRobotsTxt: true,
     followSitemapXml: false,
     skipRequestedRedirect: false,
     cookies: null,
     screenshot: null,
     viewport: null,
     evaluatePage:
      "(() => {\n return {\n title: $('title').text()\n };\n })()",
     url: 'https://foo.bar/' },
  depth: 1,
  previousUrl: null }
This happens despite a configured onError handler:
const crawl = async (startUrl: string) => {
log.debug('beginning to crawl %s', startUrl);
const crawler = await launchCrawler({
browserCache: false,
evaluatePage: () => {
return {
title: $('title').text()
};
},
headless: true,
onError: (error) => {
console.log(error);
},
onSuccess: (result) => {
console.log(result);
},
retryCount: 1
});
crawler.queue(startUrl);
await crawler.onIdle();
await crawler.close();
};
What is the expected behavior?
I expect the onError handler to catch the timeout error.
What is the motivation / use case for changing the behavior?
Otherwise, there is no documented way to handle a timeout.
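In the meantime, the closest thing to a workaround I can see is a sketch like the following, assuming the value returned by launchCrawler is the HCCrawler instance (an EventEmitter) and assuming navigation timeouts are routed through the 'requestfailed' event mentioned in the README; I have verified neither:

const crawler = await launchCrawler({ /* same options as above */ });

// Assumption: the failed navigation surfaces here rather than in onError.
crawler.on('requestfailed', (error: Error) => {
  console.log('request failed:', error);
});

// Process-level safety net so the rejection at least cannot crash the run.
process.on('unhandledRejection', (reason) => {
  console.log('unhandled rejection:', reason);
});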
Please tell us about your environment:
- Version: ^1.8.0
- Platform / OS version: OSX
- Node.js version: v11.3.0
This happens when some of the resources on the initial page time out, e.g. CSS.
There should be an option that permits skipping the loading of those resources.
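To illustrate what such an option could do internally, here is a plain-Puppeteer sketch (this is not headless-chrome-crawler API, and the set of resource types to skip is my own choice) that aborts stylesheet, image, and font requests via request interception, so a hanging resource cannot exhaust the navigation timeout:

import puppeteer from 'puppeteer';

const crawlWithoutSlowResources = async (url: string) => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Intercept every request and abort non-essential resource types so a
  // slow stylesheet cannot keep the page's 'load' event from firing in time.
  await page.setRequestInterception(true);
  page.on('request', (request) => {
    if (['stylesheet', 'image', 'font'].includes(request.resourceType())) {
      request.abort();
    } else {
      request.continue();
    }
  });
  await page.goto(url);
  const title = await page.title();
  await browser.close();
  return { title };
};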
@gajus thanks for the suggestion! Good point! Would you consider creating a PR?