tracker-radar-collector
`page.setViewport` causes the browser to disconnect in mobile emulation
The `-m, --mobile` option seems to be causing tracker-radar-collector to fail during page load:
$ npm run crawl -- -u "https://duck.com" -o /tmp/ -v -f -d "requests" --mobile
gives me:
Start time: Wed, 10 Mar 2021 10:58:18 GMT
Number of urls to crawl: 1
Number of crawlers: 1
Processing entry #1 (https://duck.com).
duck.com: requests init took 0.000s
duck.com: page context initiated in 0.002s
duck.com: Crawl failed net::ERR_ABORTED at https://duck.com/ Error: net::ERR_ABORTED at https://duck.com/
at navigate (tracker-radar-collector/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:115:23)
at processTicksAndRejections (internal/process/task_queues.js:97:5)
at async FrameManager.navigateFrame (tracker-radar-collector/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:90:21)
at async Frame.goto (tracker-radar-collector/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:416:16)
at async Page.goto (tracker-radar-collector/node_modules/puppeteer/lib/cjs/puppeteer/common/Page.js:789:16)
at async getSiteData (tracker-radar-collector/crawler.js:184:9)
duck.com: ⚠️ unmatched failed response [object Object]
duck.com: requests init took 0.000s
duck.com: page context initiated in 0.001s
duck.com: Crawl failed net::ERR_ABORTED at https://duck.com/ Error: net::ERR_ABORTED at https://duck.com/
at navigate (tracker-radar-collector/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:115:23)
at processTicksAndRejections (internal/process/task_queues.js:97:5)
at async FrameManager.navigateFrame (tracker-radar-collector/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:90:21)
at async Frame.goto (tracker-radar-collector/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:416:16)
at async Page.goto (tracker-radar-collector/node_modules/puppeteer/lib/cjs/puppeteer/common/Page.js:789:16)
at async getSiteData (tracker-radar-collector/crawler.js:184:9)
duck.com: ⚠️ unmatched failed response [object Object]
Max number of retries (2) exceeded for "https://duck.com".
✅ Finished successfully.
Finish time: Wed, 10 Mar 2021 10:58:18 GMT
Sucessful crawls: 0/1 (0.00%)
Failed crawls: 1/1 (100.00%)
The same crawl without the `--mobile` option runs just fine:
$ npm run crawl -- -u "https://duck.com" -o /tmp/ -v -f -d "requests"
Start time: Wed, 10 Mar 2021 10:58:38 GMT
Number of urls to crawl: 1
Number of crawlers: 1
Processing entry #1 (https://duck.com).
duck.com: requests init took 0.000s
duck.com: page context initiated in 0.001s
duck.com: getting requests data took 0.032s
Processing "https://duck.com" took 5.047s.
✅ Finished successfully.
Finish time: Wed, 10 Mar 2021 10:58:43 GMT
Sucessful crawls: 1/1 (100.00%)
Failed crawls: 0/1 (0.00%)
@asumansenol and I could reliably reproduce this error on a few different machines using the latest code from the `main` branch.
In a parallel crawl (e.g. with c=4), the same error is accompanied by a log message about the browser being disconnected:
Processing entry #3 (http://youtube.com).
facebook.com: page context initiated in 0.010s
facebook.com: Crawl failed net::ERR_ABORTED at http://facebook.com/ Error: net::ERR_ABORTED at http://facebook.com/
at navigate (tracker-radar-collector/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:115:23)
at processTicksAndRejections (internal/process/task_queues.js:97:5)
at async FrameManager.navigateFrame (tracker-radar-collector/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:90:21)
at async Frame.goto (tracker-radar-collector/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:416:16)
at async Page.goto (tracker-radar-collector/node_modules/puppeteer/lib/cjs/puppeteer/common/Page.js:789:16)
at async getSiteData (tracker-radar-collector/crawler.js:184:9)
facebook.com: ⚠️ unmatched failed response {
requestId: 'C127D1C33C7C90FDB931CAFFCCAB1F85',
timestamp: 10151.696633,
type: 'Document',
errorText: 'net::ERR_ABORTED',
canceled: true
}
(node:19010) UnhandledPromiseRejectionWarning: Error: Navigation failed because browser has disconnected!
at tracker-radar-collector/node_modules/puppeteer/lib/cjs/puppeteer/common/LifecycleWatcher.js:51:147
at tracker-radar-collector/node_modules/puppeteer/lib/cjs/vendor/mitt/src/index.js:51:62
at Array.map (<anonymous>)
at Object.emit (tracker-radar-collector/node_modules/puppeteer/lib/cjs/vendor/mitt/src/index.js:51:43)
at CDPSession.emit (tracker-radar-collector/node_modules/puppeteer/lib/cjs/puppeteer/common/EventEmitter.js:72:22)
at CDPSession._onClosed (tracker-radar-collector/node_modules/puppeteer/lib/cjs/puppeteer/common/Connection.js:247:14)
at Connection._onMessage (tracker-radar-collector/node_modules/puppeteer/lib/cjs/puppeteer/common/Connection.js:94:25)
at WebSocket.<anonymous> (tracker-radar-collector/node_modules/puppeteer/lib/cjs/puppeteer/node/NodeWebSocketTransport.js:13:32)
at WebSocket.onMessage (tracker-radar-collector/node_modules/ws/lib/event-target.js:132:16)
at WebSocket.emit (events.js:315:20)
(node:19010) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 4)
Since the `page.setViewport` call is one of the main differences between the desktop and the mobile crawl, I commented that line out and reran the mobile crawl. I didn't get any errors!
Let me know if you need any other information from me to help you solve the issue.
Another data point: the problem goes away if I comment out `isMobile` and `hasTouch` from `MOBILE_VIEWPORT`, while keeping the `page.setViewport` call.
Hey Gunes, thanks for the report! I gave it a quick look and it seems like an upstream issue (chromium/puppeteer) to me. The workaround is to set all mobile options on browser launch:
function openBrowser(log, proxyHost) {
    const args = {
        // apply the mobile viewport when the browser launches
        defaultViewport: MOBILE_VIEWPORT
    };
and comment out
// page.setViewport(emulateMobile ? MOBILE_VIEWPORT : DEFAULT_VIEWPORT);
I'll land a proper fix at some point, but please use the workaround for now.
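A slightly fuller sketch of the workaround, in case it is useful. It assumes the viewport is chosen from an `emulateMobile` flag rather than hard-coded, and that the Puppeteer module is passed in as an argument so the snippet stands alone. The viewport numbers, `buildLaunchOptions`, and this `openBrowser` signature are illustrative assumptions, not the code in crawler.js.

```javascript
// Hypothetical viewport values for illustration only; the real constants
// in crawler.js may differ.
const MOBILE_VIEWPORT = {
    width: 412,
    height: 691,
    deviceScaleFactor: 2,
    isMobile: true,
    hasTouch: true
};
const DEFAULT_VIEWPORT = {width: 1440, height: 812};

// Hypothetical helper: builds launch options so the viewport is set once,
// at browser launch, instead of per page via page.setViewport.
function buildLaunchOptions(emulateMobile) {
    return {
        defaultViewport: emulateMobile ? MOBILE_VIEWPORT : DEFAULT_VIEWPORT
    };
}

// `puppeteer` is passed in (an assumption of this sketch) so the snippet
// has no hard dependency; `defaultViewport` is a puppeteer.launch option.
async function openBrowser(puppeteer, emulateMobile) {
    return puppeteer.launch(buildLaunchOptions(emulateMobile));
}
```

Pages opened from a browser launched this way start with the chosen viewport, so the per-page `page.setViewport` call can stay commented out.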
BTW Congrats on https://arxiv.org/pdf/2102.09301.pdf , well done 👏 Please feel free to reach out to me directly (konrad at duckduckgo.com) if you have any thoughts about the crawler or would like to use Tracker Radar data in your research (we are crawling over 150k pages on a regular basis and can adjust the crawler to collect more data if needed).
@kdzwinel Thanks so much for promptly addressing this. It makes sense that this is an upstream issue.
> BTW Congrats on https://arxiv.org/pdf/2102.09301.pdf , well done 👏
Thank you! Much of the credit goes to @ydimova. For the record, our experience using tracker-radar-collector for the study was just great. I especially appreciated how easy it is to add new instrumentation, since your method based on `Runtime.evaluate` is so generic (and novel). Also, the tool is super easy to start with, and it was quite stable, handling tens of thousands of sites without any hiccups. I am certain that tracker-radar-collector will be a popular tool (along with OpenWPM) within the research community not long from now.
> Please feel free to reach out to me directly (konrad at duckduckgo.com) if you have any thoughts about the crawler or would like to use Tracker Radar data in your research (we are crawling over 150k pages on a regular basis and can adjust the crawler to collect more data if needed).
I'll be more than happy to reach out. We have other projects that are based on tracker-radar-collector, and I think it'd be useful to keep a channel open.