patreon-scraper
CAPTCHA prompt seems to prevent any content from downloading
When I run this script, no content is downloaded. In the script's console output, the response body appears to contain the markup for a CAPTCHA page, which leads me to believe that this is the root of the issue.
Example:
`\n
\n<!--[if IE 7]>
<html class="no-js ie7 oldie" lang="en-US">
<![endif]-->\n
<!--[if IE 8]>
<html class="no-js ie8 oldie" lang="en-US">
<![endif]-->\n
<!--[if gt IE 8]>
<!-->
<html class="no-js" lang="en-US">
<!--
<![endif]-->\n
<head>\n
<title>Attention Required! | Cloudflare</title>\n
<meta name="captcha-bypass" id="captcha-bypass" />\n
<meta charset="UTF-8" />\n
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />\n
<meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1" />\n
<meta name="robots" content="noindex, nofollow" />\n
<meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1" />\n
<link rel="stylesheet" id="cf_styles-css" href="/cdn-cgi/styles/cf.errors.css" type="text/css" media="screen,projection" />\n
<!--[if lt IE 9]>
<link rel="stylesheet" id=\'cf_styles-ie-css\' href="/cdn-cgi/styles/cf.errors.ie.css" type="text/css" media="screen,projection" />
<![endif]-->\n
<style type="text/css">body{margin:0;padding:0}</style>\n\n\n
<!--[if gte IE 10]>
<!-->
<script type="text/javascript" src="/cdn-cgi/scripts/zepto.min.js"></script>
<!--
<![endif]-->\n
<!--[if gte IE 10]>
<!-->
<script type="text/javascript" src="/cdn-cgi/scripts/cf.common.js"></script>
<!--
<![endif]-->\n\n\n\n\n
</head>\n
<body>\n
<div id="cf-wrapper">\n
<div class="cf-alert cf-alert-error cf-cookie-error" id="cookie-alert" data-translate="enable_cookies">Please enable cookies.</div>\n
<div id="cf-error-details" class="cf-error-details-wrapper">\n
<div class="cf-wrapper cf-header cf-error-overview">\n
<h1 data-translate="challenge_headline">One more step</h1>\n
<h2 class="cf-subheadline">
<span data-translate="complete_sec_check">Please complete the security check to access</span> patreon.com
</h2>\n
</div>
<!-- /.header -->\n \n
<div class="cf-section cf-highlight cf-captcha-container">\n
<div class="cf-wrapper">\n
<div class="cf-columns two">\n
<div class="cf-column">\n \n
<div class="cf-highlight-inverse cf-form-stacked">\n
<form class="challenge-form" id="challenge-form" action="/cdn-cgi/l/chk_captcha" method="get">\n
<input type="hidden" name="s" value="6903d23f35b518d64183514a77da7a1e4080565e-1569455077-1800-AXOxIyPcLIrC1ekaD86GCBANIQ6tvR2dsxDi1Op3XgNC+nMdHTBPnPjig2WUKcdW1YeHIAgbCFko2Pjz4MTZYsMKtDf+imsnyFXsz9HFKWlw8V09/GiYfeyNtj1F9O+3L6EI1/CXKDOujHPfIpMFcF9x7Xs1bqSjoPnunmOUV8FA0M9p2vcX5ZR1at4f0ZwXFw=="></input>\n
<script type="text/javascript" src="/cdn-cgi/scripts/cf.challenge.js" data-type="normal" data-ray="51c0ddfb9cfbce6b" async data-sitekey="6LfBixYUAAAAABhdHynFUIMA_sa4s-XsJvnjtgB0"></script>\n
<div class="g-recaptcha"></div>\n
<noscript id="cf-captcha-bookmark" class="cf-captcha-info">\n
<div>
<div style="width: 302px">\n
<div>\n
<iframe src="https://www.google.com/recaptcha/api/fallback?k=6LfBixYUAAAAABhdHynFUIMA_sa4s-XsJvnjtgB0" frameborder="0" scrolling="no" style="width: 302px; height:422px; border-style: none;"></iframe>\n
</div>\n
<div style="width: 300px; border-style: none; bottom: 12px; left: 25px; margin: 0px; padding: 0px; right: 25px; background: #f9f9f9; border: 1px solid #c1c1c1; border-radius: 3px;">\n
<textarea id="g-recaptcha-response" name="g-recaptcha-response" class="g-recaptcha-response" style="width: 250px; height: 40px; border: 1px solid #c1c1c1; margin: 10px 25px; padding: 0px; resize: none;"></textarea>\n
<input type="submit" value="Submit"></input>\n
</div>\n
</div>
</div>\n
</noscript>\n
</form>\n\n \n
</div>\n
</div>\n\n
<div class="cf-column">\n
<div class="cf-screenshot-container">\n \n
<span class="cf-no-screenshot"></span>\n \n
</div>\n
</div>\n
</div>
<!-- /.columns -->\n
</div>\n
</div>
<!-- /.captcha-container -->\n\n
<div class="cf-section cf-wrapper">\n
<div class="cf-columns two">\n
<div class="cf-column">\n
<h2 data-translate="why_captcha_headline">Why do I have to complete a CAPTCHA?</h2>\n \n
<p data-translate="why_captcha_detail">Completing the CAPTCHA proves you are a human and gives you temporary access to the web property.</p>\n
</div>\n\n
<div class="cf-column">\n
<h2 data-translate="resolve_captcha_headline">What can I do to prevent this in the future?</h2>\n \n\n
<p data-translate="resolve_captcha_antivirus">If you are on a personal connection, like at home, you can run an anti-virus scan on your device to make sure it is not infected with malware.</p>\n\n
<p data-translate="resolve_captcha_network">If you are at an office or shared network, you can ask the network administrator to run a scan across the network looking for misconfigured or infected devices.</p>\n \n
</div>\n
</div>\n
</div>
<!-- /.section -->\n \n\n
<div class="cf-error-footer cf-wrapper">\n
<p>\n
<span class="cf-footer-item">Cloudflare Ray ID:
<strong>51c0ddfb9cfbce6b</strong>
</span>\n
<span class="cf-footer-separator">•</span>\n
<span class="cf-footer-item">
<span>Your IP</span>: 51.7.125.220
</span>\n
<span class="cf-footer-separator">•</span>\n
<span class="cf-footer-item">
<span>Performance & security by</span>
<a href="https://www.cloudflare.com/5xx-error-landing?utm_source=error_footer" id="brand_link" target="_blank">Cloudflare</a>
</span>\n \n
</p>\n
</div>
<!-- /.error-footer -->\n\n\n
</div>
<!-- /#cf-error-details -->\n
</div>
<!-- /#cf-wrapper -->\n\n
<script type="text/javascript">\n window._cf_translation = {};\n \n \n</script>\n\n\n \n
</body>\n
</html>`
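For anyone debugging this, a minimal sketch of how the script could recognise this page instead of treating it as real content. The function name `looksLikeCloudflareChallenge` and the `minHits` threshold are hypothetical and not part of patreon-scraper; the marker strings are copied from the output above, and Cloudflare may change them at any time.

```typescript
// Marker strings copied from the challenge page quoted above.
// Assumption: Cloudflare keeps using them; they are not a stable API.
const CHALLENGE_MARKERS: readonly string[] = [
  "<title>Attention Required! | Cloudflare</title>",
  'id="challenge-form"',
  "cf-captcha-container",
];

/**
 * Hypothetical helper: returns true when the response body contains at
 * least `minHits` of the markers, i.e. it is probably the CAPTCHA page.
 */
function looksLikeCloudflareChallenge(body: string, minHits = 2): boolean {
  const hits = CHALLENGE_MARKERS.filter((marker) => body.includes(marker)).length;
  return hits >= minHits;
}
```

A check like this only detects the challenge page so the script can stop and report it; it does not get past the CAPTCHA.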
Hey, thanks for reporting. I noticed the same behaviour myself, and right now I am thinking about what to do about it.
The way I see it, the options are:
- Try to masquerade as a browser (easy and doable)
- Make the user solve the captcha (harder, probably using something like Puppeteer)
- Throttle downloads (maybe we are making requests too fast)
Frankly, I don't know when I'll have time to work on it, but it's on my TODO list.
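The throttling option could be sketched roughly as follows. This is only an illustration: the `Throttle` class and its methods are hypothetical names, not existing patreon-scraper code, and whether request rate is what triggers the CAPTCHA is an unconfirmed guess.

```typescript
/** Hypothetical helper that spaces calls at least `minIntervalMs` apart. */
class Throttle {
  private readonly minIntervalMs: number;
  private nextSlot = 0;

  constructor(minIntervalMs: number) {
    this.minIntervalMs = minIntervalMs;
  }

  /** Reserves the next free slot and returns how long to wait for it, in ms. */
  reserve(now: number = Date.now()): number {
    const start = Math.max(now, this.nextSlot);
    this.nextSlot = start + this.minIntervalMs;
    return start - now;
  }

  /** Waits for a free slot, then runs `task` (for example, one download). */
  async run<T>(task: () => Promise<T>): Promise<T> {
    const waitMs = this.reserve();
    if (waitMs > 0) {
      await new Promise<void>((resolve) => setTimeout(resolve, waitMs));
    }
    return task();
  }
}
```

Each request would then be wrapped as `throttle.run(() => someRequest())`, where `someRequest` stands in for whatever function performs the download.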
Hey, this should be resolved now (many thanks to the developers of cloudscraper).
The issue appears to persist in some capacity. I just tried the script and got countless printouts of variations of the following:
failed to execute request
{ CaptchaError: captcha
at validateResponse (C:\Users\Lex\PatreonScraper\patreon-scraper\node_modules\cloudscraper\index.js:259:11)
at onCloudflareResponse (C:\Users\Lex\PatreonScraper\patreon-scraper\node_modules\cloudscraper\index.js:222:5)
at onRequestResponse (C:\Users\Lex\PatreonScraper\patreon-scraper\node_modules\cloudscraper\index.js:205:5)
at Request.<anonymous> (C:\Users\Lex\PatreonScraper\patreon-scraper\node_modules\cloudscraper\index.js:149:7)
at Object.onceWrapper (events.js:286:20)
at Request.emit (events.js:198:13)
at Request.EventEmitter.emit (domain.js:448:20)
at Request.<anonymous> (C:\Users\Lex\PatreonScraper\patreon-scraper\node_modules\request\request.js:1161:10)
at Request.emit (events.js:198:13)
at Request.EventEmitter.emit (domain.js:448:20)
at Gunzip.<anonymous> (C:\Users\Lex\PatreonScraper\patreon-scraper\node_modules\request\request.js:1083:12)
at Object.onceWrapper (events.js:286:20)
at Gunzip.emit (events.js:203:15)
at Gunzip.EventEmitter.emit (domain.js:448:20)
at endReadableNT (_stream_readable.js:1145:12)
at process._tickCallback (internal/process/next_tick.js:63:19) name: 'CaptchaError', message: 'captcha' }
It never happened to me, but according to the cloudscraper bug tracker (bug report), some people experience the same issue. Right now, I don't think there is an easy and fast way to circumvent it. Maybe you can try to renew your IP address, or I can try to increase the timeout or add a timeout option.
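One way the timeout idea might look is a bounded retry with an increasing pause between attempts. This is a sketch under assumptions: `backoffDelayMs` and `withRetries` are hypothetical names, the default values are arbitrary, and nothing here is confirmed to avoid the CAPTCHA. It would mainly stop the script from retrying without end.

```typescript
/** Delay before the next attempt: baseMs, 2*baseMs, 4*baseMs, ... capped at maxMs. */
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

/**
 * Hypothetical wrapper: runs `task` up to `maxAttempts` times, pausing
 * between failures, and rethrows the last error once attempts run out.
 */
async function withRetries<T>(
  task: () => Promise<T>,
  maxAttempts = 5,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      // No pause after the final failed attempt.
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt));
      }
    }
  }
  throw lastError;
}
```

The `sleep` parameter is injectable so the pause can be replaced in tests or tuned without touching the loop.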
No luck changing IP, I'm afraid. I tried connecting through various VPN servers as well as without one, and the issue persisted consistently.
Same thing here, except I am getting the error constantly from the start. I tried editing the config object passed to cloudscraper, but that didn't help:
public getFile(identifier: FileUrlQS): RequestPromise<TypedResponse<any>> {
    const requestOptions: OptionsWithUrl = {
        ...this.requestBase,
        json: false,
        qs: identifier,
        url: "/file",
        agentOptions: { ciphers: 'ECDHE-ECDSA-AES128-GCM-SHA256' },
        proxy: 'https://195.182.22.178',
        port: 53281
    }
    return cloudscraper(requestOptions)
}
I'm having the same issue from the start too. Please contact me, I'm willing to pay a few beers for this ;)
I am also seeing what I think is the same issue:
failed to execute request
{ CaptchaError: captcha
at validateResponse (/home/amoe/vcs/patreon-scraper/node_modules/cloudscraper/index.js:259:11)
at onCloudflareResponse (/home/amoe/vcs/patreon-scraper/node_modules/cloudscraper/index.js:222:5)
at onRequestResponse (/home/amoe/vcs/patreon-scraper/node_modules/cloudscraper/index.js:205:5)
at Request.<anonymous> (/home/amoe/vcs/patreon-scraper/node_modules/cloudscraper/index.js:149:7)
at Object.onceWrapper (events.js:286:20)
at Request.emit (events.js:198:13)
at Request.<anonymous> (/home/amoe/vcs/patreon-scraper/node_modules/request/request.js:1161:10)
at Request.emit (events.js:198:13)
at Gunzip.<anonymous> (/home/amoe/vcs/patreon-scraper/node_modules/request/request.js:1083:12)
at Object.onceWrapper (events.js:286:20)
at Gunzip.emit (events.js:203:15)
at endReadableNT (_stream_readable.js:1145:12)
at process._tickCallback (internal/process/next_tick.js:63:19) name: 'CaptchaError', message: 'captcha' }
This is repeated on the console forever.
Invocation command:
./index.ts -s "MYSESSIONID" -o downloaded
Sorry for the long hiatus. I am aware of these problems, but currently I am unable to do anything about them.
I am using a third-party Cloudflare scraping library, and thus depend on its authors to update the code to match the changes Cloudflare periodically pushes. I would like to update the library that is used, but I simply lack the time to dive into the Cloudflare anti-bot implementation.