
CAPTCHA prompt seems to prevent any content from downloading

Open LexCybermac opened this issue 5 years ago • 9 comments

When I run this script, no content is downloaded. Looking at the script output in the console, the body section appears to contain the markup for a CAPTCHA page, which leads me to believe that this is the root of the issue.

Example:

`\n

\n
<!--[if IE 7]>
<html class="no-js ie7 oldie" lang="en-US">
	<![endif]-->\n
	<!--[if IE 8]>
	<html class="no-js ie8 oldie" lang="en-US">
		<![endif]-->\n
		<!--[if gt IE 8]>
		<!-->
		<html class="no-js" lang="en-US">
			<!--
			<![endif]-->\n
			<head>\n
				<title>Attention Required! | Cloudflare</title>\n
				<meta name="captcha-bypass" id="captcha-bypass" />\n
				<meta charset="UTF-8" />\n
				<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />\n
				<meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1" />\n
				<meta name="robots" content="noindex, nofollow" />\n
				<meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1" />\n
				<link rel="stylesheet" id="cf_styles-css" href="/cdn-cgi/styles/cf.errors.css" type="text/css" media="screen,projection" />\n
				<!--[if lt IE 9]>
				<link rel="stylesheet" id=\'cf_styles-ie-css\' href="/cdn-cgi/styles/cf.errors.ie.css" type="text/css" media="screen,projection" />
				<![endif]-->\n
				<style type="text/css">body{margin:0;padding:0}</style>\n\n\n
				<!--[if gte IE 10]>
				<!-->
				<script type="text/javascript" src="/cdn-cgi/scripts/zepto.min.js"></script>
				<!--
				<![endif]-->\n
				<!--[if gte IE 10]>
				<!-->
				<script type="text/javascript" src="/cdn-cgi/scripts/cf.common.js"></script>
				<!--
				<![endif]-->\n\n\n\n\n
			</head>\n
			<body>\n  
				<div id="cf-wrapper">\n    
					<div class="cf-alert cf-alert-error cf-cookie-error" id="cookie-alert" data-translate="enable_cookies">Please enable cookies.</div>\n    
					<div id="cf-error-details" class="cf-error-details-wrapper">\n      
						<div class="cf-wrapper cf-header cf-error-overview">\n        
							<h1 data-translate="challenge_headline">One more step</h1>\n        
							<h2 class="cf-subheadline">
								<span data-translate="complete_sec_check">Please complete the security check to access</span> patreon.com
							</h2>\n      
						</div>
						<!-- /.header -->\n      \n      
						<div class="cf-section cf-highlight cf-captcha-container">\n        
							<div class="cf-wrapper">\n          
								<div class="cf-columns two">\n            
									<div class="cf-column">\n            \n              
										<div class="cf-highlight-inverse cf-form-stacked">\n                
											<form class="challenge-form" id="challenge-form" action="/cdn-cgi/l/chk_captcha" method="get">\n  
												<input type="hidden" name="s" value="6903d23f35b518d64183514a77da7a1e4080565e-1569455077-1800-AXOxIyPcLIrC1ekaD86GCBANIQ6tvR2dsxDi1Op3XgNC+nMdHTBPnPjig2WUKcdW1YeHIAgbCFko2Pjz4MTZYsMKtDf+imsnyFXsz9HFKWlw8V09/GiYfeyNtj1F9O+3L6EI1/CXKDOujHPfIpMFcF9x7Xs1bqSjoPnunmOUV8FA0M9p2vcX5ZR1at4f0ZwXFw=="></input>\n  
												<script type="text/javascript" src="/cdn-cgi/scripts/cf.challenge.js" data-type="normal"  data-ray="51c0ddfb9cfbce6b" async data-sitekey="6LfBixYUAAAAABhdHynFUIMA_sa4s-XsJvnjtgB0"></script>\n  
												<div class="g-recaptcha"></div>\n  
												<noscript id="cf-captcha-bookmark" class="cf-captcha-info">\n    
													<div>
														<div style="width: 302px">\n      
															<div>\n        
																<iframe src="https://www.google.com/recaptcha/api/fallback?k=6LfBixYUAAAAABhdHynFUIMA_sa4s-XsJvnjtgB0" frameborder="0" scrolling="no" style="width: 302px; height:422px; border-style: none;"></iframe>\n      
															</div>\n      
															<div style="width: 300px; border-style: none; bottom: 12px; left: 25px; margin: 0px; padding: 0px; right: 25px; background: #f9f9f9; border: 1px solid #c1c1c1; border-radius: 3px;">\n        
																<textarea id="g-recaptcha-response" name="g-recaptcha-response" class="g-recaptcha-response" style="width: 250px; height: 40px; border: 1px solid #c1c1c1; margin: 10px 25px; padding: 0px; resize: none;"></textarea>\n        
																<input type="submit" value="Submit"></input>\n      
															</div>\n    
														</div>
													</div>\n  
												</noscript>\n
											</form>\n\n                \n              
										</div>\n            
									</div>\n\n            
									<div class="cf-column">\n              
										<div class="cf-screenshot-container">\n              \n                
											<span class="cf-no-screenshot"></span>\n              \n              
										</div>\n            
									</div>\n          
								</div>
								<!-- /.columns -->\n        
							</div>\n      
						</div>
						<!-- /.captcha-container -->\n\n      
						<div class="cf-section cf-wrapper">\n        
							<div class="cf-columns two">\n          
								<div class="cf-column">\n            
									<h2 data-translate="why_captcha_headline">Why do I have to complete a CAPTCHA?</h2>\n            \n            
									<p data-translate="why_captcha_detail">Completing the CAPTCHA proves you are a human and gives you temporary access to the web property.</p>\n          
								</div>\n\n          
								<div class="cf-column">\n            
									<h2 data-translate="resolve_captcha_headline">What can I do to prevent this in the future?</h2>\n            \n\n            
									<p data-translate="resolve_captcha_antivirus">If you are on a personal connection, like at home, you can run an anti-virus scan on your device to make sure it is not infected with malware.</p>\n\n            
									<p data-translate="resolve_captcha_network">If you are at an office or shared network, you can ask the network administrator to run a scan across the network looking for misconfigured or infected devices.</p>\n            \n          
								</div>\n        
							</div>\n      
						</div>
						<!-- /.section -->\n      \n\n      
						<div class="cf-error-footer cf-wrapper">\n  
							<p>\n    
								<span class="cf-footer-item">Cloudflare Ray ID: 
									<strong>51c0ddfb9cfbce6b</strong>
								</span>\n    
								<span class="cf-footer-separator">&bull;</span>\n    
								<span class="cf-footer-item">
									<span>Your IP</span>: 51.7.125.220
								</span>\n    
								<span class="cf-footer-separator">&bull;</span>\n    
								<span class="cf-footer-item">
									<span>Performance &amp; security by</span>
									<a href="https://www.cloudflare.com/5xx-error-landing?utm_source=error_footer" id="brand_link" target="_blank">Cloudflare</a>
								</span>\n    \n  
							</p>\n
						</div>
						<!-- /.error-footer -->\n\n\n    
					</div>
					<!-- /#cf-error-details -->\n  
				</div>
				<!-- /#cf-wrapper -->\n\n  
				<script type="text/javascript">\n  window._cf_translation = {};\n  \n  \n</script>\n\n\n  \n
			</body>\n
		</html>`
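
For anyone who wants to confirm they are hitting the same page, here is a minimal sketch of a check on the response body. It is not part of the script: the helper name `looksLikeCloudflareChallenge` is made up for illustration, and the marker strings are simply copied from the dump above, so they may stop matching whenever Cloudflare changes its page.

```typescript
// Marker strings copied from the Cloudflare page quoted above.
// Assumption: any one of them is enough to identify the CAPTCHA page.
const CHALLENGE_MARKERS: readonly string[] = [
  "<title>Attention Required! | Cloudflare</title>",
  'id="challenge-form"',
  "cf-captcha-container",
];

// Hypothetical helper: returns true when the body looks like the
// CAPTCHA page rather than the content the script expected.
function looksLikeCloudflareChallenge(body: string): boolean {
  return CHALLENGE_MARKERS.some((marker) => body.includes(marker));
}
```

Calling it on each response body before parsing would let the script report "blocked by CAPTCHA" once instead of printing the whole page.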

LexCybermac avatar Sep 25 '19 23:09 LexCybermac

Hey, thanks for reporting. I noticed the same behaviour myself, and right now I am thinking about what to do about it.

The way I see it, the options are:

  • Try to masquerade as a browser (easy and doable)
  • Make the user solve the CAPTCHA (harder, probably using something like Puppeteer)
  • Throttle downloads? (maybe we are making requests too fast)
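
A rough sketch of what the first and third options might look like, under stated assumptions: the header values are just an example of what a desktop browser sends, `browserLikeHeaders` and `runThrottled` are hypothetical names, and none of this is guaranteed to get past the check.

```typescript
// Assumed example values; any current desktop browser's headers
// could be substituted here.
function browserLikeHeaders(): Record<string, string> {
  return {
    "User-Agent":
      "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
  };
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: runs the tasks one at a time and pauses
// delayMs between consecutive tasks (no pause before the first).
async function runThrottled<T>(
  tasks: ReadonlyArray<() => Promise<T>>,
  delayMs: number,
  pause: (ms: number) => Promise<void> = sleep,
): Promise<T[]> {
  const results: T[] = [];
  let first = true;
  for (const task of tasks) {
    if (!first) {
      await pause(delayMs);
    }
    first = false;
    results.push(await task());
  }
  return results;
}
```

The `pause` parameter exists only so the delay can be swapped out in tests; in normal use the default `sleep` is enough.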

Frankly, I don't know when I'll have time to work on it, but it's on my TODO list.

lavovaLampa avatar Oct 03 '19 10:10 lavovaLampa

Hey, this should be resolved now (many thanks to the developers of cloudscraper).

lavovaLampa avatar Jan 31 '20 02:01 lavovaLampa

The issue appears to be persisting in some capacity. I just tried the script and got countless printouts of variations of the following:

failed to execute request
{ CaptchaError: captcha
    at validateResponse (C:\Users\Lex\PatreonScraper\patreon-scraper\node_modules\cloudscraper\index.js:259:11)
    at onCloudflareResponse (C:\Users\Lex\PatreonScraper\patreon-scraper\node_modules\cloudscraper\index.js:222:5)
    at onRequestResponse (C:\Users\Lex\PatreonScraper\patreon-scraper\node_modules\cloudscraper\index.js:205:5)
    at Request.<anonymous> (C:\Users\Lex\PatreonScraper\patreon-scraper\node_modules\cloudscraper\index.js:149:7)
    at Object.onceWrapper (events.js:286:20)
    at Request.emit (events.js:198:13)
    at Request.EventEmitter.emit (domain.js:448:20)
    at Request.<anonymous> (C:\Users\Lex\PatreonScraper\patreon-scraper\node_modules\request\request.js:1161:10)
    at Request.emit (events.js:198:13)
    at Request.EventEmitter.emit (domain.js:448:20)
    at Gunzip.<anonymous> (C:\Users\Lex\PatreonScraper\patreon-scraper\node_modules\request\request.js:1083:12)
    at Object.onceWrapper (events.js:286:20)
    at Gunzip.emit (events.js:203:15)
    at Gunzip.EventEmitter.emit (domain.js:448:20)
    at endReadableNT (_stream_readable.js:1145:12)
    at process._tickCallback (internal/process/next_tick.js:63:19) name: 'CaptchaError', message: 'captcha' }

LexCybermac avatar Feb 01 '20 14:02 LexCybermac

It never happened to me, but according to the cloudscraper bug tracker (bug report) some people experience the same issue. Right now, I don't think there is an easy and fast way to circumvent it. Maybe you can try to renew your IP address, or I can try to increase the timeout or add a timeout option.

lavovaLampa avatar Feb 02 '20 17:02 lavovaLampa

No luck changing IP, I'm afraid. I tried connecting through various VPN servers as well as without one, and the issue persisted consistently.

LexCybermac avatar Feb 06 '20 22:02 LexCybermac

Same thing here, except I am constantly getting the error from the start. I tried to edit the config object passed to cloudscraper, but that didn't help:

public getFile(identifier: FileUrlQS): RequestPromise<TypedResponse<any>> {
    const requestOptions: OptionsWithUrl = {
        ...this.requestBase,
        json: false,
        qs: identifier,
        url: "/file",
        agentOptions: { ciphers: 'ECDHE-ECDSA-AES128-GCM-SHA256' },
        proxy: 'https://195.182.22.178',
        port: 53281
    }
    return cloudscraper(requestOptions)
}

NourDT avatar Feb 12 '20 03:02 NourDT

I'm having the same issue from the start too. Please contact me, I'm willing to pay a few beers for this ;)

lucasff avatar Feb 12 '20 12:02 lucasff

I am also seeing what I think is the same issue:

failed to execute request
{ CaptchaError: captcha
    at validateResponse (/home/amoe/vcs/patreon-scraper/node_modules/cloudscraper/index.js:259:11)
    at onCloudflareResponse (/home/amoe/vcs/patreon-scraper/node_modules/cloudscraper/index.js:222:5)
    at onRequestResponse (/home/amoe/vcs/patreon-scraper/node_modules/cloudscraper/index.js:205:5)
    at Request.<anonymous> (/home/amoe/vcs/patreon-scraper/node_modules/cloudscraper/index.js:149:7)
    at Object.onceWrapper (events.js:286:20)
    at Request.emit (events.js:198:13)
    at Request.<anonymous> (/home/amoe/vcs/patreon-scraper/node_modules/request/request.js:1161:10)
    at Request.emit (events.js:198:13)
    at Gunzip.<anonymous> (/home/amoe/vcs/patreon-scraper/node_modules/request/request.js:1083:12)
    at Object.onceWrapper (events.js:286:20)
    at Gunzip.emit (events.js:203:15)
    at endReadableNT (_stream_readable.js:1145:12)
    at process._tickCallback (internal/process/next_tick.js:63:19) name: 'CaptchaError', message: 'captcha' }

This is repeated on the console forever. Invocation command: ./index.ts -s "MYSESSIONID" -o downloaded
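
Until the underlying problem is fixed, the endless repetition could at least be cut short. The following is only a sketch under assumptions: it relies on the `name: 'CaptchaError'` field visible in the log above, and `isCaptchaError` and `fetchWithLimit` are hypothetical helpers, not functions from the script or from cloudscraper.

```typescript
// True when the error has the shape printed in the log above
// (an object whose name property is 'CaptchaError').
function isCaptchaError(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { name?: unknown }).name === "CaptchaError"
  );
}

// Hypothetical wrapper: retries ordinary failures up to maxAttempts,
// but rethrows immediately on a CAPTCHA, since retrying cannot solve it.
async function fetchWithLimit<T>(
  request: () => Promise<T>,
  maxAttempts: number,
): Promise<T> {
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await request();
    } catch (err) {
      if (isCaptchaError(err)) {
        throw err;
      }
      lastError = err;
    }
  }
  throw lastError;
}
```

With a wrapper like this, the first `CaptchaError` would end the run with a single message instead of filling the console.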

amoe avatar May 10 '20 12:05 amoe

Sorry for the long hiatus. I am aware of these problems, but currently I am unable to do anything about them.

I am using a third-party Cloudflare scraping library, and thus depend on its authors to fix the code whenever Cloudflare pushes changes, which it does periodically. I would like to update the library that is used, but I simply lack the time to dive into the Cloudflare anti-bot implementation.

lavovaLampa avatar Oct 26 '20 22:10 lavovaLampa