cloudflare-scrape
Before creating an issue, first upgrade cfscrape with `pip install -U cfscrape` and check whether you're still experiencing the problem. Please also confirm your Node version (`node --version` or `nodejs --version`) is version 10 or higher.
Make sure the website you're having issues with is actually using anti-bot protection from Cloudflare, not from a competitor like Imperva Incapsula or Sucuri. If you're using an anonymizing proxy, a VPN, or Tor, note that Cloudflare often flags those IPs and may block you or present a captcha as a result.
Please confirm the following statements and check the boxes before creating an issue:
- [ ] I've upgraded cfscrape with `pip install -U cfscrape`
- [ ] I'm using Node version 10 or higher
- [ ] The site protection I'm having issues with is from Cloudflare
- [ ] I'm not using Tor, a VPN, or an anonymizing proxy
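(For context on the Node checkbox: cfscrape shells out to a Node interpreter to evaluate the challenge script, which is why the template requires Node 10+. As a small sketch, with a helper name of my own invention, the major version can be parsed out of `node --version` output like this:)

```python
import re

def node_major_version(output: str) -> int:
    """Parse the major version from `node --version` output, e.g. 'v10.16.0' -> 10."""
    match = re.match(r"v?(\d+)", output.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {output!r}")
    return int(match.group(1))
```

A version below 10 fails the template's requirement, e.g. `node_major_version("v8.11.3") < 10`.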
Python version number
Run `python --version` and paste the output below:
cfscrape version number
Run `pip show cfscrape` and paste the output below:
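(`pip show cfscrape` prints `Key: value` lines; if you want to grab just the version programmatically, a small helper of my own devising, not part of pip or cfscrape, could look like:)

```python
def cfscrape_version(pip_show_output: str) -> str:
    """Extract the Version field from `pip show cfscrape` output."""
    for line in pip_show_output.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "version":
            return value.strip()
    raise ValueError("no Version field found in pip show output")
```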
Code snippet involved with the issue
2020-06-16 18:42:03 [scrapy.utils.log] INFO: Scrapy 1.6.0 started (bot: scraping)
2020-06-16 18:42:03 [scrapy.utils.log] INFO: Versions: lxml 4.5.0.0, libxml2 2.9.9, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 20.3.0, Python 3.7.7 (default, May 6 2020, 04:59:01) - [Clang 4.0.1 (tags/RELEASE_401/final)], pyOpenSSL 19.1.0 (OpenSSL 1.1.1g 21 Apr 2020), cryptography 2.9.2, Platform Darwin-19.5.0-x86_64-i386-64bit
2020-06-16 18:42:03 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'scraping', 'CONCURRENT_REQUESTS': 32, 'CONCURRENT_REQUESTS_PER_DOMAIN': 32, 'COOKIES_ENABLED': False, 'DOWNLOAD_DELAY': 2, 'DOWNLOAD_TIMEOUT': 600, 'DUPEFILTER_CLASS': 'scrapy_splash.SplashAwareDupeFilter', 'FEED_FORMAT': 'csv', 'FEED_URI': 'results/%(name)s_%(time)s.csv', 'HTTPCACHE_ENABLED': True, 'HTTPCACHE_EXPIRATION_SECS': 43200, 'HTTPCACHE_STORAGE': 'scrapy_splash.SplashAwareFSCacheStorage', 'NEWSPIDER_MODULE': 'scraping.spiders', 'SPIDER_MODULES': ['scraping.spiders'], 'USER_AGENT': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36'}
2020-06-16 18:42:03 [scrapy.extensions.telnet] INFO: Telnet Password: e179fe629b29425b
2020-06-16 18:42:03 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
>>>>>>>>>>>>>>>>>__init__(MODES)<<<<<<<<<<<<<<<<<
2020-06-16 18:42:03 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy_crawlera.CrawleraMiddleware',
'scrapy_splash.SplashCookiesMiddleware',
'scrapy_splash.SplashMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats',
'scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware']
2020-06-16 18:42:03 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy_splash.SplashDeduplicateArgsMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2020-06-16 18:42:03 [scrapy.middleware] INFO: Enabled item pipelines:
['scraping.pipelines.ScrapingPipeline']
2020-06-16 18:42:03 [urllib3.connectionpool] DEBUG: Starting new HTTPS connection (1): www.modes.com:443
2020-06-16 18:42:04 [urllib3.connectionpool] DEBUG: https://www.modes.com:443 "GET /jp/shopping/woman HTTP/1.1" 503 None
Unhandled error in Deferred:
2020-06-16 18:42:04 [twisted] CRITICAL: Unhandled error in Deferred:
Traceback (most recent call last):
File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/scrapy/crawler.py", line 172, in crawl
return self._crawl(crawler, *args, **kwargs)
File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/scrapy/crawler.py", line 176, in _crawl
d = crawler.crawl(*args, **kwargs)
File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/twisted/internet/defer.py", line 1613, in unwindGenerator
return _cancellableInlineCallbacks(gen)
File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/twisted/internet/defer.py", line 1529, in _cancellableInlineCallbacks
_inlineCallbacks(None, g, status)
--- <exception caught here> ---
File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
result = g.send(result)
File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/scrapy/crawler.py", line 81, in crawl
start_requests = iter(self.spider.start_requests())
File "/Users/rnrnstar/github/Spiders/scraping/spiders/modes.py", line 41, in start_requests
data = scraper.get("https://www.modes.com/jp/shopping/woman").content
File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/requests/sessions.py", line 543, in get
return self.request('GET', url, **kwargs)
File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/cfscrape/__init__.py", line 129, in request
resp = self.solve_cf_challenge(resp, **kwargs)
File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/cfscrape/__init__.py", line 207, in solve_cf_challenge
answer, delay = self.solve_challenge(body, domain)
File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/cfscrape/__init__.py", line 299, in solve_challenge
% BUG_REPORT
builtins.ValueError: Unable to identify Cloudflare IUAM Javascript on website. Cloudflare may have changed their technique, or there may be a bug in the script.
Please read https://github.com/Anorov/cloudflare-scrape#updates, then file a bug report at https://github.com/Anorov/cloudflare-scrape/issues."
2020-06-16 18:42:04 [twisted] CRITICAL:
Traceback (most recent call last):
File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/cfscrape/__init__.py", line 259, in solve_challenge
javascript, flags=re.S
AttributeError: 'NoneType' object has no attribute 'groups'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
result = g.send(result)
File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/scrapy/crawler.py", line 81, in crawl
start_requests = iter(self.spider.start_requests())
File "/Users/rnrnstar/github/Spiders/scraping/spiders/modes.py", line 41, in start_requests
data = scraper.get("https://www.modes.com/jp/shopping/woman").content
File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/requests/sessions.py", line 543, in get
return self.request('GET', url, **kwargs)
File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/cfscrape/__init__.py", line 129, in request
resp = self.solve_cf_challenge(resp, **kwargs)
File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/cfscrape/__init__.py", line 207, in solve_cf_challenge
answer, delay = self.solve_challenge(body, domain)
File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/cfscrape/__init__.py", line 299, in solve_challenge
% BUG_REPORT
ValueError: Unable to identify Cloudflare IUAM Javascript on website. Cloudflare may have changed their technique, or there may be a bug in the script.
Please read https://github.com/Anorov/cloudflare-scrape#updates, then file a bug report at https://github.com/Anorov/cloudflare-scrape/issues."
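(For what it's worth, the two tracebacks chain together: cfscrape locates the IUAM challenge with a regex over the response body; when Cloudflare changes its markup, `re.search` returns `None`, the `.groups()` access raises the `AttributeError`, and cfscrape re-raises it as the `ValueError` shown above. A minimal sketch of that failure mode, where the pattern is an illustrative stand-in and not cfscrape's actual regex:)

```python
import re

# Illustrative stand-in for cfscrape's challenge-extraction pattern.
CHALLENGE_RE = re.compile(r"setTimeout\(function\(\)\{\s*(var .+?a\.value\s*=.+?)\n", re.S)

def extract_challenge(body: str) -> str:
    match = CHALLENGE_RE.search(body)
    try:
        # When the page has no recognizable challenge, match is None and
        # None.groups() raises AttributeError ...
        return match.groups()[0]
    except AttributeError:
        # ... which gets re-raised as the ValueError seen in the log.
        raise ValueError("Unable to identify Cloudflare IUAM Javascript on website.")
```

So the `AttributeError: 'NoneType' object has no attribute 'groups'` in the second traceback is the root symptom: the challenge markup on the page no longer matches what cfscrape expects.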
Complete exception and traceback
(If the problem doesn't involve an exception being raised, leave this blank)
URL of the Cloudflare-protected page
[LINK GOES HERE]
URL of Pastebin/Gist with HTML source of protected page
[LINK GOES HERE]
Try #373. I tested it with your link and it worked.