cloudflare-scrape
Hi, when using cfscrape to bypass Cloudflare protection while scraping data from a website, I get the following error: ValueError: Unable to identify Cloudflare IUAM Javascript on website. Cloudflare may have changed their technique, or there may be a bug in the script.
Before creating an issue, first upgrade cfscrape with pip install -U cfscrape and see if you're still experiencing the problem. Please also confirm your Node version (node --version or nodejs --version) is version 10 or higher.
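To check the Node requirement from Python rather than by hand, something like the following could work. This is a minimal sketch, assuming `node` (or `nodejs`) is on PATH and that `--version` prints a string such as `v12.16.1`. The helper names are hypothetical and not part of cfscrape.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_node_version(output: str) -> Tuple[int, int, int]:
    """Parse the output of `node --version` (e.g. 'v12.16.1') into (major, minor, patch)."""
    match = re.match(r"\s*v?(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("Unrecognized Node version string: %r" % output)
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def installed_node_version() -> Optional[Tuple[int, int, int]]:
    """Return the installed Node version, or None if neither binary is on PATH."""
    for binary in ("node", "nodejs"):
        path = shutil.which(binary)
        if path is None:
            continue
        # capture_output/text need Python 3.7+, which matches the version reported below.
        output = subprocess.run(
            [path, "--version"], capture_output=True, text=True, check=True
        ).stdout
        return parse_node_version(output)
    return None
```

`installed_node_version()` returns a tuple such as `(12, 16, 1)` whose first element should be at least 10, or `None` when no Node binary is found.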
Make sure the website you're having issues with is actually using anti-bot protection by Cloudflare and not a competitor like Imperva Incapsula or Sucuri. And if you're using an anonymizing proxy, a VPN, or Tor, Cloudflare often flags those IPs and may block you or present you with a captcha as a result.
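A rough way to confirm that a site is served by Cloudflare is to look at its response headers: Cloudflare normally answers with `Server: cloudflare` and a `CF-RAY` header. The function below is a hypothetical heuristic sketch, not a definitive test; it only inspects a header mapping, so it works with the headers of any HTTP client that exposes them as string key/value pairs.

```python
from typing import Mapping


def looks_like_cloudflare(headers: Mapping[str, str]) -> bool:
    """Heuristic: True if the headers suggest the response came through Cloudflare."""
    # HTTP header names are case-insensitive, so normalize them before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    server = normalized.get("server", "").lower()
    return server.startswith("cloudflare") or "cf-ray" in normalized
```

For example, with requests: `looks_like_cloudflare(requests.get(url).headers)`.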
Please confirm the following statements and check the boxes before creating an issue:
- [x] I've upgraded cfscrape with pip install -U cfscrape
- [x] I'm using Node version 10 or higher
- [x] The site protection I'm having issues with is from Cloudflare
- [x] I'm not using Tor, a VPN, or an anonymizing proxy
Python version number
Run python --version and paste the output below:
Python 3.7.4
cfscrape version number
Run pip show cfscrape and paste the output below:
Name: cfscrape
Version: 2.1.1
Code snippet involved with the issue
"""This module contains the ``CloudFlareMiddleware``"""
from cfscrape import get_tokens
import logging
class CloudFlareMiddleware:
"""Scrapy middleware to bypass the CloudFlare's anti-bot protection"""
@staticmethod
def is_cloudflare_challenge(response):
"""Test if the given response contains the cloudflare's anti-bot protection"""
return (
response.status == 503 or response.status == 429
and response.headers.get('Server', '').startswith(b'cloudflare')
and 'jschl_vc' in response.text
and 'jschl_answer' in response.text
)
def process_response(self, request, response, spider):
"""Handle the a Scrapy response"""
if not self.is_cloudflare_challenge(response):
return response
logger = logging.getLogger('cloudflaremiddleware')
logger.debug(
'Cloudflare protection detected on %s, trying to bypass...',
response.url
)
cloudflare_tokens, __ = get_tokens(
request.url,
user_agent=spider.settings.get('USER_AGENT')
)
logger.debug(
'Successfully bypassed the protection for %s, re-scheduling the request',
response.url
)
request.cookies.update(cloudflare_tokens)
request.priority = 99999
return request
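A note on the status condition in `is_cloudflare_challenge`: written without parentheses, `status == 503 or status == 429 and A and B` parses as `status == 503 or (status == 429 and A and B)`, so the header and body checks only apply to 429. The standalone comparison below (no Scrapy needed; the function names are illustrative only) shows the difference.

```python
def unparenthesized(status: int, is_cf_server: bool, has_challenge: bool) -> bool:
    # Parsed as: status == 503 or (status == 429 and is_cf_server and has_challenge)
    return status == 503 or status == 429 and is_cf_server and has_challenge


def grouped(status: int, is_cf_server: bool, has_challenge: bool) -> bool:
    # Status check grouped first, so the header and body checks apply to both codes.
    return status in (503, 429) and is_cf_server and has_challenge


# A plain 503 from a non-Cloudflare server without any challenge markers:
print(unparenthesized(503, False, False))  # True: misdetected as a challenge
print(grouped(503, False, False))          # False
```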
Complete exception and traceback
(If the problem doesn't involve an exception being raised, leave this blank)
2020-05-14 02:11:47 [scrapy.core.scraper] ERROR: Error downloading <GET https://coinmarketcap.com/currencies/decentraland/>
Traceback (most recent call last):
  File "c:\users\d33ps3curity\appdata\local\programs\python\python37-32\lib\site-packages\twisted\internet\defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "c:\users\d33ps3curity\appdata\local\programs\python\python37-32\lib\site-packages\scrapy\core\downloader\middleware.py", line 44, in process_request
    defer.returnValue((yield download_func(request=request, spider=spider)))
  File "c:\users\d33ps3curity\appdata\local\programs\python\python37-32\lib\site-packages\twisted\internet\defer.py", line 1362, in returnValue
    raise _DefGen_Return(val)
twisted.internet.defer._DefGen_Return: <429 https://coinmarketcap.com/currencies/decentraland/>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\users\d33ps3curity\appdata\local\programs\python\python37-32\lib\site-packages\cfscrape\__init__.py", line 255, in solve_challenge
    javascript, flags=re.S
AttributeError: 'NoneType' object has no attribute 'groups'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\users\d33ps3curity\appdata\local\programs\python\python37-32\lib\site-packages\twisted\internet\defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "c:\users\d33ps3curity\appdata\local\programs\python\python37-32\lib\site-packages\scrapy\core\downloader\middleware.py", line 53, in process_response
    response = yield method(request=request, response=response, spider=spider)
  File "c:\users\d33ps3curity\appdata\local\programs\python\python37-32\lib\site-packages\scrapy_cloudflare_middleware\middlewares.py", line 37, in process_response
    user_agent=spider.settings.get('USER_AGENT')
  File "c:\users\d33ps3curity\appdata\local\programs\python\python37-32\lib\site-packages\cfscrape\__init__.py", line 383, in get_tokens
    resp = scraper.get(url, **kwargs)
  File "c:\users\d33ps3curity\appdata\local\programs\python\python37-32\lib\site-packages\requests\sessions.py", line 546, in get
    return self.request('GET', url, **kwargs)
  File "c:\users\d33ps3curity\appdata\local\programs\python\python37-32\lib\site-packages\cfscrape\__init__.py", line 129, in request
    resp = self.solve_cf_challenge(resp, **kwargs)
  File "c:\users\d33ps3curity\appdata\local\programs\python\python37-32\lib\site-packages\cfscrape\__init__.py", line 204, in solve_cf_challenge
    answer, delay = self.solve_challenge(body, domain)
  File "c:\users\d33ps3curity\appdata\local\programs\python\python37-32\lib\site-packages\cfscrape\__init__.py", line 292, in solve_challenge
    % BUG_REPORT
ValueError: Unable to identify Cloudflare IUAM Javascript on website. Cloudflare may have changed their technique, or there may be a bug in the script.
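Because the ValueError propagates out of process_response, Scrapy reports the whole request as a download error. One way to keep the crawl going is to catch it and hand the original response back. The sketch below assumes a hypothetical `fetch_tokens` callable standing in for `cfscrape.get_tokens`, which, as the snippet above unpacks it, returns a `(tokens, user_agent)` tuple; `try_get_tokens` is an illustrative name, not a cfscrape API.

```python
import logging
from typing import Callable, Dict, Optional, Tuple

logger = logging.getLogger("cloudflaremiddleware")

# Hypothetical signature mirroring how get_tokens is called in the middleware.
TokenFetcher = Callable[..., Tuple[Dict[str, str], str]]


def try_get_tokens(
    fetch_tokens: TokenFetcher, url: str, user_agent: Optional[str]
) -> Optional[Dict[str, str]]:
    """Return Cloudflare cookies, or None if the challenge could not be solved."""
    try:
        tokens, _user_agent = fetch_tokens(url, user_agent=user_agent)
    except ValueError as exc:
        # cfscrape raises ValueError when it cannot parse the IUAM challenge page.
        logger.warning("Could not bypass Cloudflare on %s: %s", url, exc)
        return None
    return tokens


# Possible use inside process_response (assumption, not tested against Scrapy):
#     tokens = try_get_tokens(get_tokens, request.url, spider.settings.get('USER_AGENT'))
#     if tokens is None:
#         return response  # pass the 429/503 on instead of failing the request
```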
URL of the Cloudflare-protected page
[https://coinmarketcap.com]
URL of Pastebin/Gist with HTML source of protected page
[]