cloudflare-scrape

Hi, I am getting "ValueError: Unable to identify Cloudflare IUAM Javascript on website. Cloudflare may have changed their technique, or there may be a bug in the script." when using cfscrape to bypass Cloudflare protection while scraping data from a website.

Open ajay-raikar-upwork opened this issue 5 years ago • 0 comments

Before creating an issue, first upgrade cfscrape with pip install -U cfscrape and see if you're still experiencing the problem. Please also confirm your Node version (node --version or nodejs --version) is version 10 or higher.

Make sure the website you're having issues with is actually using anti-bot protection by Cloudflare and not a competitor like Imperva Incapsula or Sucuri. And if you're using an anonymizing proxy, a VPN, or Tor, Cloudflare often flags those IPs and may block you or present you with a captcha as a result.

Please confirm the following statements and check the boxes before creating an issue:

  • [x] I've upgraded cfscrape with pip install -U cfscrape
  • [x] I'm using Node version 10 or higher
  • [x] The site protection I'm having issues with is from Cloudflare
  • [x] I'm not using Tor, a VPN, or an anonymizing proxy

Python version number

Run python --version and paste the output below:

Python 3.7.4

cfscrape version number

Run pip show cfscrape and paste the output below:

Name: cfscrape
Version: 2.1.1 

Code snippet involved with the issue

"""This module contains the ``CloudFlareMiddleware``"""

from cfscrape import get_tokens

import logging


class CloudFlareMiddleware:
    """Scrapy middleware to bypass the CloudFlare's anti-bot protection"""

    @staticmethod
    def is_cloudflare_challenge(response):
        """Test if the given response contains the cloudflare's anti-bot protection"""

        return (
            response.status in (503, 429)
            and response.headers.get('Server', '').startswith(b'cloudflare')
            and 'jschl_vc' in response.text
            and 'jschl_answer' in response.text
        )

    def process_response(self, request, response, spider):
        """Handle a Scrapy response"""

        if not self.is_cloudflare_challenge(response):
            return response

        logger = logging.getLogger('cloudflaremiddleware')

        logger.debug(
            'Cloudflare protection detected on %s, trying to bypass...',
            response.url
        )

        cloudflare_tokens, __ = get_tokens(
            request.url,
            user_agent=spider.settings.get('USER_AGENT')
        )

        logger.debug(
            'Successfully bypassed the protection for %s, re-scheduling the request',
            response.url
        )

        request.cookies.update(cloudflare_tokens)
        request.priority = 99999

        return request
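
Because `and` binds more tightly than `or` in Python, a status test that combines both needs explicit grouping if every status is meant to pass the remaining checks. The following is a minimal sketch of the same kind of check on plain values, so it can be tried without Scrapy. The function name `looks_like_iuam_challenge` and its parameters are hypothetical and not part of cfscrape or the middleware.

```python
def looks_like_iuam_challenge(status, server_header, body):
    """Return True if the values resemble a Cloudflare IUAM challenge page.

    Assumptions: `status` is an int, `server_header` is bytes (as Scrapy
    exposes header values), and `body` is the decoded page text.
    """
    return (
        # Membership test keeps both statuses subject to the checks below.
        status in (503, 429)
        and server_header.startswith(b'cloudflare')
        and 'jschl_vc' in body
        and 'jschl_answer' in body
    )
```

With this grouping, a 503 from a server other than Cloudflare is not treated as a challenge, while a 429 is accepted only if the Cloudflare header and both form fields are present.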

Complete exception and traceback

(If the problem doesn't involve an exception being raised, leave this blank)

2020-05-14 02:11:47 [scrapy.core.scraper] ERROR: Error downloading <GET https://coinmarketcap.com/currencies/decentraland/>
Traceback (most recent call last):
  File "c:\users\d33ps3curity\appdata\local\programs\python\python37-32\lib\site-packages\twisted\internet\defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "c:\users\d33ps3curity\appdata\local\programs\python\python37-32\lib\site-packages\scrapy\core\downloader\middleware.py", line 44, in process_request
    defer.returnValue((yield download_func(request=request, spider=spider)))
  File "c:\users\d33ps3curity\appdata\local\programs\python\python37-32\lib\site-packages\twisted\internet\defer.py", line 1362, in returnValue
    raise _DefGen_Return(val)
twisted.internet.defer._DefGen_Return: <429 https://coinmarketcap.com/currencies/decentraland/>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\users\d33ps3curity\appdata\local\programs\python\python37-32\lib\site-packages\cfscrape\__init__.py", line 255, in solve_challenge
    javascript, flags=re.S
AttributeError: 'NoneType' object has no attribute 'groups'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\users\d33ps3curity\appdata\local\programs\python\python37-32\lib\site-packages\twisted\internet\defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "c:\users\d33ps3curity\appdata\local\programs\python\python37-32\lib\site-packages\scrapy\core\downloader\middleware.py", line 53, in process_response
    response = yield method(request=request, response=response, spider=spider)
  File "c:\users\d33ps3curity\appdata\local\programs\python\python37-32\lib\site-packages\scrapy_cloudflare_middleware\middlewares.py", line 37, in process_response
    user_agent=spider.settings.get('USER_AGENT')
  File "c:\users\d33ps3curity\appdata\local\programs\python\python37-32\lib\site-packages\cfscrape\__init__.py", line 383, in get_tokens
    resp = scraper.get(url, **kwargs)
  File "c:\users\d33ps3curity\appdata\local\programs\python\python37-32\lib\site-packages\requests\sessions.py", line 546, in get
    return self.request('GET', url, **kwargs)
  File "c:\users\d33ps3curity\appdata\local\programs\python\python37-32\lib\site-packages\cfscrape\__init__.py", line 129, in request
    resp = self.solve_cf_challenge(resp, **kwargs)
  File "c:\users\d33ps3curity\appdata\local\programs\python\python37-32\lib\site-packages\cfscrape\__init__.py", line 204, in solve_cf_challenge
    answer, delay = self.solve_challenge(body, domain)
  File "c:\users\d33ps3curity\appdata\local\programs\python\python37-32\lib\site-packages\cfscrape\__init__.py", line 292, in solve_challenge
    % BUG_REPORT
ValueError: Unable to identify Cloudflare IUAM Javascript on website. Cloudflare may have changed their technique, or there may be a bug in the script.
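
The traceback shows a failed regex match inside `solve_challenge` (the `AttributeError` on `None`) being re-raised as `ValueError`, which then propagates out of `process_response` and fails the download. One possible way to keep the spider running in that case is to catch the `ValueError` around the `get_tokens` call. This is only a sketch: `fetch_tokens_or_none` and its `get_tokens_func` parameter are hypothetical names, the callable is passed in so the helper can be tried without network access, and it does not solve the challenge itself.

```python
import logging

logger = logging.getLogger('cloudflaremiddleware')


def fetch_tokens_or_none(get_tokens_func, url, user_agent=None):
    """Call a cfscrape-style get_tokens and return the cookie dict, or None.

    Assumption: get_tokens_func(url, user_agent=...) returns a
    (tokens, user_agent) tuple, as in the snippet above, and raises
    ValueError when the challenge cannot be parsed.
    """
    try:
        tokens, _user_agent = get_tokens_func(url, user_agent=user_agent)
    except ValueError as exc:
        # Log and signal failure instead of letting the exception escape.
        logger.warning('Could not solve the challenge for %s: %s', url, exc)
        return None
    return tokens
```

In `process_response`, a `None` result could then be handled by returning `response` unchanged rather than re-scheduling the request.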

URL of the Cloudflare-protected page

[https://coinmarketcap.com]

URL of Pastebin/Gist with HTML source of protected page

[]
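
The HTML source of the protected page can be captured from inside the middleware and then pasted into a Gist or Pastebin. A minimal sketch, assuming the page body is available as a string (for example `response.text` in `process_response`); the helper name `save_challenge_html` and the default file name are hypothetical.

```python
from pathlib import Path


def save_challenge_html(body, path='cloudflare_challenge.html'):
    """Write the challenge page body (a str) to a file and return its path."""
    target = Path(path)
    target.write_text(body, encoding='utf-8')
    return target
```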

ajay-raikar-upwork, May 13 '20 09:05