cloudflare-scrape: ValueError "Unable to identify Cloudflare IUAM Javascript on website" when using cfscrape to bypass Cloudflare protection while scraping a website

Open ajay-raikar-upwork opened this issue 5 years ago • 0 comments

Before creating an issue, first upgrade cfscrape with pip install -U cfscrape and see if you're still experiencing the problem. Please also confirm your Node version (node --version or nodejs --version) is version 10 or higher.

Make sure the website you're having issues with is actually using anti-bot protection by Cloudflare and not a competitor like Imperva Incapsula or Sucuri. And if you're using an anonymizing proxy, a VPN, or Tor, Cloudflare often flags those IPs and may block you or present you with a captcha as a result.

Please confirm the following statements and check the boxes before creating an issue:

[x] I've upgraded cfscrape with pip install -U cfscrape
[x] I'm using Node version 10 or higher
[x] The site protection I'm having issues with is from Cloudflare
[x] I'm not using Tor, a VPN, or an anonymizing proxy

Python version number

Run python --version and paste the output below:

Python 3.7.4

cfscrape version number

Run pip show cfscrape and paste the output below:

Name: cfscrape
Version: 2.1.1

Code snippet involved with the issue

"""This module contains the ``CloudFlareMiddleware``"""

from cfscrape import get_tokens

import logging


class CloudFlareMiddleware:
    """Scrapy middleware to bypass Cloudflare's anti-bot protection"""

    @staticmethod
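    def is_cloudflare_challenge(response):
        """Test if the given response contains Cloudflare's anti-bot challenge"""

        # Group the status codes explicitly: `and` binds tighter than `or`,
        # so an ungrouped `== 503 or == 429 and ...` treats every 503 as a challenge.
        # The header default is bytes so startswith(b'...') never raises TypeError.
        return (
            response.status in (503, 429)
            and response.headers.get('Server', b'').startswith(b'cloudflare')
            and 'jschl_vc' in response.text
            and 'jschl_answer' in response.text
        )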

    def process_response(self, request, response, spider):
        """Handle a Scrapy response"""

        if not self.is_cloudflare_challenge(response):
            return response

        logger = logging.getLogger('cloudflaremiddleware')

        logger.debug(
            'Cloudflare protection detected on %s, trying to bypass...',
            response.url
        )

        cloudflare_tokens, __ = get_tokens(
            request.url,
            user_agent=spider.settings.get('USER_AGENT')
        )

        logger.debug(
            'Successfully bypassed the protection for %s, re-scheduling the request',
            response.url
        )

        request.cookies.update(cloudflare_tokens)
        request.priority = 99999

        return request
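
A note on the status test in a check like is_cloudflare_challenge: Python binds `and` tighter than `or`, so `a or b and c` is read as `a or (b and c)`. The sketch below illustrates the difference with a hypothetical stand-in for the response object (FakeResponse is not a Scrapy class, and the function names are made up for the illustration).

```python
from dataclasses import dataclass, field


@dataclass
class FakeResponse:
    """Hypothetical stand-in for a Scrapy response (not a Scrapy class)."""
    status: int
    headers: dict = field(default_factory=dict)
    text: str = ""


def ungrouped(response):
    # Parsed as: status == 503 or (status == 429 and <remaining checks>)
    return (
        response.status == 503 or response.status == 429
        and response.headers.get('Server', b'').startswith(b'cloudflare')
        and 'jschl_vc' in response.text
        and 'jschl_answer' in response.text
    )


def grouped(response):
    # The status test is a single operand, so the remaining checks always apply.
    return (
        response.status in (503, 429)
        and response.headers.get('Server', b'').startswith(b'cloudflare')
        and 'jschl_vc' in response.text
        and 'jschl_answer' in response.text
    )


# A 503 from a server that is not Cloudflare and carries no challenge markers.
plain_503 = FakeResponse(status=503, headers={'Server': b'nginx'}, text='Service Unavailable')
print(ungrouped(plain_503))  # True
print(grouped(plain_503))    # False
```

With the ungrouped form, any 503 would be handed to get_tokens, even when the page contains no challenge for cfscrape to solve.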

Complete exception and traceback

(If the problem doesn't involve an exception being raised, leave this blank)

2020-05-14 02:11:47 [scrapy.core.scraper] ERROR: Error downloading <GET https://coinmarketcap.com/currencies/decentraland/>
Traceback (most recent call last):
  File "c:\users\d33ps3curity\appdata\local\programs\python\python37-32\lib\site-packages\twisted\internet\defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "c:\users\d33ps3curity\appdata\local\programs\python\python37-32\lib\site-packages\scrapy\core\downloader\middleware.py", line 44, in process_request
    defer.returnValue((yield download_func(request=request, spider=spider)))
  File "c:\users\d33ps3curity\appdata\local\programs\python\python37-32\lib\site-packages\twisted\internet\defer.py", line 1362, in returnValue
    raise _DefGen_Return(val)
twisted.internet.defer._DefGen_Return: <429 https://coinmarketcap.com/currencies/decentraland/>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\users\d33ps3curity\appdata\local\programs\python\python37-32\lib\site-packages\cfscrape\__init__.py", line 255, in solve_challenge
    javascript, flags=re.S
AttributeError: 'NoneType' object has no attribute 'groups'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\users\d33ps3curity\appdata\local\programs\python\python37-32\lib\site-packages\twisted\internet\defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "c:\users\d33ps3curity\appdata\local\programs\python\python37-32\lib\site-packages\scrapy\core\downloader\middleware.py", line 53, in process_response
    response = yield method(request=request, response=response, spider=spider)
  File "c:\users\d33ps3curity\appdata\local\programs\python\python37-32\lib\site-packages\scrapy_cloudflare_middleware\middlewares.py", line 37, in process_response
    user_agent=spider.settings.get('USER_AGENT')
  File "c:\users\d33ps3curity\appdata\local\programs\python\python37-32\lib\site-packages\cfscrape\__init__.py", line 383, in get_tokens
    resp = scraper.get(url, **kwargs)
  File "c:\users\d33ps3curity\appdata\local\programs\python\python37-32\lib\site-packages\requests\sessions.py", line 546, in get
    return self.request('GET', url, **kwargs)
  File "c:\users\d33ps3curity\appdata\local\programs\python\python37-32\lib\site-packages\cfscrape\__init__.py", line 129, in request
    resp = self.solve_cf_challenge(resp, **kwargs)
  File "c:\users\d33ps3curity\appdata\local\programs\python\python37-32\lib\site-packages\cfscrape\__init__.py", line 204, in solve_cf_challenge
    answer, delay = self.solve_challenge(body, domain)
  File "c:\users\d33ps3curity\appdata\local\programs\python\python37-32\lib\site-packages\cfscrape\__init__.py", line 292, in solve_challenge
    % BUG_REPORT
ValueError: Unable to identify Cloudflare IUAM Javascript on website. Cloudflare may have changed their technique, or there may be a bug in the script.
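
The traceback shows the ValueError from get_tokens propagating out of process_response. The sketch below shows one way the call might be guarded so the failure is logged instead of raised. It does not make the challenge solvable. The helper name fetch_tokens_or_none and the stub failing_get_tokens are hypothetical, and get_tokens is passed in as a parameter so the example runs without cfscrape installed.

```python
import logging

logger = logging.getLogger('cloudflaremiddleware')


def fetch_tokens_or_none(get_tokens, url, user_agent=None):
    """Return the Cloudflare cookies, or None when the challenge cannot be solved.

    `get_tokens` is injected here; in the middleware it would be
    `cfscrape.get_tokens`, which returns a (tokens, user_agent) tuple.
    """
    try:
        tokens, _user_agent = get_tokens(url, user_agent=user_agent)
    except ValueError as exc:
        # Raised when cfscrape cannot identify the IUAM JavaScript on the page.
        logger.warning('Could not solve Cloudflare challenge for %s: %s', url, exc)
        return None
    return tokens


def failing_get_tokens(url, user_agent=None):
    # Hypothetical stub reproducing the error from the traceback.
    raise ValueError('Unable to identify Cloudflare IUAM Javascript on website.')


result = fetch_tokens_or_none(failing_get_tokens, 'https://coinmarketcap.com')
print(result)  # None
```

In process_response, a None result could be treated as "return the original response unchanged", which is an assumption about the desired behavior rather than something the middleware does today.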

URL of the Cloudflare-protected page

[https://coinmarketcap.com]

URL of Pastebin/Gist with HTML source of protected page

[]

May 13 '20 09:05 ajay-raikar-upwork
