
ValueError: Unable to identify Cloudflare IUAM Javascript on website. Cloudflare may have changed their technique, or there may be a bug in the script.

Open 00abCoder opened this issue 5 years ago • 8 comments

Before creating an issue, first upgrade cfscrape with pip install -U cfscrape and see if you're still experiencing the problem. Please also confirm your Node version (node --version or nodejs --version) is version 10 or higher.
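The Node version check above can be scripted. This is a minimal sketch, not part of cfscrape: the helper names `parse_node_major` and `installed_node_major` are hypothetical, and it assumes the version output has the usual `v10.15.3` shape.

```python
import re
import subprocess


def parse_node_major(version_output):
    """Return the major version from output such as 'v10.15.3', or None."""
    match = re.match(r"v?(\d+)\.", version_output.strip())
    return int(match.group(1)) if match else None


def installed_node_major(commands=("node", "nodejs")):
    """Try each command name in turn; return the first major version found."""
    for command in commands:
        try:
            output = subprocess.check_output([command, "--version"])
        except (OSError, subprocess.CalledProcessError):
            continue  # command missing or failed; try the next name
        major = parse_node_major(output.decode("utf-8", "replace"))
        if major is not None:
            return major
    return None
```

A result of `None` or a number below 10 from `installed_node_major()` would mean the requirement in the template is not met.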

Make sure the website you're having issues with is actually using anti-bot protection by Cloudflare and not a competitor like Imperva Incapsula or Sucuri. And if you're using an anonymizing proxy, a VPN, or Tor, Cloudflare often flags those IPs and may block you or present you with a captcha as a result.
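One rough way to check whether a page is served through Cloudflare is to look at the response headers. The sketch below is a heuristic only and not something cfscrape provides: the function name `looks_like_cloudflare` is hypothetical, and it assumes Cloudflare responses carry a `Server: cloudflare` header or a `CF-RAY` header, which is typical but not guaranteed.

```python
def looks_like_cloudflare(headers):
    """Heuristic: True if the response headers suggest Cloudflare served the page."""
    # Normalise header names and values so the check is case-insensitive.
    lowered = {str(k).lower(): str(v).lower() for k, v in headers.items()}
    return "cloudflare" in lowered.get("server", "") or "cf-ray" in lowered
```

It accepts any mapping with an `items()` method, so the `headers` attribute of a `requests` response can be passed in directly.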

Please confirm the following statements and check the boxes before creating an issue:

  • [x] I've upgraded cfscrape with pip install -U cfscrape
  • [x] I'm using Node version 10 or higher
  • [x] The site protection I'm having issues with is from Cloudflare
  • [x] I'm not using Tor, a VPN, or an anonymizing proxy

Python version number

Run python --version and paste the output below:

Python 2.7.12

cfscrape version number

Run pip show cfscrape and paste the output below:

Name: cfscrape
Version: 2.1.1
Summary: A simple Python module to bypass Cloudflare's anti-bot page. See https://github.com/Anorov/cloudflare-scrape for more information.
Home-page: https://github.com/Anorov/cloudflare-scrape
Author: Anorov
Author-email: [email protected]
License: UNKNOWN
Location: /usr/local/lib/python2.7/dist-packages
Requires: requests

Code snippet involved with the issue

import cfscrape
url = "https://techblog.willshouse.com/2012/01/03/most-common-user-agents"
scraper = cfscrape.create_scraper()
content = scraper.get(url).content

Complete exception and traceback

(If the problem doesn't involve an exception being raised, leave this blank)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 543, in get
    return self.request('GET', url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/cfscrape/__init__.py", line 129, in request
    resp = self.solve_cf_challenge(resp, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/cfscrape/__init__.py", line 204, in solve_cf_challenge
    answer, delay = self.solve_challenge(body, domain)
  File "/usr/local/lib/python2.7/dist-packages/cfscrape/__init__.py", line 292, in solve_challenge
    % BUG_REPORT
ValueError: Unable to identify Cloudflare IUAM Javascript on website. Cloudflare may have changed their technique, or there may be a bug in the script.

Please read https://github.com/Anorov/cloudflare-scrape#updates, then file a bug report at https://github.com/Anorov/cloudflare-scrape/issues."
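Until the parsing problem is fixed, a caller can at least keep the exception from aborting a longer script. A minimal sketch, assuming only what the traceback shows (that `scraper.get` raises `ValueError` when the challenge cannot be identified); the wrapper name `get_or_none` is hypothetical.

```python
def get_or_none(scraper, url):
    """Return the response body, or None if the challenge could not be parsed."""
    try:
        return scraper.get(url).content
    except ValueError as exc:
        # In the traceback above, cfscrape 2.1.1 raises ValueError
        # from solve_challenge when its regex finds no match.
        print("Cloudflare challenge not recognised: %s" % exc)
        return None
```

This does not solve the challenge. It only turns the failure into a `None` result, and it will also swallow any unrelated `ValueError` raised during the request.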

URL of the Cloudflare-protected page

https://techblog.willshouse.com/2012/01/03/most-common-user-agents

URL of Pastebin/Gist with HTML source of protected page

[LINK GOES HERE]

00abCoder avatar May 02 '20 12:05 00abCoder

Changing line 250 of __init__.py to this solves the problem:

challenge, ms = re.search(
    r"setTimeout\(function\s*\(\s*\){\s*(var "
    r"\s*s,\s*t,\s*o,\s*p,\s*b,\s*r,\s*e,\s*a,\s*k,\s*i,\s*n,\s*g,\s*f.+?\r?\n[\s\S]+?a\.value\s*=.+?)\r?\n"
    r"(?:[^{<>]*},\s*(\d{4,}))?",
    javascript, flags=re.S
).groups()
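To see what the added `\s*` allowances do, the pattern can be tried on a small input. This is only a sketch: the two JavaScript strings are made-up stand-ins, not real Cloudflare challenge code, and `extract_challenge` is a hypothetical helper rather than a cfscrape function.

```python
import re

# Whitespace-tolerant pattern, as reconstructed from the comment above.
CHALLENGE_RE = re.compile(
    r"setTimeout\(function\s*\(\s*\){\s*(var "
    r"\s*s,\s*t,\s*o,\s*p,\s*b,\s*r,\s*e,\s*a,\s*k,\s*i,\s*n,\s*g,\s*f.+?\r?\n[\s\S]+?a\.value\s*=.+?)\r?\n"
    r"(?:[^{<>]*},\s*(\d{4,}))?",
    flags=re.S,
)

# Hypothetical inputs: the same script with and without extra spaces.
COMPACT_JS = (
    "setTimeout(function(){\n"
    "  var s,t,o,p,b,r,e,a,k,i,n,g,f, x = 1;\n"
    "  a.value = x + 1;\n"
    "}, 4000);\n"
)
SPACED_JS = (
    "setTimeout(function (){\n"
    "  var s, t, o, p, b, r, e, a, k, i, n, g, f, x = 1;\n"
    "  a.value = x + 1;\n"
    "}, 4000);\n"
)


def extract_challenge(javascript):
    """Return (challenge_code, delay_ms) or None when the pattern does not match."""
    match = CHALLENGE_RE.search(javascript)
    return match.groups() if match else None
```

Both sample strings match, and the second group captures the delay as the string `"4000"`. Returning `None` on a miss avoids the `AttributeError` that calling `.groups()` on a failed `re.search` would raise.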

00abCoder avatar May 02 '20 12:05 00abCoder

Great work, thank you so much. Please tell me, is it necessary to pause for 5 seconds between requests?

Dimitrenko avatar May 02 '20 19:05 Dimitrenko

It seems it is not necessary. I ran the following code and it returned the same content on every request:

import cfscrape
url = "https://techblog.willshouse.com/2012/01/03/most-common-user-agents"
scraper = cfscrape.create_scraper()
contents = []
for i in range(5):
	content = scraper.get(url).content
	contents.append(content)

00abCoder avatar May 04 '20 16:05 00abCoder

is it necessary to withstand a pause of 5 seconds between requests?

That might depend on the site and on how many requests you make.
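If a site does turn out to need spacing between requests, the pause can be made a parameter. A minimal sketch, assuming a scraper object with a `get(url)` method as in the snippets above; `fetch_all` and its `delay_seconds` and `sleep` parameters are hypothetical names, and the 5-second default simply mirrors the question.

```python
import time


def fetch_all(scraper, urls, delay_seconds=5.0, sleep=time.sleep):
    """Fetch each URL in order, pausing delay_seconds between requests."""
    contents = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # no pause before the first request
        contents.append(scraper.get(url).content)
    return contents
```

Passing `delay_seconds=0` reproduces the back-to-back loop shown earlier, and the `sleep` argument lets the pause be replaced with a stub when testing.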

lord8266 avatar May 05 '20 01:05 lord8266

Changing line 250 of __init__.py to this solves the problem:

challenge, ms = re.search(
    r"setTimeout\(function\s*\(\s*\){\s*(var "
    r"\s*s,\s*t,\s*o,\s*p,\s*b,\s*r,\s*e,\s*a,\s*k,\s*i,\s*n,\s*g,\s*f.+?\r?\n[\s\S]+?a\.value\s*=.+?)\r?\n"
    r"(?:[^{<>]*},\s*(\d{4,}))?",
    javascript, flags=re.S
).groups()

@00abCoder @Anorov Thanks a lot, this is useful, so I opened a pull request against the master branch: https://github.com/Anorov/cloudflare-scrape/pull/360

BruceLee569 avatar May 08 '20 08:05 BruceLee569

Same problem again:

ValueError: Unable to identify Cloudflare IUAM Javascript on website. Cloudflare may have changed their technique, or there may be a bug in the script.

The patched regex does not work any more:

challenge, ms = re.search(
    r"setTimeout\(function\s*\(\s*\){\s*(var "
    r"\s*s,\s*t,\s*o,\s*p,\s*b,\s*r,\s*e,\s*a,\s*k,\s*i,\s*n,\s*g,\s*f.+?\r?\n[\s\S]+?a\.value\s*=.+?)\r?\n"
    r"(?:[^{<>]*},\s*(\d{4,}))?",
    javascript, flags=re.S
).groups()

Dimitrenko avatar May 21 '20 12:05 Dimitrenko

I'm facing the same problem. Nothing seems to be working.

iZooGooD avatar Apr 25 '21 12:04 iZooGooD

This project is abandoned, and the library is broken. See #406.

SpangleLabs avatar Apr 30 '21 14:04 SpangleLabs