cloudflare-scrape
cfscrape not working on this particular site
Before creating an issue, first upgrade cfscrape with `pip install -U cfscrape` and see if you're still experiencing the problem. Please also confirm your Node version (`node --version` or `nodejs --version`) is version 10 or higher.
Make sure the website you're having issues with is actually using anti-bot protection by Cloudflare and not a competitor like Imperva Incapsula or Sucuri. And if you're using an anonymizing proxy, a VPN, or Tor, Cloudflare often flags those IPs and may block you or present you with a captcha as a result.
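One quick way to check the first point is to look at the response headers. The sketch below assumes that Cloudflare-fronted responses carry a `Server: cloudflare` header or a `CF-RAY` header, which is the usual case but not guaranteed. The helper name `looks_like_cloudflare` and the sample headers are made up for illustration.

```python
def looks_like_cloudflare(headers):
    """Return True if the response headers carry Cloudflare's usual markers.

    Assumption: Cloudflare normally sends `Server: cloudflare` and/or a
    `CF-RAY` header. Their absence suggests another provider, but is not proof.
    """
    # Header names are case-insensitive, so normalise before comparing.
    lowered = {name.lower(): value for name, value in headers.items()}
    server = lowered.get("server", "").lower()
    return server == "cloudflare" or "cf-ray" in lowered


# Hand-written example headers, not captured from a real site.
print(looks_like_cloudflare({"Server": "cloudflare", "CF-RAY": "0000000000000000-XXX"}))
print(looks_like_cloudflare({"Server": "nginx"}))
```

With a real response you could pass `dict(response.headers)` from `requests` to the same function.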
Please confirm the following statements and check the boxes before creating an issue:
- [x] I've upgraded cfscrape with `pip install -U cfscrape`
- [x] I'm using Node version 10 or higher
- [x] The site protection I'm having issues with is from Cloudflare
- [x] I'm not using Tor, a VPN, or an anonymizing proxy
Python version number
Run `python --version` and paste the output below:
Python 3.8.3
cfscrape version number
Run `pip show cfscrape` and paste the output below:
Name: cfscrape
Version: 2.1.1
Summary: A simple Python module to bypass Cloudflare's anti-bot page. See https://github.com/Anorov/cloudflare-scrape for more information.
Home-page: https://github.com/Anorov/cloudflare-scrape
Author: Anorov
Author-email: [email protected]
License: UNKNOWN
Location: c:\users\usama\appdata\local\programs\python\python38\lib\site-packages
Requires: requests
Required-by:
Code snippet involved with the issue
from bs4 import BeautifulSoup as bs
import cfscrape
url="https://dreamsfriend.com/geturl/Boards%20and%20Beyond%20USMLE/01%20Enzymes"
scraper = cfscrape.create_scraper()
get=scraper.get(url).text
print(get)
Complete exception and traceback
(If the problem doesn't involve an exception being raised, leave this blank)
Traceback (most recent call last):
File "C:\Users\Usama\AppData\Local\Programs\Python\Python38\lib\site-packages\cfscrape\__init__.py", line 251, in solve_challenge
challenge, ms = re.search(
AttributeError: 'NoneType' object has no attribute 'groups'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\Usama\Desktop\freemedtube-scraper.py", line 18, in <module>
get=scraper.get(url).text
File "C:\Users\Usama\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\sessions.py", line 543, in get
return self.request('GET', url, **kwargs)
File "C:\Users\Usama\AppData\Local\Programs\Python\Python38\lib\site-packages\cfscrape\__init__.py", line 129, in request
resp = self.solve_cf_challenge(resp, **kwargs)
File "C:\Users\Usama\AppData\Local\Programs\Python\Python38\lib\site-packages\cfscrape\__init__.py", line 204, in solve_cf_challenge
answer, delay = self.solve_challenge(body, domain)
File "C:\Users\Usama\AppData\Local\Programs\Python\Python38\lib\site-packages\cfscrape\__init__.py", line 290, in solve_challenge
raise ValueError(
ValueError: Unable to identify Cloudflare IUAM Javascript on website. Cloudflare may have changed their technique, or there may be a bug in the script.
Please read https://github.com/Anorov/cloudflare-scrape#updates, then file a bug report at https://github.com/Anorov/cloudflare-scrape/issues."
URL of the Cloudflare-protected page
https://dreamsfriend.com/geturl/Boards%20and%20Beyond%20USMLE/01%20Enzymes
URL of Pastebin/Gist with HTML source of protected page
https://gist.github.com/musama95/07f5aceb14d6837b2b8b4ba348d6520b
I get the same problem on https://bitcointalk.org/index.php?topic=5215051.0;all. Cloudflare has changed its JS challenge: the script is no longer embedded in the HTML page but is loaded by another script:
var cpo = document.createElement('script');
cpo.type = 'text/javascript';
cpo.src = "/cdn-cgi/challenge-platform/orchestrate/jsch/v1"; // <--- HERE
var done = false;
cpo.onload = cpo.onreadystatechange = function() {
    if (!done && (!this.readyState || this.readyState === "loaded" || this.readyState === "complete")) {
        done = true;
        cpo.onload = cpo.onreadystatechange = null;
        window._cf_chl_enter()
    }
};
document.getElementsByTagName('head')[0].appendChild(cpo);
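To tell whether a page uses this externally loaded challenge before handing it to cfscrape, something like the following might help. It is only a sketch: the path prefix `/cdn-cgi/challenge-platform/` is taken from the snippet above and may differ on other sites or change over time, and `find_external_challenge` is a hypothetical helper, not part of cfscrape.

```python
import re

# Assumption: the challenge script is referenced as a quoted path starting
# with /cdn-cgi/challenge-platform/, as in the snippet above.
CHALLENGE_SRC_RE = re.compile(r"""['"](/cdn-cgi/challenge-platform/[^'"]+)['"]""")


def find_external_challenge(html):
    """Return the externally loaded challenge script path, or None if absent."""
    match = CHALLENGE_SRC_RE.search(html)
    return match.group(1) if match else None


sample = 'cpo.src = "/cdn-cgi/challenge-platform/orchestrate/jsch/v1";'
print(find_external_challenge(sample))
print(find_external_challenge("<html><body>no challenge here</body></html>"))
```

If this returns a path, the challenge JavaScript is not in the page body, which would explain why the regex search in `solve_challenge` finds nothing and raises the `ValueError` shown in the traceback.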
Can this problem be fixed, or is it impossible to fix? I don't understand how Cloudflare works and I have only a very rudimentary knowledge of JS, so I'm not sure what the code you wrote does @slo-Oth