cloudflare-scrape
issue found
Before creating an issue, first upgrade cfscrape with pip install -U cfscrape and see if you're still experiencing the problem. Please also confirm your Node version (node --version or nodejs --version) is version 10 or higher.
Make sure the website you're having issues with is actually using anti-bot protection by Cloudflare and not a competitor like Imperva Incapsula or Sucuri. And if you're using an anonymizing proxy, a VPN, or Tor, Cloudflare often flags those IPs and may block you or present you with a captcha as a result.
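As a rough way to check the first point, responses served through Cloudflare normally carry a `Server: cloudflare` header and a `CF-RAY` header. The sketch below is only a heuristic under that assumption; the helper name `looks_like_cloudflare` is hypothetical and not part of cfscrape, and it takes a plain mapping of response headers so it can be tried without any network access.

```python
def looks_like_cloudflare(headers):
    """Heuristic: True if the response headers look like they came via Cloudflare.

    `headers` is any mapping of header name -> value, for example the
    `headers` attribute of a requests response. Header names are compared
    case-insensitively.
    """
    lowered = {str(name).lower(): str(value).lower()
               for name, value in headers.items()}
    # Assumption: Cloudflare sets "Server: cloudflare" and adds a CF-RAY header.
    return lowered.get("server", "") == "cloudflare" or "cf-ray" in lowered


if __name__ == "__main__":
    print(looks_like_cloudflare({"Server": "cloudflare", "CF-RAY": "abc123-AMS"}))
    print(looks_like_cloudflare({"Server": "nginx"}))
```

A negative result does not prove the site is unprotected, and a positive one does not prove the anti-bot challenge is active; it only narrows down which vendor is in front of the site.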
Please confirm the following statements and check the boxes before creating an issue:
- [ YES ] I've upgraded cfscrape with pip install -U cfscrape
- [ YES ] I'm using Node version 10 or higher (ii nodejs 10.19.0~dfsg1-1 amd64)
- [ YES ] The site protection I'm having issues with is from Cloudflare
- [ NO ] I'm not using Tor, a VPN, or an anonymizing proxy
Python version number
Run python --version and paste the output below:
root@balder:~# python --version
Python 2.7.16
root@balder:~#
cfscrape version number
Run pip show cfscrape and paste the output below:
root@balder:~# pip show cfscrape
Name: cfscrape
Version: 2.1.1
Summary: A simple Python module to bypass Cloudflare's anti-bot page. See https://github.com/Anorov/cloudflare-scrape for more information.
Home-page: https://github.com/Anorov/cloudflare-scrape
Author: Anorov
Author-email: [email protected]
License: UNKNOWN
Location: /usr/local/lib/python2.7/dist-packages
Requires: requests
Required-by:
root@balder:~#
Code snippet involved with the issue
#!/usr/bin/env python
import csv
import os
import sys

import cfscrape

scraper = cfscrape.create_scraper()
filename = 'psn.csv'

with open(filename, 'rb') as f:
    reader = csv.reader(f)
    try:
        for row in reader:
            if 'http' in row[0]:
                # Use everything after the last '/' in the URL as the local file name.
                reverse = row[0][::-1]
                i = reverse.index('/')
                tmp = reverse[0:i]
                cfurl = scraper.get(row[0]).content
                if not os.path.exists("./"+tmp[::-1]):
                    # Separate name for the output handle so it does not shadow the CSV file.
                    with open(tmp[::-1], 'wb') as out:
                        out.write(cfurl)
                else:
                    print("file: ", tmp[::-1], "already exists")
    except csv.Error as e:
        sys.exit('file %s, line %d: %s' % (filename, reader.line_num, e))
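The reverse-and-index step in the snippet takes everything after the last '/' in the URL as the local file name. A minimal sketch of the same idea using the standard library is shown here; the helper name `filename_from_url` is hypothetical, and unlike the snippet it drops any query string because it only looks at the URL path. It does not touch cfscrape and has no bearing on the exception reported below.

```python
import posixpath

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2


def filename_from_url(url):
    """Return the last path segment of `url`, or '' if the path ends in '/'."""
    path = urlparse(url).path
    return posixpath.basename(path)


if __name__ == "__main__":
    print(filename_from_url("https://example.com/files/archive.zip?token=1"))
```

An empty result means the URL has no usable file name, so the caller would need to pick one or skip that row.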
Complete exception and traceback
root@balder:~# ./grab.py
Traceback (most recent call last):
  File "./grab.py", line 20, in <module>
Please read https://github.com/Anorov/cloudflare-scrape#updates, then file a bug report at https://github.com/Anorov/cloudflare-scrape/issues."
(If the problem doesn't involve an exception being raised, leave this blank)
URL of the Cloudflare-protected page
Can't provide it right now; I can share it later if needed.
URL of Pastebin/Gist with HTML source of protected page
No idea what this is.
same here (and it's my first use of the package)