cloudflare-scrape
Getting 403 on all requests. CF might have pushed an update.
Before creating an issue, first upgrade cfscrape with pip install -U cfscrape and see if you're still experiencing the problem. Please also confirm your Node version (node --version or nodejs --version) is version 10 or higher.
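The Node check can be scripted. The sketch below is not part of cfscrape. It assumes the executable is named `node` or `nodejs` and prints a version such as `v12.16.1`. It runs the executable and compares the major version against the minimum of 10 stated above.

```python
import re
import shutil
import subprocess

MIN_NODE_MAJOR = 10  # minimum version stated in the issue template


def parse_node_major(version_output):
    """Extract the major version from output such as 'v12.16.1'."""
    match = re.match(r"\s*v?(\d+)\.", version_output)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version_output)
    return int(match.group(1))


def node_version_ok():
    """Run `node --version` (or `nodejs --version`) and check the major version."""
    for exe in ("node", "nodejs"):  # assumed executable names
        path = shutil.which(exe)
        if path is None:
            continue
        out = subprocess.run([path, "--version"], stdout=subprocess.PIPE,
                             universal_newlines=True, check=True).stdout
        return parse_node_major(out) >= MIN_NODE_MAJOR
    return False  # no Node executable found on PATH


if __name__ == "__main__":
    print("Node >= %d available: %s" % (MIN_NODE_MAJOR, node_version_ok()))
```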
Make sure the website you're having issues with is actually using anti-bot protection by Cloudflare and not a competitor like Imperva Incapsula or Sucuri. And if you're using an anonymizing proxy, a VPN, or Tor, Cloudflare often flags those IPs and may block you or present you with a captcha as a result.
Please confirm the following statements and check the boxes before creating an issue:
- [x] I've upgraded cfscrape with pip install -U cfscrape
- [x] I'm using Node version 10 or higher
- [x] The site protection I'm having issues with is from Cloudflare
- [x] I'm not using Tor, a VPN, or an anonymizing proxy
Python version number
Run python --version and paste the output below:
Python 3.7.5
cfscrape version number
Run pip show cfscrape and paste the output below:
Name: cfscrape
Version: 2.1.1
Summary: A simple Python module to bypass Cloudflare's anti-bot page. See https://github.com/Anorov/cloudflare-scrape for more information.
Home-page: https://github.com/Anorov/cloudflare-scrape
Author: Anorov
Author-email: [email protected]
License: UNKNOWN
Location: /usr/local/lib/python3.7/site-packages
Requires: requests
Required-by:
Code snippet involved with the issue
>>> import cfscrape
>>> scraper = cfscrape.CloudflareScraper()
>>> r = scraper.get("https://www.nakedcph.com/")
>>> r
<Response [403]>
Complete exception and traceback
(If the problem doesn't involve an exception being raised, leave this blank)
URL of the Cloudflare-protected page
https://www.nakedcph.com
URL of Pastebin/Gist with HTML source of protected page
https://hastebin.com/iwedudaheh.xml
Getting a 403 error on almost every Cloudflare-protected site.
Pretty sure cfscrape cannot bypass forced captchas.
It was working a few days ago.
Probably because the website wasn't forcing captchas for every request 3 days ago.
Right now you will get a captcha even when using a normal browser.
The website has always forced a captcha, even in the browser.
Try this other site, which does not force a captcha: https://caliroots.com/
Still 403
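To tell a forced captcha apart from other 403s, a rough classifier can look at the status code, the `Server` header and a few page markers. This is a heuristic sketch, not a Cloudflare API. The marker strings (`/cdn-cgi/l/chk_captcha`, `jschl_vc`, `jschl_answer`) are assumptions modeled on what cfscrape 2.x appears to look for, and Cloudflare can change its pages at any time.

```python
def classify_cloudflare_response(status, headers, body):
    """Roughly classify a response; heuristics only, not a Cloudflare API.

    status  -- HTTP status code (int)
    headers -- mapping of response header names to values
    body    -- raw response body (bytes)
    """
    # Header names are case-insensitive, so find "Server" by lower-casing names.
    server = ""
    for name, value in headers.items():
        if name.lower() == "server":
            server = value
    if not server.lower().startswith("cloudflare"):
        return "not-cloudflare"
    # Assumed marker: the captcha form posts to /cdn-cgi/l/chk_captcha.
    if status == 403 and b"/cdn-cgi/l/chk_captcha" in body:
        return "captcha"
    # Assumed markers: hidden fields of the JavaScript ("Under Attack") challenge.
    if status in (429, 503) and b"jschl_vc" in body and b"jschl_answer" in body:
        return "js-challenge"
    if status == 403:
        return "blocked-other"  # 403 without a recognizable challenge page
    return "ok" if 200 <= status < 300 else "other"
```

With a requests/cfscrape response this could be called as `classify_cloudflare_response(r.status_code, r.headers, r.content)`. A result of `blocked-other` means the 403 is neither of the two challenge pages it knows about.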
Mate, have you tried with a proxy? I don't know how to use a proxy with authentication :(
It works the same as with a requests session (https://stackoverflow.com/questions/13506455/how-to-pass-proxy-authentication-requires-digest-auth-by-using-python-requests)
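Since the scraper is a requests session, an authenticated proxy goes into a `proxies` dict. The sketch below builds one, percent-encoding the credentials so that characters such as `@` or `:` in a password don't break the URL. It assumes a plain HTTP proxy (`http://` scheme) that should carry both HTTP and HTTPS traffic. The function name and the host in the example are hypothetical.

```python
from urllib.parse import quote


def build_proxies(host, port, username=None, password=None, scheme="http"):
    """Build a requests-style proxies dict with percent-encoded credentials.

    scheme defaults to "http", assuming an ordinary HTTP proxy (HTTPS traffic
    is tunnelled through it with CONNECT).
    """
    auth = ""
    if username is not None:
        auth = quote(username, safe="")
        if password is not None:
            auth += ":" + quote(password, safe="")
        auth += "@"
    proxy_url = "%s://%s%s:%d" % (scheme, auth, host, port)
    # Send both plain-HTTP and HTTPS requests through the same proxy.
    return {"http": proxy_url, "https": proxy_url}


# Hypothetical host and credentials, for illustration only.
print(build_proxies("proxy.example.com", 3883, "jm", "p@ss:word"))
```

The resulting dict can be passed per request (`proxies=...`, as in the thread) or set once with `scraper.proxies.update(...)` on a scraper created by `cfscrape.create_scraper()`.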
Not working!
import cfscrape
proxies = {'https': 'https://jm:[email protected]:3883'}
cookie_value, user_agent = cfscrape.get_cookie_string("https://www.asaad.com/", proxies=proxies)
It doesn't work: the request goes out from my original IP instead of the proxy IP.
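One thing worth ruling out when a proxy seems to be ignored (not a confirmed diagnosis for this case) is which key of the `proxies` dict actually matches a given URL. The sketch below reimplements the lookup order that `requests.utils.select_proxy` uses in requests 2.x: most specific key first, then `all`. A dict with only an `https` key leaves every `http://` URL on a direct connection.

```python
from urllib.parse import urlparse


def select_proxy(url, proxies):
    """Return the proxy URL used for `url`, or None for a direct connection.

    Mirrors the key lookup order of requests.utils.select_proxy (requests 2.x).
    """
    parts = urlparse(url)
    if parts.hostname is None:
        return proxies.get(parts.scheme, proxies.get("all"))
    for key in (parts.scheme + "://" + parts.hostname,  # e.g. "https://host"
                parts.scheme,                           # e.g. "https"
                "all://" + parts.hostname,
                "all"):
        if key in proxies:
            return proxies[key]
    return None


proxies = {"https": "http://proxy.example.com:3883"}  # hypothetical proxy
print(select_proxy("https://www.asaad.com/", proxies))  # uses the proxy
print(select_proxy("http://www.asaad.com/", proxies))   # None: goes direct
```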
Same issue for "https://www.curseforge.com"
import cfscrape
scraper = cfscrape.create_scraper()
URL = "https://www.curseforge.com"
tokens, user_agent = scraper.get_tokens(URL)
print('Tokens: %s\nUser-Agent: %s\nHeader: %s\n' % (tokens, user_agent, scraper.headers))
Additional note: I experimented with the get_tokens method and commented out the resp.raise_for_status() call.
This returned the cookies. I assume Cloudflare returns a 403 Forbidden while still delivering the content. Within solve_cf_challenge I can see the challenge-form string. The parsing of submit_url has an issue, but I lack the knowledge to say what needs to be fixed.
Hope that helps.
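`get_tokens` returns a dict of cookie tokens plus the User-Agent that earned them, and `get_cookie_string` appears to join that dict into a `Cookie` header value. The sketch below is a hypothetical standalone version of that join, plus a helper that pairs it with the User-Agent. The cfscrape README stresses reusing the same User-Agent, and the tokens are generally understood to be tied to the client that obtained them. The helper names are illustrative, not cfscrape API.

```python
def cookie_header(tokens):
    """Join a {name: value} dict into a Cookie header value such as 'a=1; b=2'."""
    return "; ".join("%s=%s" % (name, value) for name, value in tokens.items())


def request_headers(tokens, user_agent):
    """Headers for reusing tokens; the User-Agent must match the one that earned them."""
    return {"Cookie": cookie_header(tokens), "User-Agent": user_agent}


# Made-up token values, for illustration only.
print(request_headers({"__cfduid": "abc", "cf_clearance": "xyz"}, "Mozilla/5.0"))
```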
I am getting the cookies, but they don't come through the proxy; they come from my real IP. That is the issue: the proxy is being ignored.
(In reply to wsch-wa's comment of Thu, 2 Apr 2020 at 21:48: https://github.com/Anorov/cloudflare-scrape/issues/338#issuecomment-607946076)
To be completely clear about my use case:
- Browsers (Firefox, IE, Chrome) show the site without a captcha.
- I am not using a proxy.
- cfscrape returns a 403 status code, which doesn't seem to reflect reality: the body shows the normal content.
- In the browser I receive status 200 (checked by inspecting the traffic with the F12 developer tools).
- The headers from the browser and from cfscrape look similar to me; the only difference is status 200 vs. 403.
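To check "the headers look similar" mechanically, the two header sets can be diffed case-insensitively. The sketch below is a hypothetical helper. It assumes both sides are plain mappings, for example the browser's request headers copied from the developer tools and `scraper.headers` from cfscrape. Matching HTTP headers do not rule out differences below HTTP, which is the point raised in the last reply of this thread.

```python
def diff_headers(browser, scraper):
    """Compare two header mappings case-insensitively.

    Returns (only_in_browser, only_in_scraper, different_values), each keyed by
    lower-cased header name; different_values maps name -> (browser, scraper).
    """
    b = {k.lower(): v for k, v in browser.items()}
    s = {k.lower(): v for k, v in scraper.items()}
    only_b = {k: b[k] for k in b.keys() - s.keys()}
    only_s = {k: s[k] for k in s.keys() - b.keys()}
    changed = {k: (b[k], s[k]) for k in b.keys() & s.keys() if b[k] != s[k]}
    return only_b, only_s, changed


browser = {"DNT": "1", "User-Agent": "Firefox/74.0", "Accept-Encoding": "gzip, deflate, br"}
scraper = {"user-agent": "Firefox/74.0", "accept-encoding": "gzip, deflate"}
print(diff_headers(browser, scraper))
```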
Since I don't know what information is needed to fix the problem, here are the request my browser sent and the response it received.
---- Request sent by the browser (request headers, 386 B)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding: gzip, deflate, br
Accept-Language: de,en-US;q=0.7,en;q=0.3
Connection: keep-alive
DNT: 1
Host: www.curseforge.com
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0
---- Response received (response headers, 1.459 KB)
cache-control: no-cache
cf-cache-status: DYNAMIC
cf-ray: 57e1b846faa6cba0-VIE
content-encoding: gzip
content-type: text/html; charset=utf-8
date: Fri, 03 Apr 2020 09:21:31 GMT
expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
expires: -1
pragma: no-cache
server: cloudflare
set-cookie: __cfduid=d0ba11b1bc5d1c04021503cd305a7ee481585905690; expires=Sun, 03-May-20 09:21:30 GMT; path=/; domain=.curseforge.com; HttpOnly; SameSite=Lax
set-cookie: AWSALB=+yuMNwAu+kaIeFGsZoE/gEgxcqYLWviotTfkwIBRTZldvf1vW0mQ7l/hrNkE0jeAAJ1SsKTOwXEJXXaRM2gdeyUOt6YqMM4m83I3vDm3lShl+SyeaUXeNDcW9pqG; Expires=Fri, 10 Apr 2020 09:21:30 GMT; Path=/
set-cookie: AWSALBCORS=+yuMNwAu+kaIeFGsZoE/gEgxcqYLWviotTfkwIBRTZldvf1vW0mQ7l/hrNkE0jeAAJ1SsKTOwXEJXXaRM2gdeyUOt6YqMM4m83I3vDm3lShl+SyeaUXeNDcW9pqG; Expires=Fri, 10 Apr 2020 09:21:30 GMT; Path=/; SameSite=None; Secure
set-cookie: Unique_ID_v2=0dcb9c0455fc4d7589e75c024408615e; domain=.curseforge.com; expires=Wed, 03-Apr-2030 09:21:31 GMT; path=/
set-cookie: __cf_bm=3cd18abfb0a930880bac2cc3a829276760cb4fba-1585905691-1800-ASOZlhSsbwiJ+ImNKM4F5d1gy9QkDueAfXcagsYDKar7m817Ju2aCCXZOKdVAISFWbyo4XQJshOFSWWsyGT2bFg=; path=/; expires=Fri, 03-Apr-20 09:51:31 GMT; domain=.curseforge.com; HttpOnly; Secure; SameSite=None
strict-transport-security: max-age=15768000
x-aspnet-version: 4.0.30319
x-aspnetmvc-version: 5.2
X-Firefox-Spdy: h2
x-frame-options: SAMEORIGIN
x-frame-options: SAMEORIGIN
x-mvc-supplant-cachable: true
x-ua-compatible: IE=edge,chrome=1
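To compare the cookies the browser receives with the tokens cfscrape collects, each `set-cookie` value can be split into name, value and attributes. The browser response above includes `__cf_bm`, which Cloudflare's cookie documentation describes as a Bot Management cookie. Whether cfscrape receives it too is worth checking. The parser below is a deliberately simple, hypothetical helper: it splits on the first `=` only, so values containing `=` survive. It does not implement the full cookie grammar.

```python
def parse_set_cookie(header_value):
    """Split one Set-Cookie value into (name, value, attributes).

    Attribute names are lower-cased; flag attributes (HttpOnly, Secure) map to True.
    """
    first, _, rest = header_value.partition(";")
    name, _, value = first.strip().partition("=")  # split on the first "=" only
    attrs = {}
    for part in rest.split(";"):
        part = part.strip()
        if not part:
            continue
        key, sep, val = part.partition("=")
        attrs[key.strip().lower()] = val.strip() if sep else True
    return name, value, attrs


example = ("__cf_bm=abc=; path=/; expires=Fri, 03-Apr-20 09:51:31 GMT; "
           "domain=.curseforge.com; HttpOnly; Secure; SameSite=None")
print(parse_set_cookie(example))
```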
bump.
I think this is probably not Cloudflare's "Under Attack" protection but TLS fingerprint protection. Try this solution to confirm: https://pixeljets.com/blog/scrape-ninja-bypassing-cloudflare-403-code-1020-errors/
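TLS fingerprinting works below HTTP, so identical headers do not help. A widely used scheme, JA3, takes five fields from the TLS ClientHello (version, cipher suites, extensions, elliptic curves, point formats). It writes each list as dash-joined decimals, ignores GREASE values (RFC 8701), joins the fields with commas and hashes the result with MD5. The sketch below computes that string and hash from values you supply. The numbers in the example are illustrative placeholders, not captured from Firefox or Python. To see what Python's own TLS stack offers, `ssl.create_default_context().get_ciphers()` lists its enabled cipher suites.

```python
import hashlib

# GREASE values (RFC 8701): 0x0A0A, 0x1A1A, ..., 0xFAFA; JA3 skips them.
GREASE = {(b << 8) | b for b in range(0x0A, 0x100, 0x10)}


def _join(values):
    """Dash-join decimal values, dropping GREASE entries."""
    return "-".join(str(v) for v in values if v not in GREASE)


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build the JA3 string: five comma-separated fields."""
    return ",".join([str(version), _join(ciphers), _join(extensions),
                     _join(curves), _join(point_formats)])


def ja3_hash(*fields):
    """JA3 fingerprint: MD5 hex digest of the JA3 string."""
    return hashlib.md5(ja3_string(*fields).encode("ascii")).hexdigest()


# Placeholder ClientHello values (771 = 0x0303, the TLS 1.2 version field).
fields = (771, [0x0A0A, 4865, 4866], [0, 23], [29, 23], [0])
print(ja3_string(*fields))
print(ja3_hash(*fields))
```

Two clients that send the same HTTP headers but different cipher or extension lists end up with different JA3 hashes, which is how a site could tell cfscrape apart from a browser.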