Getting 403 on all requests. CF might have pushed an update.

Open azerpas opened this issue 4 years ago • 12 comments

Before creating an issue, first upgrade cfscrape with pip install -U cfscrape and see if you're still experiencing the problem. Please also confirm your Node version (node --version or nodejs --version) is version 10 or higher.

Make sure the website you're having issues with is actually using anti-bot protection by Cloudflare and not a competitor like Imperva Incapsula or Sucuri. And if you're using an anonymizing proxy, a VPN, or Tor, Cloudflare often flags those IPs and may block you or present you with a captcha as a result.

Please confirm the following statements and check the boxes before creating an issue:

  • [x] I've upgraded cfscrape with pip install -U cfscrape
  • [x] I'm using Node version 10 or higher
  • [x] The site protection I'm having issues with is from Cloudflare
  • [x] I'm not using Tor, a VPN, or an anonymizing proxy

Python version number

Run python --version and paste the output below:

Python 3.7.5

cfscrape version number

Run pip show cfscrape and paste the output below:

Name: cfscrape
Version: 2.1.1
Summary: A simple Python module to bypass Cloudflare's anti-bot page. See https://github.com/Anorov/cloudflare-scrape for more information.
Home-page: https://github.com/Anorov/cloudflare-scrape
Author: Anorov
Author-email: [email protected]
License: UNKNOWN
Location: /usr/local/lib/python3.7/site-packages
Requires: requests
Required-by: 

Code snippet involved with the issue

>>> import cfscrape
>>> scraper = cfscrape.CloudflareScraper()
>>> r = scraper.get("https://www.nakedcph.com/")
>>> r
<Response [403]>

Complete exception and traceback

(If the problem doesn't involve an exception being raised, leave this blank)


URL of the Cloudflare-protected page

https://www.nakedcph.com

URL of Pastebin/Gist with HTML source of protected page

https://hastebin.com/iwedudaheh.xml

Getting a 403 error on almost every Cloudflare-protected site.

azerpas avatar Mar 08 '20 14:03 azerpas

Pretty sure cfscrape cannot bypass forced captchas.

bakugo avatar Mar 08 '20 18:03 bakugo

Pretty sure cfscrape cannot bypass forced captchas.

It could... a few days ago.

azerpas avatar Mar 08 '20 19:03 azerpas

Probably because the website wasn't forcing captchas for every request 3 days ago.

Right now you will get a captcha even when using a normal browser.

bakugo avatar Mar 08 '20 19:03 bakugo

Probably because the website wasn't forcing captchas for every request 3 days ago.

Right now you will get a captcha even when using a normal browser.

The website was always forcing a captcha, even in a browser.

Try this other site, which has no forced captcha: https://caliroots.com/

Still 403.

azerpas avatar Mar 08 '20 19:03 azerpas

Mate, have you tried with a proxy? I don't know how to use a proxy with authentication :(

Sarfroz avatar Mar 10 '20 21:03 Sarfroz

Mate, have you tried with a proxy? I don't know how to use a proxy with authentication :(

Same as with a requests session (https://stackoverflow.com/questions/13506455/how-to-pass-proxy-authentication-requires-digest-auth-by-using-python-requests)
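
As a rough sketch of what "same as a requests session" can look like in practice, the helper below builds a requests-style proxies mapping with the credentials embedded in the proxy URL. The function name build_proxies and every value passed to it (user, password, host, port) are placeholders invented for illustration, not details from this thread.

```python
from urllib.parse import quote


def build_proxies(user, password, host, port, scheme="http"):
    """Build a requests-style proxies mapping with URL-encoded credentials.

    Hypothetical helper; all values used below are placeholders.
    """
    # Percent-encode the credentials so characters such as '@' or ':'
    # cannot be mistaken for URL delimiters.
    auth = "%s:%s" % (quote(user, safe=""), quote(password, safe=""))
    proxy_url = "%s://%s@%s:%d" % (scheme, auth, host, port)
    # requests chooses the entry whose key matches the scheme of the
    # target URL, so both keys point at the same proxy here.
    return {"http": proxy_url, "https": proxy_url}


proxies = build_proxies("user", "p@ss:word", "proxy.example.com", 3128)
print(proxies["https"])
```

The mapping can then be passed as proxies=... to a requests call or merged into a session with session.proxies.update(proxies). Whether cfscrape forwards it in every code path is not verified here.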

azerpas avatar Mar 11 '20 11:03 azerpas

Mate, have you tried with a proxy? I don't know how to use a proxy with authentication :(

Same as with a requests session (https://stackoverflow.com/questions/13506455/how-to-pass-proxy-authentication-requires-digest-auth-by-using-python-requests)

Not working!

    proxies = { 'https' : 'https://jm:[email protected]:3883' }
    cookie_value, user_agent = cfscrape.get_cookie_string("https://www.asaad.com/", proxies=proxies)

It doesn't work: the request goes out from my original IP, not the proxy IP.

Sarfroz avatar Mar 11 '20 18:03 Sarfroz

Same issue for "https://www.curseforge.com"

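    import cfscrape

    # The scraper instance was not shown in the original snippet
    scraper = cfscrape.create_scraper()
    URL = "https://www.curseforge.com"
    tokens, user_agent = scraper.get_tokens(URL)
    print('Tokens: %s\nUser-Agent: %s\nHeader: %s\n' % (tokens, user_agent, scraper.headers))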

Additional note: I experimented inside the get_tokens function and commented out the resp.raise_for_status() call.

With that change the cookies were returned. I assume Cloudflare answers with a 403 Forbidden status and still delivers content. Inside solve_cf_challenge I can see the challenge-form string. The parsing of submit_url has a problem, but I don't know enough to say what needs fixing.
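
The observation that a 403 response can still carry a body and cookies can be reproduced in isolation. The sketch below uses only the Python standard library rather than requests or cfscrape: a throwaway local server (the ForbiddenWithBody handler, the cookie name, and the body text are all made up for this demo) answers 403 with content, and the client reads that content from the raised error instead of discarding it.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class ForbiddenWithBody(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a server that answers 403 but still sends content."""

    def do_GET(self):
        body = b"Normal content"
        self.send_response(403)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.send_header("Set-Cookie", "demo_cookie=abc123; Path=/")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_ignoring_status(url):
    """Return (status, body, Set-Cookie headers) even for an error status."""
    # An empty ProxyHandler makes sure the loopback request is not proxied.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url) as resp:
            return resp.status, resp.read().decode("utf-8"), resp.headers.get_all("Set-Cookie") or []
    except urllib.error.HTTPError as err:
        # HTTPError doubles as a response: body and headers stay readable.
        return err.code, err.read().decode("utf-8"), err.headers.get_all("Set-Cookie") or []


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), ForbiddenWithBody)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_ignoring_status("http://127.0.0.1:%d/" % server.server_port)
    finally:
        server.shutdown()
        server.server_close()


status, body, cookies = run_demo()
print(status, body, cookies)
```

This only illustrates the HTTP behaviour. It says nothing about why Cloudflare returns 403 for the sites discussed here.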

Hope that helps.

wsch-wa avatar Apr 02 '20 16:04 wsch-wa

I am getting the cookies, but they are issued for my real IP rather than the proxy's. That is the issue: the proxy is being ignored.

Sarfroz avatar Apr 03 '20 06:04 Sarfroz

I am getting the cookies, but they are issued for my real IP rather than the proxy's. That is the issue: the proxy is being ignored.

My use case to be super clear on the issue:

  • Browsers (FF, IE, Chrome) show the site without a captcha
  • I am not using a proxy
  • cfscrape returns a 403 status code, which does not seem to reflect reality. The response body shows "Normal content".
  • In browsers I receive status 200 when inspecting the traffic with the F12 developer tools.
  • To me the browser and cfscrape headers look similar; the only difference is status 200 vs. 403

Since I have no clue what information is needed to fix the problem, here are the request the browser sent and the response it received.

Browser request headers (386 B):

    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
    Accept-Encoding: gzip, deflate, br
    Accept-Language: de,en-US;q=0.7,en;q=0.3
    Connection: keep-alive
    DNT: 1
    Host: www.curseforge.com
    Upgrade-Insecure-Requests: 1
    User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0

Response headers (1.459 KB):

    cache-control: no-cache
    cf-cache-status: DYNAMIC
    cf-ray: 57e1b846faa6cba0-VIE
    content-encoding: gzip
    content-type: text/html; charset=utf-8
    date: Fri, 03 Apr 2020 09:21:31 GMT
    expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
    expires: -1
    pragma: no-cache
    server: cloudflare
    set-cookie: __cfduid=d0ba11b1bc5d1c04021503cd305a7ee481585905690; expires=Sun, 03-May-20 09:21:30 GMT; path=/; domain=.curseforge.com; HttpOnly; SameSite=Lax
    set-cookie: AWSALB=+yuMNwAu+kaIeFGsZoE/gEgxcqYLWviotTfkwIBRTZldvf1vW0mQ7l/hrNkE0jeAAJ1SsKTOwXEJXXaRM2gdeyUOt6YqMM4m83I3vDm3lShl+SyeaUXeNDcW9pqG; Expires=Fri, 10 Apr 2020 09:21:30 GMT; Path=/
    set-cookie: AWSALBCORS=+yuMNwAu+kaIeFGsZoE/gEgxcqYLWviotTfkwIBRTZldvf1vW0mQ7l/hrNkE0jeAAJ1SsKTOwXEJXXaRM2gdeyUOt6YqMM4m83I3vDm3lShl+SyeaUXeNDcW9pqG; Expires=Fri, 10 Apr 2020 09:21:30 GMT; Path=/; SameSite=None; Secure
    set-cookie: Unique_ID_v2=0dcb9c0455fc4d7589e75c024408615e; domain=.curseforge.com; expires=Wed, 03-Apr-2030 09:21:31 GMT; path=/
    set-cookie: __cf_bm=3cd18abfb0a930880bac2cc3a829276760cb4fba-1585905691-1800-ASOZlhSsbwiJ+ImNKM4F5d1gy9QkDueAfXcagsYDKar7m817Ju2aCCXZOKdVAISFWbyo4XQJshOFSWWsyGT2bFg=; path=/; expires=Fri, 03-Apr-20 09:51:31 GMT; domain=.curseforge.com; HttpOnly; Secure; SameSite=None
    strict-transport-security: max-age=15768000
    x-aspnet-version: 4.0.30319
    x-aspnetmvc-version: 5.2
    X-Firefox-Spdy: h2
    x-frame-options: SAMEORIGIN
    x-frame-options: SAMEORIGIN
    x-mvc-supplant-cachable: true
    x-ua-compatible: IE=edge,chrome=1
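
To turn the "headers look similar" impression into something checkable, a small comparison like the one below may help. It is only a sketch: diff_headers is a hypothetical helper, the browser values are a subset of the request headers quoted above, and the scraper values are invented for illustration, not captured from cfscrape.

```python
def diff_headers(browser, scraper):
    """Compare two header mappings, treating header names case-insensitively."""
    b = {name.lower(): value for name, value in browser.items()}
    s = {name.lower(): value for name, value in scraper.items()}
    shared = set(b) & set(s)
    return {
        "only_in_browser": sorted(set(b) - set(s)),
        "only_in_scraper": sorted(set(s) - set(b)),
        "different_value": sorted(name for name in shared if b[name] != s[name]),
    }


firefox_ua = "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0"

# Subset of the browser request headers quoted in this comment.
browser_headers = {
    "Accept-Encoding": "gzip, deflate, br",
    "DNT": "1",
    "User-Agent": firefox_ua,
}

# Invented example values standing in for whatever the scraper really sends.
scraper_headers = {
    "accept-encoding": "gzip, deflate",
    "user-agent": firefox_ua,
    "X-Example": "1",
}

report = diff_headers(browser_headers, scraper_headers)
print(report)
```

Identical headers do not rule out other differences between the two clients, such as header order or properties of the TLS connection.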

wsch-wa avatar Apr 03 '20 09:04 wsch-wa

bump.

KebabLord avatar Jan 04 '21 00:01 KebabLord

My use case to be super clear on the issue:

  • Browsers (FF, IE, Chrome) show the site without a captcha
  • I am not using a proxy
  • cfscrape returns a 403 status code, which does not seem to reflect reality.

In that case I think this is probably not Cloudflare's "under attack" protection but TLS fingerprint protection. Try this solution to confirm: https://pixeljets.com/blog/scrape-ninja-bypassing-cloudflare-403-code-1020-errors/
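
For context, one of the inputs to a TLS fingerprint is the list of cipher suites the client is willing to use. The sketch below, using only the standard ssl module, lists what a default Python context has enabled on the local machine. It does not reproduce Cloudflare's check, and default_cipher_names is a name made up for this example; the output depends on the local Python and OpenSSL build.

```python
import ssl


def default_cipher_names():
    """Return the names of the cipher suites enabled in a default client context."""
    context = ssl.create_default_context()
    return [cipher["name"] for cipher in context.get_ciphers()]


names = default_cipher_names()
print(len(names), "cipher suites enabled by default")
print(names[:5])
```

Comparing this list with what a browser offers (as shown by a packet capture, for example) gives a rough idea of how different the two clients look at the TLS layer.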

restyler avatar Nov 23 '21 19:11 restyler