Enhancement: Support for Modern Cloudflare Challenge Format

Open rsforbes opened this issue 6 months ago • 12 comments

Summary

My API library for prosportstransactions.com stopped working because of Cloudflare protection. I integrated cloudscraper v3.0.0 to solve the problem, without success. That led me to research modern Cloudflare challenges in depth and to develop V3 handler improvements that could benefit the cloudscraper project.

Key Finding: My enhanced V3 handler successfully detects and parses new window._cf_chl_opt challenge structures, but prosportstransactions.com (and similar advanced sites) remain inaccessible due to transport-layer TLS fingerprint detection.

Although this work does not solve the TLS fingerprinting that blocks sites like prosportstransactions.com, it identifies improvements to challenge detection and handling that would improve cloudscraper's compatibility with modern challenge formats.

Research Context

Target Site: prosportstransactions.com (advanced Cloudflare protection)
Research Branch: https://github.com/rsforbes/pro_sports_transactions/tree/feature/cloudflare-bypass-research
Cloudscraper Version: v3.0.0 (from tag, not PyPI)
Test Methodology: 12 systematic configurations including:

  • Standard v3.0.0 base implementation
  • PR #295 testing (session management improvements)
  • PR #283 testing (additional headers and browser fingerprinting)
  • Custom patched V3 handler for modern challenge detection

Installation Used:

cloudscraper = {git = "https://github.com/VeNoMouS/cloudscraper.git", rev = "refs/pull/295/head"}

Key Findings

New Challenge Format Discovered

Modern Cloudflare challenges now use a different JavaScript structure that current cloudscraper doesn't detect:

Traditional format (currently detected):

window._cf_chl_ctx = {...};

Modern format (not detected):

window._cf_chl_opt = {
    cvId: '3',
    cZone: 'prosportstransactions.com',
    cType: 'managed',
    cRay: '956ddeea7c0960a1',
    cH: 'R7tFH...',
    cUPMDTk: 'R7tFH...',
    cFPWv: 'b',
    cITimeS: '1751119949',
    fa: '/cdn-cgi/challenge-platform/h/b/g/orchestrate/managed/v1?...',
    md: 'MhKFh...',
    mdrd: 'cOM9...'
};

Updated URL Patterns

Traditional: /cdn-cgi/challenge-platform/h/b/
Modern: /cdn-cgi/challenge-platform/h/b/jsd/r/{complex_identifier}/{ray_id}

Where {complex_identifier} is extracted from:

__CF$cv$params = {
    r: '0.01313896590161113:1751120168:uuGQcGrMYKAbiU7S-5nWc8aWLxMzIT5mqxDn71u5s1Q',
    t: 'MTc1MTEyMDkwNy4wMDAwMDA='
};
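A quick check of the sample values above suggests that `t` is a base64-encoded Unix timestamp, and that `r` splits into three colon-separated fields whose middle field also resembles a timestamp. The sketch below only decodes the sample shown here; what Cloudflare actually uses each field for is an assumption, not something this research confirms.

```python
import base64

# Sample values copied from the __CF$cv$params object above.
r_param = "0.01313896590161113:1751120168:uuGQcGrMYKAbiU7S-5nWc8aWLxMzIT5mqxDn71u5s1Q"
t_param = "MTc1MTEyMDkwNy4wMDAwMDA="

# `t` decodes to a plain ASCII string.
t_decoded = base64.b64decode(t_param).decode("ascii")

# `r` has three colon-separated fields; their meaning is an assumption.
r_fields = r_param.split(":")

print(t_decoded)      # 1751120907.000000
print(len(r_fields))  # 3
print(r_fields[1])    # 1751120168
```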

Payload Format Changes

Traditional: Form data (application/x-www-form-urlencoded)
Modern: JSON payload (text/plain;charset=UTF-8)

Proposed Enhancements

1. Enhanced Challenge Detection

Update is_V3_Challenge() to detect modern format:

@staticmethod
def is_V3_Challenge(resp):
    try:
        return (
            resp.headers.get("Server", "").startswith("cloudflare")
            and resp.status_code in [403, 429, 503]
            and (
                # Existing patterns...
                re.search(r"""cpo\.src\s*=\s*['\"]/cdn-cgi/challenge-platform/\S+orchestrate/jsch/v3""", resp.text, re.M | re.S)
                or re.search(r"window\._cf_chl_ctx\s*=", resp.text, re.M | re.S)
                or re.search(r'<form[^>]*id="challenge-form"[^>]*action="[^"]*__cf_chl_rt_tk=', resp.text, re.M | re.S)
                or
                # NEW: Modern challenge format detection
                (
                    "Just a moment" in resp.text
                    and "/challenge-platform/" in resp.text
                    and re.search(r"window\._cf_chl_opt\s*=", resp.text)
                    and resp.headers.get("cf-mitigated") == "challenge"
                )
            )
        )
    except AttributeError:
        pass
    return False
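The modern-format branch can be exercised offline, without hitting a live site. The following is a minimal sketch that lifts only the new checks into a standalone function and runs them against a stand-in response object. The function name `looks_like_modern_challenge`, the `SimpleNamespace` stand-in, and the HTML snippet (including its script path) are hypothetical illustrations, not part of cloudscraper.

```python
import re
from types import SimpleNamespace


def looks_like_modern_challenge(resp):
    """Return True when a response passes the modern-format checks above."""
    return bool(
        resp.headers.get("Server", "").startswith("cloudflare")
        and resp.status_code in (403, 429, 503)
        and "Just a moment" in resp.text
        and "/challenge-platform/" in resp.text
        and re.search(r"window\._cf_chl_opt\s*=", resp.text)
        and resp.headers.get("cf-mitigated") == "challenge"
    )


# Hypothetical stand-in for a requests.Response: only the attributes used above.
# Note: real response headers are case-insensitive; this plain dict is not.
challenge = SimpleNamespace(
    status_code=403,
    headers={"Server": "cloudflare", "cf-mitigated": "challenge"},
    text=(
        "<title>Just a moment...</title>"
        "<script>window._cf_chl_opt = {cvId: '3'};</script>"
        "<script src='/cdn-cgi/challenge-platform/h/b/orchestrate/managed/v1'></script>"
    ),
)
normal = SimpleNamespace(
    status_code=200,
    headers={"Server": "cloudflare"},
    text="<html>ok</html>",
)

print(looks_like_modern_challenge(challenge))  # True
print(looks_like_modern_challenge(normal))     # False
```

Requiring all four signals together keeps the check conservative: a page that merely mentions "Just a moment" without the `cf-mitigated` header is not treated as a challenge.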

2. JavaScript Object Parser

Many modern challenges embed a JavaScript object literal (unquoted keys, single-quoted strings) rather than strict JSON, so json.loads() cannot parse it directly:

def parse_js_object_manually(self, js_obj_str):
    """Manually parse JavaScript object when JSON parsing fails"""
    try:
        data = {}
        patterns = [
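            # Accept either quote style: the sample object above uses single quotes throughout
            (r"cvId:\s*['\"]([^'\"]+)['\"]", "cvId"),
            (r"cZone:\s*['\"]([^'\"]+)['\"]", "cZone"),
            (r"cType:\s*['\"]([^'\"]+)['\"]", "cType"),
            (r"cRay:\s*['\"]([^'\"]+)['\"]", "cRay"),
            (r"cH:\s*['\"]([^'\"]+)['\"]", "cH"),
            (r"cUPMDTk:\s*['\"]([^'\"]+)['\"]", "cUPMDTk"),
            (r"cFPWv:\s*['\"]([^'\"]+)['\"]", "cFPWv"),
            (r"cITimeS:\s*['\"]([^'\"]+)['\"]", "cITimeS"),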
        ]
        
        for pattern, key in patterns:
            match = re.search(pattern, js_obj_str)
            if match:
                data[key] = match.group(1)
        
        return data
    except Exception:
        return {}
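To show how the parsing step fits together end to end, here is a minimal standalone sketch that first isolates the `window._cf_chl_opt` literal from a page and then pulls out the quoted fields. The helper name `extract_chl_opt` and the trimmed sample page are hypothetical. The sketch assumes values contain no escaped quotes and that the literal ends at the first `};`, which may not hold for every real challenge page.

```python
import re

KEYS = ("cvId", "cZone", "cType", "cRay", "cH", "cUPMDTk", "cFPWv", "cITimeS")


def extract_chl_opt(html):
    """Pull quoted string fields out of a window._cf_chl_opt object literal."""
    block = re.search(r"window\._cf_chl_opt\s*=\s*\{(.*?)\};", html, re.DOTALL)
    if not block:
        return {}
    data = {}
    for key in KEYS:
        # \b keeps a short key from matching inside a longer one;
        # \1 requires the closing quote to match the opening quote.
        match = re.search(rf"\b{key}:\s*(['\"])(.*?)\1", block.group(1))
        if match:
            data[key] = match.group(2)
    return data


# Trimmed version of the sample object shown earlier in this issue.
sample = """
<script>
window._cf_chl_opt = {
    cvId: '3',
    cZone: 'prosportstransactions.com',
    cType: 'managed',
    cRay: '956ddeea7c0960a1',
    cITimeS: '1751119949'
};
</script>
"""

opt_data = extract_chl_opt(sample)
print(opt_data["cZone"])  # prosportstransactions.com
print(opt_data["cRay"])   # 956ddeea7c0960a1
```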

3. Complex URL Construction

Support for modern challenge URL patterns:

# Extract complex identifier from __CF$cv$params
cf_params_match = re.search(
    r'__CF\$cv\$params\s*=\s*\{.*?r:\s*[\'"]([^\'"]+)[\'"]',
    resp.text,
    re.DOTALL,
)

if cf_params_match and "cRay" in opt_data:
    r_param = cf_params_match.group(1)
    ray_id = opt_data["cRay"]
    form_action = f"/cdn-cgi/challenge-platform/h/b/jsd/r/{r_param}/{ray_id}"
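As a self-contained illustration of the same construction, the sketch below wraps the extraction in a function and runs it against the sample values from this issue. The function name `build_modern_challenge_url` and the `origin` parameter are hypothetical. It uses `\br:` rather than a bare `r:` so that a longer key ending in "r" is not picked up by mistake; that tightening is a judgment call, not part of the patched handler.

```python
import re


def build_modern_challenge_url(origin, html, ray_id):
    """Build the jsd challenge URL from __CF$cv$params.r and the ray ID, or return None."""
    match = re.search(
        r"__CF\$cv\$params\s*=\s*\{.*?\br:\s*['\"]([^'\"]+)['\"]",
        html,
        re.DOTALL,
    )
    if not match or not ray_id:
        return None
    path = f"/cdn-cgi/challenge-platform/h/b/jsd/r/{match.group(1)}/{ray_id}"
    return origin.rstrip("/") + path


# Sample values copied from the __CF$cv$params and _cf_chl_opt objects above.
html = (
    "<script>window.__CF$cv$params = {"
    "r: '0.01313896590161113:1751120168:uuGQcGrMYKAbiU7S-5nWc8aWLxMzIT5mqxDn71u5s1Q', "
    "t: 'MTc1MTEyMDkwNy4wMDAwMDA='};</script>"
)
url = build_modern_challenge_url(
    "https://www.prosportstransactions.com", html, "956ddeea7c0960a1"
)
print(url)
```

Returning `None` when either piece is missing lets the caller fall back to the traditional form action instead of posting to a malformed URL.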

4. JSON Payload Support

Handle modern JSON payloads instead of form data:

# For modern challenges, send JSON payload
if challenge_data.get("is_modern", False):
    payload_data = {
        "chctx": opt_data,
        "answer": challenge_answer,
    }
    
    # Add specific fields from opt_data
    for key in ["cvId", "cRay", "cType", "cZone", "cUPMDTk", "cFPWv", "cITimeS"]:
        if key in opt_data:
            payload_data[key] = opt_data[key]
    
    # Use JSON content type
    headers.update({
        "Content-Type": "text/plain;charset=UTF-8",
        "Accept": "*/*",
        "Sec-Fetch-Dest": "empty",
        "Sec-Fetch-Mode": "cors",
        "Sec-Fetch-Site": "same-origin",
    })
    
    return json.dumps(payload_data)

Implementation Reference

My complete implementation is available in the research branch:

  • Enhanced V3 Handler: temp_cloudscraper/cloudscraper/cloudflare_v3_patched.py (495 lines)
  • Integration Code: src/pro_sports_transactions/search.py (CloudscraperConfig class)
  • Test Results: docs/cloudscraper/cloudscraper-testing.md (12 systematic tests)
  • Technical Analysis: docs/cloudscraper/TECHNICAL_ANALYSIS.md

Test Results

I tested across multiple configurations with cloudscraper v3.0.0:

| Configuration | Base Version | Result | Key Finding |
|---|---|---|---|
| Standard v3.0.0 | Tag release | TIMEOUT | Basic implementation insufficient |
| PR #295 (Session) | Pull request | FAIL | Session improvements help but insufficient |
| PR #283 (Headers) | Pull request | TIMEOUT | Additional headers don't solve core issue |
| Custom V3 Patched | My enhancement | FAIL | Successfully detects modern challenges but blocked by TLS fingerprinting |

Despite these improvements, advanced sites like prosportstransactions.com still block requests due to TLS fingerprinting - the fundamental limitation that requires browser-level solutions. However, these enhancements would improve cloudscraper's compatibility with sites using modern challenge formats.

Key Discovery: My patched V3 handler successfully detected and parsed the modern challenge format, proving the enhancement works - the failure occurs at the TLS transport layer, not the challenge handling layer.

Benefits to Cloudscraper

  1. Broader Compatibility: Support sites using modern challenge format
  2. Future-Proofing: Handle evolving Cloudflare challenge patterns
  3. Better Parsing: Robust JavaScript object handling
  4. Modern Standards: JSON payload support for current challenges

Implementation Priority

High Priority: Challenge detection improvements (#1)
Medium Priority: JavaScript object parser (#2) and URL construction (#3)
Low Priority: JSON payload support (#4) - fewer sites use this format

Notes

  • Implementation maintains backward compatibility with existing challenges
  • TLS fingerprinting remains the primary obstacle for advanced protection sites - this is the core unsolved problem

Collaboration

While I haven't solved the TLS fingerprinting challenge, these improvements would benefit cloudscraper's compatibility with modern challenge formats. For my specific use case with prosportstransactions.com, I'll be exploring Playwright and curl_cffi to see if I can bypass the current TLS fingerprinting obstacle. If there's interest in addressing TLS fingerprinting within cloudscraper itself, I'd be happy to submit a PR for the above work.


Research Branch: https://github.com/rsforbes/pro_sports_transactions/tree/feature/cloudflare-bypass-research
Documentation: docs/cloudscraper/ directory contains complete technical analysis and test results

rsforbes avatar Jun 29 '25 15:06 rsforbes

How about using https://github.com/lexiforest/curl_cffi to bypass the TLS fingerprinting?

xAffan avatar Jul 03 '25 16:07 xAffan

@xAffan - I had issues with curl_cffi as well.

This appears to be a JA3/JA4 fingerprinting issue. Based on the example output and the response headers:

  1. 403 Forbidden with cf-mitigated: challenge - This shows Cloudflare detected the request as suspicious
  2. Accept-CH headers requesting browser characteristics - The response includes several Client Hints headers (Sec-CH-UA-*, UA-*) which are part of modern browser fingerprinting
  3. Cloudflare challenge page - The HTML response starts with "Just a moment..." which is Cloudflare's challenge page

JA3/JA4 fingerprinting analyzes the TLS handshake characteristics to identify the client. Cloudscraper uses older TLS fingerprinting evasion techniques that work with JA3, but JA4 is more sophisticated and includes:

  • TLS cipher suite ordering
  • TLS extensions
  • HTTP/2 ALPN negotiation
  • Client Hello packet structure
  • Additional entropy from newer TLS 1.3 features

That prosportstransactions.com blocks cloudscraper immediately (403 status) rather than serving a JavaScript challenge suggests they're using JA4 or similar advanced fingerprinting to detect that the TLS handshake doesn't match a real browser, regardless of the HTTP headers cloudscraper sends.
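To make concrete why HTTP headers cannot change a TLS fingerprint, here is a minimal sketch of how a JA3-style hash is derived: five ClientHello fields are joined into a comma-separated string (values within a field joined by dashes) and hashed with MD5. The numeric values below are illustrative placeholders, not a capture from cloudscraper or any real browser, and JA4 uses a different format that is not shown here.

```python
import hashlib


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """Compute a JA3-style MD5 over ClientHello fields given as decimal integers."""
    fields = [
        str(version),
        "-".join(map(str, ciphers)),
        "-".join(map(str, extensions)),
        "-".join(map(str, curves)),
        "-".join(map(str, point_formats)),
    ]
    ja3_string = ",".join(fields)
    return hashlib.md5(ja3_string.encode("ascii")).hexdigest()


# Illustrative placeholder values only.
client_a = ja3_hash(771, [4865, 4866, 4867], [0, 23, 65281, 10, 11], [29, 23, 24], [0])

# Same values, but two extensions swapped: the hash changes.
client_b = ja3_hash(771, [4865, 4866, 4867], [0, 65281, 23, 10, 11], [29, 23, 24], [0])

print(client_a)
print(client_b)
print(client_a != client_b)  # True
```

Nothing from the HTTP layer (User-Agent, Client Hints, cookies) enters the input, which is consistent with the observation that changing headers alone did not get past the block.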

Runnable Example:

#!/usr/bin/env python3
"""
Minimal example demonstrating cloudscraper with prosportstransactions.com
This example shows the current state of cloudscraper's ability to handle
the site's JA3/JA4 fingerprinting and Cloudflare challenges.
"""

import sys

import cloudscraper


def test_basic_request():
    """Test basic GET request to prosportstransactions.com"""
    print("Testing cloudscraper with prosportstransactions.com...")
    print("-" * 60)

    # Create scraper instance
    scraper = cloudscraper.create_scraper()

    # Target URL
    url = (
        "https://www.prosportstransactions.com/basketball/Search/"
        "SearchResults.php?Player=&Team=&BeginDate=&EndDate="
        "&PlayerMovementChkBx=yes&submit=Search"
    )

    try:
        # Make request
        print(f"Requesting: {url}")
        response = scraper.get(url, timeout=30)

        # Print response details
        print(f"\nStatus Code: {response.status_code}")
        print(f"Headers: {dict(response.headers)}")
        print(f"\nContent Length: {len(response.content)} bytes")
        print("Content Preview (first 500 chars):")
        print(response.text[:500])

        # Check if we got blocked
        if "Checking your browser" in response.text or response.status_code == 403:
            print("\n[BLOCKED] Cloudflare challenge detected!")
            return False
        print("\n[SUCCESS] Request completed successfully!")
        return True

    except (
        cloudscraper.CloudflareChallengeError,
        Exception,
    ) as e:
        print(f"\n[ERROR] Request failed: {type(e).__name__}: {e}")
        return False


def test_with_browser_params():
    """Test with explicit browser parameters"""
    print("\n\nTesting with browser parameters...")
    print("-" * 60)

    # Create scraper with browser params
    scraper = cloudscraper.create_scraper(
        browser={"browser": "chrome", "platform": "windows", "desktop": True}
    )

    url = "https://www.prosportstransactions.com/"

    try:
        print(f"Requesting: {url}")
        response = scraper.get(url, timeout=30)
        print(f"Status Code: {response.status_code}")

        if response.status_code == 200 and "Checking your browser" not in response.text:
            print("[SUCCESS] Homepage accessible!")
            return True
        print("[BLOCKED] Still getting challenged")
        return False

    except (
        cloudscraper.CloudflareChallengeError,
        Exception,
    ) as e:
        print(f"[ERROR] {type(e).__name__}: {e}")
        return False


if __name__ == "__main__":
    # Run tests
    basic_success = test_basic_request()
    browser_success = test_with_browser_params()

    # Summary
    print("\n" + "=" * 60)
    print("SUMMARY:")
    print(f"Basic request: {'PASSED' if basic_success else 'FAILED'}")
    print(f"Browser params: {'PASSED' if browser_success else 'FAILED'}")

    # Exit with appropriate code
    sys.exit(0 if basic_success or browser_success else 1)

Results:

Testing cloudscraper with prosportstransactions.com...
------------------------------------------------------------
Requesting: https://www.prosportstransactions.com/basketball/Search/SearchResults.php?Player=&Team=&BeginDate=&EndDate=&PlayerMovementChkBx=yes&submit=Search

Status Code: 403
Headers: {'Date': 'Sun, 06 Jul 2025 13:19:15 GMT', 'Content-Type': 'text/html; charset=UTF-8', 'Transfer-Encoding': 'chunked', 
'Connection': 'close', 'accept-ch': 'Sec-CH-UA-Bitness, Sec-CH-UA-Arch, Sec-CH-UA-Full-Version, Sec-CH-UA-Mobile, Sec-CH-UA-
Model, Sec-CH-UA-Platform-Version, Sec-CH-UA-Full-Version-List, Sec-CH-UA-Platform, Sec-CH-UA, UA-Bitness, UA-Arch, UA-
Full-Version, UA-Mobile, UA-Model, UA-Platform-Version, UA-Platform, UA', 'cf-mitigated': 'challenge', 'critical-ch': 'Sec-CH-UA-
Bitness, Sec-CH-UA-Arch, Sec-CH-UA-Full-Version, Sec-CH-UA-Mobile, Sec-CH-UA-Model, Sec-CH-UA-Platform-Version, Sec-CH-
UA-Full-Version-List, Sec-CH-UA-Platform, Sec-CH-UA, UA-Bitness, UA-Arch, UA-Full-Version, UA-Mobile, UA-Model, UA-Platform-
Version, UA-Platform, UA', 'cross-origin-embedder-policy': 'require-corp', 'cross-origin-opener-policy': 'same-origin', 'cross-origin-
resource-policy': 'same-origin', 'origin-agent-cluster': '?1', 'permissions-policy': 'accelerometer=(),autoplay=(),browsing-topics=
(),camera=(),clipboard-read=(),clipboard-write=(),geolocation=(),gyroscope=(),hid=(),interest-cohort=(),magnetometer=
(),microphone=(),payment=(),publickey-credentials-get=(),screen-wake-lock=(),serial=(),sync-xhr=(),usb=()', 'referrer-policy': 
'same-origin', 'server-timing': 'chlray;desc="95af6489f9d98fc2", cfL4;desc="?
proto=TCP&rtt=13467&min_rtt=13374&rtt_var=5082&sent=3&recv=5&lost=0&retrans=0&sent_bytes=2914&recv_bytes=1255&
delivery_rate=218334&cwnd=251&unsent_bytes=0&cid=16606c71f1323bf6&ts=34&x=0"', 'x-content-type-options': 'nosniff', 'x-
frame-options': 'SAMEORIGIN', 'Cache-Control': 'private, max-age=0, no-store, no-cache, must-revalidate, post-check=0, pre-
check=0', 'Expires': 'Thu, 01 Jan 1970 00:00:01 GMT', 'Report-To': '{"endpoints":[{"url":"https:\\/\\/a.nel.cloudflare.com\\/report\\/v4?
s=hSqyK83uZuFpVDLIC7qtgep4OK%2Bx1ruPA10XjiBTUbgMFZS%2BRi5YtnDjnRCbyKKpgJpxXtsXyjMVC%2BTFd1rQ1BXOEkqtt1Y%2
FNGI8aMOLehlRx%2B9HMA0uMMsnRTHDA4ex80RmKV9mYkXasO5gYZRwtQ%3D%3D"}],"group":"cf-nel","max_age":604800}', 
'NEL': '{"success_fraction":0,"report_to":"cf-nel","max_age":604800}', 'Vary': 'Accept-Encoding', 'Server': 'cloudflare', 'CF-RAY': 
'95af6489f9d98fc2-ORD', 'Content-Encoding': 'br', 'alt-svc': 'h3=":443"; ma=86400'}

Content Length: 7629 bytes
Content Preview (first 500 chars):
<!DOCTYPE html><html lang="en-US"><head><title>Just a moment...</title><meta http-equiv="Content-Type" 
content="text/html; charset=UTF-8"><meta http-equiv="X-UA-Compatible" content="IE=Edge"><meta name="robots" 
content="noindex,nofollow"><meta name="viewport" content="width=device-width,initial-scale=1"><style>*{box-sizing:border-
box;margin:0;padding:0}html{line-height:1.15;-webkit-text-size-adjust:100%;color:#313131;font-family:system-ui,-apple-
system,BlinkMacSystemFont,Segoe UI,Roboto,Helvetic

[BLOCKED] Cloudflare challenge detected!


Testing with browser parameters...
------------------------------------------------------------
Requesting: https://www.prosportstransactions.com/
Status Code: 403
[BLOCKED] Still getting challenged

============================================================
SUMMARY:
Basic request: FAILED
Browser params: FAILED

rsforbes avatar Jul 06 '25 13:07 rsforbes

The website you sent always serves cloudflare challenge - even with a normal browser. It is probably configured like that. What I don't understand is the problem..? Isn't this library meant to solve these challenges using JS?

xAffan avatar Jul 07 '25 08:07 xAffan

@xAffan - Yes. That is my understanding of the library; however, the version of Cloudflare being served may be beyond the tested versions of this library:

README.md...

📊 Test Results
All features tested with 100% success rate for core functionality:

✅ Basic requests: 100% pass rate
✅ User agent handling: 100% pass rate
✅ Cloudflare v1 challenges: 100% pass rate
✅ Cloudflare v2 challenges: 100% pass rate
✅ Cloudflare v3 challenges: 100% pass rate
✅ Stealth mode: 100% pass rate

rsforbes avatar Jul 07 '25 21:07 rsforbes

> The website you sent always serves cloudflare challenge - even with a normal browser. It is probably configured like that. What I don't understand is the problem..? Isn't this library meant to solve these challenges using JS?

due to the nature and complexity of some of the checks in the challenges now, solving this with basic JS engines will fail, that is why i am rewriting this library and the cluster fuck of a PR that was sent that i merged, and now regret

I will provide a couple options for solving, but i also have a life outside of this repo as well as a day job, so if there is a delay in me sitting down to complete said work... then so be it..

VeNoMouS avatar Jul 07 '25 22:07 VeNoMouS

@VeNoMouS Please update Repo bro , you are number 1 in github can do this

Ahm3dksa avatar Jul 07 '25 23:07 Ahm3dksa

@VeNoMouS - Same boat as an open source maintainer myself. Totally understand the need for balance.

rsforbes avatar Jul 08 '25 15:07 rsforbes

@VeNoMouS Come on, waiting for your good news

GrapeStorm avatar Jul 10 '25 07:07 GrapeStorm

@rsforbes

Hi, do I understand correctly that you're trying to bypass the initial Cloudflare protection page on prosportstransactions.com using Cloudscraper? If so, a better approach might be to use Unflare, which is specifically designed for bypassing the initial protection page. Once it bypasses the page, it'll give you the desired headers, so you can pass them to your Cloudscraper service and make direct http requests to the target website.

iamyegor avatar Jul 14 '25 13:07 iamyegor

> @rsforbes
>
> Hi, do I understand correctly that you're trying to bypass the initial Cloudflare protection page on prosportstransactions.com using Cloudscraper? If so, a better approach might be to use Unflare, which is specifically designed for bypassing the initial protection page. Once it bypasses the page, it'll give you the desired headers, so you can pass them to your Cloudscraper service and make direct http requests to the target website.

Sweet! I'll check it out. Thanks!

rsforbes avatar Jul 14 '25 14:07 rsforbes

@rsforbes Btw, wanted to ask you what makes you think that Cloudflare detects you due to TLS fingerprinting?

iamyegor avatar Jul 17 '25 13:07 iamyegor

@rsforbes You can change the ja3/ja4 fingerprint by creating your own proxy server. However, the code structure is quite weak, and the JavaScript emulation libs cannot fully simulate a complete browser environment. The code may work for 1-2 days, but it will not work on the third day. https://github.com/VeNoMouS/cloudscraper/blob/9ea528a8675f1bebd49ff853d142e94988a95178/cloudscraper/cloudflare_v3.py#L154-L198

alpgul avatar Sep 15 '25 08:09 alpgul