mitmproxy's TLS fingerprint doesn't match typical web browsers

Open zeen opened this issue 3 years ago • 17 comments

Cloudflare detects MITM via TLS fingerprinting [1][2]. mitmproxy's traffic is flagged as bot traffic because its TLS fingerprint doesn't match the one expected for the User-Agent. In response, Cloudflare's "bot fight mode" [3] returns a 403 error, attempts to run JS, and shows captchas.

In cases where the request is a cross-site request (AJAX, etc), there's no sane way to work around this (since you don't even see the captcha in the browser). This makes mitmproxy unusable for sites which load content via cross-site AJAX requests to domains with Cloudflare bot detection enabled.

The ideal fix would be to mirror the client's TLS configuration (cipher suites, etc) in the outgoing connections. Another less perfect (but possibly easier) approach would be hardcoding a specific browser's (e.g., Chrome on Windows) TLS settings, and overriding the User-Agent header to match. This would make traffic indistinguishable from a normal browser.

This may also be an issue with AWS WAF's bot detection [4] and other similar services.

[1] https://malcolm.cloudflare.com/
[2] https://github.com/cloudflare/mitmengine
[3] https://blog.cloudflare.com/super-bot-fight-mode/
[4] https://aws.amazon.com/waf/features/bot-control/

zeen avatar Apr 24 '21 13:04 zeen

Thanks for raising this! The relevant configuration is all in addons/tlsconfig.py. I'd be happy to merge a PR that makes mitmproxy look more like the current client or a hardcoded browser.

mhils avatar Apr 24 '21 13:04 mhils

Did a little further research, pasting my notes:

This is a helpful website for detecting your current fingerprint:

  • https://ja3er.com
    • JSON end-point: https://ja3er.com/json
    • My fingerprint with Firefox on OSX:
      • With no proxy: a75de44db3e351bbd8d38b64c41f444e
      • With mitmproxy: 652c612a3267ed8a6f6e6c42c46ce534

This is another helpful database of fingerprints (and has a cool feature where it can generate uTLS code to match the fingerprint):

  • https://tlsfingerprint.io/
    • Most popular fingerprint (modern Chrome): https://tlsfingerprint.io/id/9c673fd64a32c8dc
      • It shouldn't be too hard to port the uTLS generated code from this into addons/tlsconfig.py, assuming the Python OpenSSL lib allows customizing all the relevant TLS properties

zeen avatar Apr 24 '21 15:04 zeen

This is a very interesting topic, thanks for bringing it up!

Another less perfect (but possibly easier) approach would be hardcoding a specific browser's (e.g., Chrome on Windows) TLS settings, and overriding the User-Agent header to match

I think this approach is not acceptable to be applied to general mitmproxy traffic. If this would be the route we take then this needs to be an option that is disabled by default (like the anticache option). I don't want mitmproxy to change any semantics of my HTTP requests without my explicit permission. On the contrary, changing/trying different user-agents through a mitmproxy addon is regular penetration testing business (e.g. to trigger a mobile version or to test for different responses based on UA) and we don't want to lose that control.

This also implies that we can't just use the tls settings of the incoming request. We need to take the UA into account after all add-ons are done.

This also conflicts with connection_strategy = eager, right?

Also what about HTTP2 and multiple user-agents within the same connection? I assume CloudFlare wouldn't like that at all.

Edit: One last thought: if we can actually make mitmproxy fool the WAF, then what's the point of the WAF? Couldn't any malicious interceptor/mitm/bot do the same thing? Sounds like a pointless arms race at the cost of annoying legitimate users with false positives, or am I missing something?

Prinzhorn avatar Apr 29 '21 10:04 Prinzhorn

I think this approach is not acceptable to be applied to general mitmproxy traffic. If this would be the route we take then this needs to be an option that is disabled by default (like the anticache option). I don't want mitmproxy to change any semantics of my HTTP requests without my explicit permission.

I think I missed the user agent part in the initial post - @Prinzhorn is right here. We can't change HTTP semantics automatically. I don't mind the TLS configuration as much, here we can experiment.

This also implies that we can't just use the tls settings of the incoming request. We need to take the UA into account after all add-ons are done.

I think the strategy should be either 1) mirroring what the client does or 2) unconditionally hardcoding something that resembles a browser on the first look. Let's not add unnecessary complexity here.

Updating the TLS config is the responsibility of your UA-changing script. 😉

This also conflicts with connection_strategy = eager, right?

For the user-agent yes, but see my previous point. We usually have the ClientHello at this stage.

Also what about HTTP2 and multiple user-agents within the same connection? I assume CloudFlare wouldn't like that at all.

How would you have multiple user agents in the same connection? We don't reuse server connections across clients. (bringing up all these weird edge cases is good anyways, thanks for that!)

Couldn't any malicious interceptor/mitm/bot do the same thing?

Sure, but they actually would need to know what they are doing. 😋 Of course it's no perfect security mechanism, but you are increasing your adversary's (development) cost. The false-positive ratio is really negligible if you block a Chrome UA with a cURL TLS fingerprint.

mhils avatar Apr 29 '21 16:04 mhils

I think the strategy should be either 1) mirroring what the client does or 2) unconditionally hardcoding something that resembles a browser on the first look. Let's not add unnecessary complexity here.

Updating the TLS config is the responsibility of your UA-changing script. 😉

I agree and 1 sounds good to me. This would work for the 99% of people that don't update the UA and it sounds rather straightforward on mitmproxy's end (if we have control over all needed bits). We also don't need a database of UA/TLS config etc. (that's the direction my comment was going in, because if we had that in place then it would just be a matter of doing things in the right order). If you need that type of thing you can do it in an add-on as you said.

How would you have multiple user agents in the same connection?

Does something keep an add-on from rotating User-Agents for every request()? User-Agents are HTTP data like any other, so sure I can have different User-Agents for each request within the same HTTP/2 connection in the same way I can have different paths. Or am I tripping?

all these weird edge cases

They are my bread and race-conditions are my butter. I just like things to always work and not just most of the time.

Sure, but they actually would need to know what they are doing. 😋 Of course it's no perfect security mechanism, but you are increasing your adversary's (development) cost. The false-positive ratio is really negligible if you block a Chrome UA with a cURL TLS fingerprint.

You're right, it will block the masses of script kiddies and outdated tools that don't implement this type of magic.

Prinzhorn avatar Apr 29 '21 16:04 Prinzhorn

This is a helpful website for detecting your current fingerprint:

https://ja3er.com

I would avoid this site for that purpose. That site is known to produce provably false results. For example, I just got this right now:

771,4865-4867-4866-49195-49199-52393-52392-49196-49200-49162-49161-49171-49172-156-157-47-53-10,0-23-65281-10-11-35-16-5-51-43-13-45-28,,

That JA3 lists SSLExtension number 10 (supported_groups) and 11 (ec_point_formats) [1], but then it doesn't provide any EllipticCurve or EllipticCurvePointFormat (the last two arguments). I did find servers that would accept this type of malformed JA3, but in general I think it's just an error in whatever was used to create the JA3 string. You either need to omit extensions 10 and 11, or provide values for both.

  1. https://iana.org/assignments/tls-extensiontype-values/tls-extensiontype-values.xhtml
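
A minimal sketch of the check described above, assuming the usual JA3 layout of five comma-separated fields (TLS version, ciphers, extensions, elliptic curves, point formats) with the fingerprint being the MD5 of that string. The function names `parse_ja3`, `ja3_hash`, and `inconsistencies` are made up for this illustration and do not come from any JA3 library.

```python
import hashlib

# Extension IDs whose values JA3 reports in its last two fields.
SUPPORTED_GROUPS = 10  # values belong in the 4th field (elliptic curves)
EC_POINT_FORMATS = 11  # values belong in the 5th field (point formats)


def parse_ja3(ja3):
    """Split a JA3 string into its five comma-separated fields."""
    parts = ja3.split(",")
    if len(parts) != 5:
        raise ValueError("expected 5 fields, got %d" % len(parts))

    def ints(field):
        # An empty field means "no values", not a single empty value.
        return [int(x) for x in field.split("-")] if field else []

    version, ciphers, extensions, curves, point_formats = parts
    return {
        "version": int(version),
        "ciphers": ints(ciphers),
        "extensions": ints(extensions),
        "curves": ints(curves),
        "point_formats": ints(point_formats),
    }


def ja3_hash(ja3):
    """Return the MD5 hex digest that is commonly quoted as the JA3 fingerprint."""
    return hashlib.md5(ja3.encode("ascii")).hexdigest()


def inconsistencies(fields):
    """List extensions that are advertised without their matching values."""
    problems = []
    if SUPPORTED_GROUPS in fields["extensions"] and not fields["curves"]:
        problems.append("extension 10 listed but no elliptic curves given")
    if EC_POINT_FORMATS in fields["extensions"] and not fields["point_formats"]:
        problems.append("extension 11 listed but no point formats given")
    return problems


# The string quoted in the comment above.
sample = (
    "771,4865-4867-4866-49195-49199-52393-52392-49196-49200-49162-49161-"
    "49171-49172-156-157-47-53-10,0-23-65281-10-11-35-16-5-51-43-13-45-28,,"
)

for problem in inconsistencies(parse_ja3(sample)):
    print(problem)
```

Run against the string above, this reports both extensions as inconsistent, which matches the observation that the last two fields are empty.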

89z avatar Oct 21 '21 21:10 89z

This post suggests that shuffling the order of proposed ciphers can be effective. The hypothesis is that since legitimate clients can change their fingerprint at any time, WAFs need to use an explicit denylist of known bad fingerprints. I'm going to play around a bit and see if this works against Cloudflare. I do wonder if the logic is "deny if known undesirable client" instead of "deny if the fingerprint doesn't make sense for the detected client".


This is also a nice approach because you can actually realize it in Python. It's possible to "parrot" a known browser fingerprint (see utls) but it requires a lower-level interface to TLS than Python can provide.

EDIT: I see we're using pyOpenSSL here, so maybe we do have the necessary capabilities to parrot.
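
For illustration, here is a rough sketch of cipher shuffling using Python's standard `ssl` module rather than pyOpenSSL, so it is not how mitmproxy builds its contexts. The helper name `shuffled_cipher_context` is hypothetical. It assumes that only the TLS 1.2-and-below suites can be reordered this way, since `SSLContext.set_ciphers()` does not control TLS 1.3 suites.

```python
import random
import ssl


def shuffled_cipher_context(seed=None):
    """Build a client context whose pre-TLS-1.3 cipher list is randomly reordered."""
    ctx = ssl.create_default_context()

    # TLS 1.3 suites are not configurable through set_ciphers(), so leave them out.
    names = [c["name"] for c in ctx.get_ciphers() if c["protocol"] != "TLSv1.3"]

    # A seeded Random makes the order reproducible; seed=None gives a fresh order.
    random.Random(seed).shuffle(names)

    if names:
        # OpenSSL cipher strings are colon-separated lists of cipher names.
        ctx.set_ciphers(":".join(names))
    return ctx


if __name__ == "__main__":
    ctx = shuffled_cipher_context()
    for cipher in ctx.get_ciphers():
        print(cipher["protocol"], cipher["name"])
```

Shuffling only changes the cipher portion of the fingerprint. The extension list and its order stay whatever OpenSSL emits, so this does not parrot a browser.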

r1b avatar Jan 30 '22 00:01 r1b

The hypothesis is that since legitimate clients can change their fingerprint at any time, WAFs need to use an explicit denylist of known bad fingerprints.

This is not true, at least not with all servers. For example, the server android.clients.google.com uses a whitelist [1] consisting of the fingerprints for each Android API, with a range of API 24 (2016) through 32, and likely earlier versions as well.

  1. https://github.com/89z/format/blob/v1.21.1/crypto/parse.go#L12-L32
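
To make the difference between the two policies concrete, here is a toy sketch. It is purely illustrative: nothing in it reflects how Cloudflare or Google actually decide, and the function names are hypothetical. The two hashes are the Firefox JA3 values posted earlier in this thread (without a proxy and with mitmproxy), reused here only as sample data.

```python
# Sample data only: the two Firefox-on-OSX JA3 hashes quoted earlier in this thread.
EXPECTED_BY_CLIENT = {
    "firefox": {"a75de44db3e351bbd8d38b64c41f444e"},  # no proxy
}
KNOWN_BAD = {
    "652c612a3267ed8a6f6e6c42c46ce534",  # same browser through mitmproxy
}


def denylist_allows(ja3_md5):
    """'Deny if known undesirable client': anything not listed gets through."""
    return ja3_md5 not in KNOWN_BAD


def allowlist_allows(ua_family, ja3_md5):
    """'Deny if the fingerprint doesn't make sense for the detected client'."""
    return ja3_md5 in EXPECTED_BY_CLIENT.get(ua_family, set())


# A made-up fingerprint standing in for a client that shuffled its ciphers.
shuffled = "0" * 32

print(denylist_allows(shuffled))              # True: unknown, so not blocked
print(allowlist_allows("firefox", shuffled))  # False: not an expected Firefox fingerprint
```

Under a denylist, a shuffled (and therefore unknown) fingerprint passes. Under an allowlist keyed on the claimed client, the same fingerprint is rejected, which is why shuffling would not help against servers like the one described above.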

89z avatar Jan 30 '22 00:01 89z

I was unable to reproduce the issue running the main branch of mitmproxy against a cloudflare-fronted domain that I control. I'm using the "Pro" plan. I tried many combinations of rules but I could only get a javascript challenge when:

  • Enabling "under attack mode"
  • Explicitly enabling "javascript detections" in the bot settings

Enabling the setting that blocks "Definitely automated" bots did block cURL. I'm pretty confident this was from TLS fingerprinting because if I added --tls-max 1.1 the request was allowed >:)

When using mitmproxy, I couldn't reproduce any block or javascript challenge. Long shot, but @zeen do you have a reproducer?

r1b avatar Feb 01 '22 01:02 r1b

@r1b Haven't focused on this issue in a while, and the original website configuration changed such that it's not an issue. Another website that I think reproduces (no affiliation) is https://mangahub.io/, which also does a POST to https://api.mghubcdn.com/graphql, and I think both the primary domain and API subdomain had different configurations of Cloudflare bot protection, and used to fail with mitmproxy.

We'd worked around the original issue with the original website by using got-scraping (in a custom proxy script, not with mitmproxy, with a fixed Firefox config as the random browser selector was triggering failing configurations).

Regarding "shuffling the order of proposed ciphers", I'd suggest mimicking whatever real browsers do (do real browsers shuffle the order?), since I imagine anything else would be a casualty in the scraping bots vs bot detection arms race. I think Cloudflare learns of new configurations automatically, so new or rarely used User-Agents receive the JS challenge, while common User-Agent configurations have sufficient reputation to bypass the challenge.

zeen avatar Feb 01 '22 17:02 zeen

So is it possible to change mitmproxy tls fingerprint?

VovkoO avatar Jul 28 '22 15:07 VovkoO

So is it possible to change mitmproxy tls fingerprint

If mitmproxy uses OpenSSL, it can, but with a lot of limitations.

OpenSSL can't really impersonate a real browser: it has its own cipher negotiation, and it can't change extensions, compression, and many other things. Unless OpenSSL is willing to change, mitmproxy is better off implementing its own TLS library.

GunGunGun avatar Dec 05 '22 16:12 GunGunGun

I've recently been hit by cloudflare blocking mitmproxy somehow. The interesting thing, though, is if I take the request from the browser side with "copy as curl", I don't get the same error with curl, and, if I write a tiny python script compatible with the necessary curl options from that command line, using the urllib.request API, I don't get the error either.

glandium avatar Dec 03 '23 23:12 glandium

In my case, making mitmproxy.net.tls._create_ssl_context return the context before calling set_cipher_list fixes the issue. So cloudflare is definitely looking at the cipher list.

glandium avatar Dec 04 '23 00:12 glandium

When will this be fixed? Can't even use this with ChatGPT anymore!

CoffeeShifter avatar Dec 17 '23 18:12 CoffeeShifter

In cases where I get some type of server web page indicating the site is not going to fulfill my request, I can add the host or ip address to the ignore_hosts list and the sites will load. I have a list of 5-6 such sites that I routinely add to the ignore_hosts list. However today for reddit.com adding reddit.com to the ignore_hosts strategy is not working. I get the below message. Am I correct in thinking that the connection is being blocked in a way similar to what others are describing here with the TLS fingerprinting?

I have verified that both wireguard and regular mode are blocked. I am starting mitmproxy as follows:

mitmdump --set block_global=false --mode 'wireguard:/home/[email protected]:60002' --ignore-hosts reddit
mitmdump --set block_global=false  --mode regular@12345 --ignore-hosts reddit 

Since WG mode uses the ip address instead of hostname I also tested the following to ensure I was ignoring all traffic by matching domain names and ips.

mitmdump --set block_global=false --mode 'wireguard:/home/[email protected]:60002' --ignore-hosts reddit
 --ignore-hosts '.*'

All of these variations produce the same result. Am I correct in assuming that the strategies discussed here and https://github.com/caido/caido/issues/523 can solve the problem or is this a manifestation of a different issue?

Version: Mitmproxy: 10.2.1 Python: 3.11.2 OpenSSL: OpenSSL 3.1.2 1 Aug 2023 Platform: Linux-6.1.0-15-cloud-arm64-aarch64-with-glibc2.36

whoa there, pardner!

Your request has been blocked due to a network policy.

Try logging in or creating an account here to get back to browsing.

If you're running a script or application, please register or sign in with your developer credentials here. Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again. if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

You can read Reddit's Terms of Service here.

if you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

when contacting us, please include your ip address which is: A.B.C.D and reddit account

I also tested the latest code and get the same result.

mitmdump --set block_global=false --mode 'wireguard:/home/[email protected]:60002' --ignore-hosts reddit
 --ignore-hosts '.*'

Mitmproxy: 11.0.0.dev (+2, commit 865e113) Python: 3.11.2 OpenSSL: OpenSSL 3.1.2 1 Aug 2023 Platform: Linux-6.1.0-15-cloud-arm64-aarch64-with-glibc2.36

rosydawn6 avatar Jan 23 '24 22:01 rosydawn6

Update on my earlier post: https://github.com/mitmproxy/mitmproxy/issues/4575#issuecomment-1907038328 :
The web application firewall I am encountering for reddit.com is not specific to mitmproxy. I have performed the following additional tests:

1a) Launch mate based ec2 instance and accessing https://www.reddit.com from Chromium 117.0.5938.132 (Official Build) Fedora Project (64-bit)
1b) Using mate based ec2 instance access https://www.reddit.com from Chrome 121.0.6167.139 (Official Build) (64-bit)
3) Launch debian based ec2 instance and issuing curl -X GET "https://www.reddit.com"

In all scenarios I get the "whoa there, pardner!" web page indicating that reddit is likely blocking these requests using some type of ip based filtering.

I am also seeing discussion of this ip level blocking on some reddit posts. One suggested workaround is to log in with a reddit account. I did try that and can confirm that the issues with reddit blocking the ip are now resolved.

rosydawn6 avatar Feb 01 '24 19:02 rosydawn6

In my case, making mitmproxy.net.tls._create_ssl_context return the context before calling set_cipher_list fixes the issue. So cloudflare is definitely looking at the cipher list.

Something changed again on cloudflare, now I need to remove the call to context.set_alpn_select_callback

glandium avatar Feb 08 '24 05:02 glandium

Relevant here: https://github.com/fedosgad/mirror_proxy

mhils avatar Feb 12 '24 21:02 mhils