d4v1d
circumvent fingerprinting
description
Try to keep platforms from rate-limiting bots (especially anonymous ones) by all available means. Switching up the HTTP headers on every other request is probably a good start, but more than that is needed. Client-side fingerprinting shouldn't be the biggest issue, though: as far as I know it mostly relies on JavaScript, which isn't really applicable to how d4v1d bots normally gather data.
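A minimal sketch of the header-switching idea, using only the standard library. The `HeaderRotator` class and the `HEADER_PROFILES` list are hypothetical names for illustration, not part of d4v1d, and the User-Agent strings are just plausible examples. It hands out the same header set for `every` consecutive requests and then moves on to the next profile.

```python
import itertools

# Illustrative header profiles (assumption: any realistic browser header sets would do).
HEADER_PROFILES = [
    {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36",
        "Accept-Language": "en-US,en;q=0.9",
    },
    {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
                      "(KHTML, like Gecko) Version/15.5 Safari/605.1.15",
        "Accept-Language": "en-GB,en;q=0.8",
    },
    {
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/110.0",
        "Accept-Language": "de-DE,de;q=0.9,en;q=0.5",
    },
]


class HeaderRotator:
    """Hypothetical helper: switch to the next header profile every `every` requests."""

    def __init__(self, profiles, every=2):
        profiles = list(profiles)
        if not profiles:
            raise ValueError("at least one header profile is required")
        if every < 1:
            raise ValueError("every must be >= 1")
        self._cycle = itertools.cycle(profiles)
        self._every = every
        self._count = 0
        self._current = next(self._cycle)

    def next_headers(self):
        # Advance to the next profile once `every` requests have used the current one.
        if self._count and self._count % self._every == 0:
            self._current = next(self._cycle)
        self._count += 1
        # Return a copy so callers can add request-specific headers safely.
        return dict(self._current)
```

The returned dict can be passed as the `headers` argument of whatever HTTP client the bot uses. Keeping whole profiles together (rather than shuffling individual headers) avoids combinations that no real browser would send.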
references
To get better control over lower-level connection parameters (TLS and HTTP/2), it might be a good idea to take a look at something like PycURL, especially in combination with curl-impersonate.
A rotating proxy functionality would also be great.
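One possible shape for the rotating proxy functionality, as a rough sketch. `ProxyRotator` is a hypothetical name, and the proxy addresses in the usage example are placeholders from the documentation address range, not real proxies. It simply cycles through a list in round-robin order and returns a requests-style `proxies` mapping.

```python
import itertools


class ProxyRotator:
    """Hypothetical helper: hand out proxies from a fixed list in round-robin order."""

    def __init__(self, proxy_urls):
        urls = list(proxy_urls)
        if not urls:
            raise ValueError("at least one proxy URL is required")
        self._cycle = itertools.cycle(urls)

    def next_proxies(self):
        url = next(self._cycle)
        # Same proxy for both schemes, in the mapping format used by requests-style clients.
        return {"http": url, "https": url}


# Placeholder addresses (assumption: replace with proxies you actually control).
rotator = ProxyRotator(["http://192.0.2.10:8080", "http://192.0.2.11:8080"])
first = rotator.next_proxies()
second = rotator.next_proxies()
```

A real implementation would probably also need to drop proxies that fail or get rate-limited themselves; that is left out here.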
Found curl_cffi in a discussion about PycURL integration for curl-impersonate; it looks like a rather promising project. Going to try to base a sort of "anonymous session" class on it.
Edit: Found a blog post (curl_cffi: A Python library that supports natively simulated browser TLS/JA3 fingerprinting) by the author of curl_cffi.
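A rough sketch of what such an "anonymous session" could look like on top of curl_cffi. This is not the actual `AnonSession` from the d4v1d code base; `AnonSessionSketch` and `IMPERSONATE_TARGETS` are hypothetical names, and the impersonation targets listed are assumed to be supported by the installed curl_cffi version. curl_cffi is imported lazily, so the class can be defined without the package being present.

```python
import random

# Browser profiles to impersonate (assumption: supported by the installed curl_cffi).
IMPERSONATE_TARGETS = ("chrome110", "chrome107", "safari15_5")


class AnonSessionSketch:
    """Hypothetical wrapper: pick a browser TLS/HTTP2 fingerprint at random per request."""

    def __init__(self, targets=IMPERSONATE_TARGETS, rng=None):
        self._targets = tuple(targets)
        if not self._targets:
            raise ValueError("at least one impersonation target is required")
        self._rng = rng or random.Random()

    def pick_target(self):
        # Choose which browser fingerprint the next request should imitate.
        return self._rng.choice(self._targets)

    def get(self, url, **kwargs):
        # Imported here so that the class is usable for testing without curl_cffi.
        from curl_cffi import requests

        # Extra keyword arguments (e.g. headers, proxies) are passed through unchanged.
        return requests.get(url, impersonate=self.pick_target(), **kwargs)
```

Usage would be along the lines of `AnonSessionSketch().get("https://example.com")`. Note that each call here opens a fresh connection with no shared cookies, which may or may not be what an "anonymous" session should do.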
It's still not enough. I'm still getting rate-limited with the code base at commit ac0303e3e011db1825aad5b0b018bedf1487a652 using AnonSession etc., so more research is needed on how the "anonymous" session can still be identified.
Recommendation at the moment: use a virtual machine or enforce IPv4. It could very well be that platforms like Instagram are more likely to block IPv6 addresses, since each one is typically assigned to exactly one device, whereas IPv4 addresses are commonly NATed and might have several clients behind them, so platforms are probably a little more reluctant to block those.
Edit: Never mind. After being rate-limited on the host, I can still fetch the site from a virtual machine using the exact same IPv6 address as the host machine, so the rate limit is apparently not based on the IP address alone.