d4v1d

circumvent fingerprinting

Open MattMoony opened this issue 2 years ago • 5 comments

description

Try to prevent platforms from rate-limiting bots (especially anonymous ones) by all available means. Switching up HTTP headers on every other request is probably a good idea, but more than that will be needed. Client fingerprinting shouldn't be the biggest issue, however, since afaik it basically relies on JavaScript, which isn't really applicable to how d4v1d bots should normally gather data.
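A rough sketch of what switching headers on every other request could look like. Everything here is an assumption for illustration: `HeaderRotator`, `HEADER_PROFILES` and the header values are hypothetical and not part of the d4v1d code base.

```python
import random

# Hypothetical header profiles; the values are illustrative only.
HEADER_PROFILES = [
    {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36",
        "Accept-Language": "en-US,en;q=0.9",
    },
    {
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/110.0",
        "Accept-Language": "en-GB,en;q=0.8",
    },
    {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
                      "(KHTML, like Gecko) Version/15.5 Safari/605.1.15",
        "Accept-Language": "de-AT,de;q=0.9,en;q=0.7",
    },
]


class HeaderRotator:
    """Hands out a header profile and switches to another one every `every` requests."""

    def __init__(self, profiles, every=2, rng=None):
        if not profiles:
            raise ValueError("at least one header profile is required")
        self._profiles = list(profiles)
        self._every = every
        self._rng = rng or random.Random()
        self._count = 0
        self._current = self._rng.choice(self._profiles)

    def next_headers(self):
        # Switch once the current profile has been used `every` times.
        if self._count and self._count % self._every == 0:
            others = [p for p in self._profiles if p is not self._current]
            self._current = self._rng.choice(others or self._profiles)
        self._count += 1
        # Return a copy so callers cannot mutate the stored profile.
        return dict(self._current)
```

Each request would then be sent with `headers=rotator.next_headers()`. Note that this only varies HTTP headers; it does nothing about lower-level connection characteristics.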

references

MattMoony avatar Mar 25 '23 12:03 MattMoony

To get better control over lower-level connection parameters (TLS and HTTP/2), it might be a good idea to look at something like PyCurl, especially in combination with curl-impersonate.

MattMoony avatar Mar 26 '23 15:03 MattMoony

Rotating proxy functionality would also be great.
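One way this could look, as a minimal sketch: a round-robin rotator that yields a `requests`-style `proxies` mapping. `ProxyRotator` and the proxy URLs in the usage note are hypothetical placeholders, not existing d4v1d code.

```python
import itertools


class ProxyRotator:
    """Cycles through a fixed list of proxy URLs in round-robin order."""

    def __init__(self, proxy_urls):
        urls = list(proxy_urls)
        if not urls:
            raise ValueError("at least one proxy URL is required")
        self._cycle = itertools.cycle(urls)

    def next_proxies(self):
        # Use the same proxy for both schemes of a single request.
        url = next(self._cycle)
        return {"http": url, "https": url}
```

Usage would be along the lines of `rotator = ProxyRotator(["http://proxy-a.example:8080", "http://proxy-b.example:8080"])`, then passing `proxies=rotator.next_proxies()` with each request.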

8twinni8 avatar Mar 28 '23 13:03 8twinni8

Found curl_cffi in a discussion about PyCurl integration for curl-impersonate - looks like a rather promising project. Going to try and base a sort of "anonymous session" class upon it.

Edit: Found a blog post (curl_cffi: A Python library that supports natively simulated browser TLS/JA3 fingerprinting) by the author of curl_cffi.
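For context, a minimal sketch of how curl_cffi can be asked to imitate a browser's TLS/HTTP2 fingerprint. This is not the "anonymous session" class mentioned above; `pick_target` and `fetch` are hypothetical helpers, and the target names listed are assumed to be supported by the installed curl_cffi version.

```python
import random

# Impersonation targets; assumed to exist in the installed curl_cffi version.
IMPERSONATE_TARGETS = ["chrome110", "chrome107", "safari15_5"]


def pick_target(rng=random):
    """Pick one of the configured impersonation targets at random."""
    return rng.choice(IMPERSONATE_TARGETS)


def fetch(url, target=None, timeout=30):
    """Fetch `url` while imitating the chosen browser's TLS/HTTP2 fingerprint."""
    # Imported lazily so this module still loads when curl_cffi is not installed.
    from curl_cffi import requests

    return requests.get(url, impersonate=target or pick_target(), timeout=timeout)
```

Calling `fetch("https://example.com")` would return a response object with the usual `status_code` and `text` attributes. Whether varying the target between requests actually helps against rate limiting is untested here.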

MattMoony avatar Mar 28 '23 18:03 MattMoony

It's still not enough. More research is needed on how the "anonymous" session can still be identified, as I'm still getting rate-limited using the code base at commit ac0303e3e011db1825aad5b0b018bedf1487a652 with AnonSession, etc.

MattMoony avatar Mar 28 '23 20:03 MattMoony

Recommendation at the moment: use a virtual machine or enforce IPv4. It could very well be that platforms like Instagram are more likely to block IPv6 addresses, since those are typically assigned to exactly one device, whereas IPv4 addresses are commonly NATed and might therefore have several clients behind them, so platforms are probably a little more reluctant to block those.
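A sketch of how IPv4 could be enforced from within Python, assuming the HTTP client resolves host names through `socket.getaddrinfo` (as `urllib` and `requests` do). `ipv4_only` is a hypothetical helper. libcurl-based clients such as PyCurl or curl_cffi do their own resolution, so this patch would not affect them; libcurl has the `CURLOPT_IPRESOLVE` option for that purpose.

```python
import contextlib
import socket


@contextlib.contextmanager
def ipv4_only():
    """Temporarily make socket.getaddrinfo return IPv4 results only."""
    original = socket.getaddrinfo

    def getaddrinfo_v4(host, port, family=0, type=0, proto=0, flags=0):
        # Ignore the requested address family and always ask for AF_INET.
        return original(host, port, socket.AF_INET, type, proto, flags)

    socket.getaddrinfo = getaddrinfo_v4
    try:
        yield
    finally:
        # Always restore the original resolver, even if the body raised.
        socket.getaddrinfo = original
```

Requests made inside a `with ipv4_only():` block would then only connect over IPv4. The patch is process-wide while active, so it is not safe to combine with other threads that need IPv6.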

Edit: Nvm, I can fetch the site in a virtual machine using the exact same IPv6 address as my host machine, even after I have been rate-limited on the host...

MattMoony avatar Apr 01 '23 19:04 MattMoony