Disambiguate HTTP clients from crawlers/bots
I was surprised to find HTTP clients like python-requests, Go-http-client, wget, curl, etc. included in the crawler list. While I understand that these tools can be abused, in our case a large portion of our legitimate web traffic comes from API requests made with HTTP clients like these.
For now I think I'll need to create an overriding allow list of patterns and remove matches from agents.Crawlers before processing (a rough sketch of that workaround is below, after the pattern list). But it would be great to be able to disambiguate client tools/libraries based on a field in crawler-user-agents.json: maybe just an is_client boolean, or a more generic tags string array that could contain "client" or similar. Any thoughts?
I'm sure I missed a few, but it looks like the list isn't too long:
aiohttp
Apache-HttpClient
^curl
Go-http-client
http_get
httpx
libwww-perl
node-fetch
okhttp
python-requests
Python-urllib
[wW]get
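
For context, this is roughly the workaround I have in mind. It's only a sketch: the Crawler struct here just mirrors the one field I need, whereas in real code the entries would come from the library's own agents.Crawlers slice, whose actual type may differ.

```go
package main

import "fmt"

// Crawler mirrors the one field this sketch needs from
// crawler-user-agents.json; in practice the entries would come from
// agents.Crawlers, whose struct may look different.
type Crawler struct {
	Pattern string
}

// clientPatterns is the allow list from this issue: patterns that identify
// plain HTTP client libraries rather than crawlers/bots. Almost certainly
// incomplete.
var clientPatterns = map[string]bool{
	"aiohttp":           true,
	"Apache-HttpClient": true,
	"^curl":             true,
	"Go-http-client":    true,
	"http_get":          true,
	"httpx":             true,
	"libwww-perl":       true,
	"node-fetch":        true,
	"okhttp":            true,
	"python-requests":   true,
	"Python-urllib":     true,
	"[wW]get":           true,
}

// filterClients drops every entry whose pattern is on the allow list,
// leaving only "real" crawlers/bots for further processing.
func filterClients(crawlers []Crawler) []Crawler {
	var kept []Crawler
	for _, c := range crawlers {
		if clientPatterns[c.Pattern] {
			continue
		}
		kept = append(kept, c)
	}
	return kept
}

func main() {
	sample := []Crawler{
		{Pattern: "Googlebot"},
		{Pattern: "python-requests"},
	}
	for _, c := range filterClients(sample) {
		fmt.Println(c.Pattern) // prints only Googlebot
	}
}
```

It works, but it means maintaining a parallel copy of the library's patterns, which is why a field in the JSON itself would be nicer.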
Completely see your point. I like the idea of having optional tags:
"tags": ["generic-client"]
Would you do a pull-request? Thanks!
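
If the tags field lands, consumers could then filter on it directly instead of keeping a parallel allow list. A rough sketch of what that might look like on the Go side (the Tags field is hypothetical until the change actually exists in the JSON and the struct):

```go
package main

import "fmt"

// Crawler sketches what an entry could look like once an optional "tags"
// field exists; the Tags field is hypothetical at this point.
type Crawler struct {
	Pattern string
	Tags    []string
}

// hasTag reports whether the entry carries the given tag.
func hasTag(c Crawler, tag string) bool {
	for _, t := range c.Tags {
		if t == tag {
			return true
		}
	}
	return false
}

func main() {
	entries := []Crawler{
		{Pattern: "Googlebot"},
		{Pattern: "python-requests", Tags: []string{"generic-client"}},
	}
	for _, c := range entries {
		if hasTag(c, "generic-client") {
			continue // skip generic HTTP clients, keep crawlers/bots
		}
		fmt.Println(c.Pattern) // prints only Googlebot
	}
}
```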