Potential bots

Open JayBizzle opened this issue 4 years ago • 8 comments

  • [x] Filestack
  • [x] Google-Ads-Overview Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36
  • [x] Google-Ads-Overview Mozilla/5.0 (Linux; U; Android 6.0.1; generic) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Version/4.0 Mobile Safari/537.36
  • [x] Google-Ads-Overview Mozilla/5.0 (Linux; U; Android 2.3.4; generic) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Version/4.0 Mobile Safari/537.36
  • [x] Google-Ads-Overview Mozilla/5.0 (Linux; U; Android 2.3.4; generic) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Mobile Safari/537.36
  • [ ] Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/602.1 (KHTML, like Gecko) splash Version/9.0 Safari/602.1
  • [x] adreview/1.0
  • [x] Mozilla/5.0 (compatible; RyowlEngine/1.0; +https://ryowl.org)
  • [x] Mozilla/5.0 (compatible; RyowlEngine/1.0; +https://ryowl.com)
  • [x] Google-speakr
  • [x] Google-speakr,gzip(gfe)
  • [x] FeedViewer/1.0 (+http://www.feedviewer.net/webmasters; license agreement: http://www.feedviewer.net/license)
  • [x] acebookexternalhit/1.0 (+http://www.facebook.com/externalhit_uatext.php)
  • [x] WhoAPI/1.0 (whoapi.com)
  • [x] Mozilla/5.0 (compatible; BackupLand/1.0; https://go.backupland.com/; Domain check for viruses;)
  • [x] Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:66.0) WhatCMS/1.0
  • [ ] Google-Ads-Overview Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36
  • [ ] Google-Ads-Overview Mozilla/5.0 (Linux; U; Android 6.0.1; generic) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Version/4.0 Mobile Safari/537.36
  • [ ] Google-Ads-Overview Mozilla/5.0 (Linux; U; Android 2.3.4; generic) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Version/4.0 Mobile Safari/537.36
  • [ ] Google-Ads-Overview Mozilla/5.0 (Linux; U; Android 2.3.4; generic) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Mobile Safari/537.36
  • [ ] Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) DownloaderChrome/62.0.3202.75 Safari/537.36
  • [x] iGooglePortal
  • [x] Mozilla/5.0+(compatible; Cula/2.0; https://cula.io/)
  • [ ] Mozilla/5.0 (Windows; U; Windows NT 6.1; en-us; rv:1.9.2.3) Gecko/20100401 YFF35 Firefox/3.6.3
  • [x] Owlin - http://www.owlin.com
  • [x] Mozilla/5.0 (compatible; +centuryb.o.t9[at]gmail.com)
  • [ ] Bublup (+https://www.bublup.com/bublup.html)
  • [ ] Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36 | Hexometer.com - HexAct Inc.
  • [ ] Mozilla/5.0/Firefox/42.0 - nbertaupete95(at)gmail.com
  • [ ] OpenGraphCheck/2.1 (+https://opengraphcheck.com)
  • [ ] donwload_html/2.0 (Linux) [email protected]
  • [ ] LinuxGetURL/2.0 [email protected] (Linux)
  • [ ] Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Google-AMPHTML)
  • [ ] Google-AMPHTML
  • [ ] inactive-blog-skipper/1.0 ([email protected])
  • [ ] AWS Network Health / Contact [email protected] with your website URL to stop
  • [ ] AWS Network Health / Contact [email protected] with your website URL to stop
  • [ ] Corax - [email protected]
  • [ ] draw.io
  • [ ] MindsMediaProxy/3.0 (+http://www.minds.com/)
  • [ ] Mozilla/5.0 (w3dt header analysis for httprecon tools; http://w3dt.net/tools/httprecon)
  • [ ] Google-Test
  • [ ] Mozilla/5.0 (compatible; Google-Test;)
  • [ ] Mozilla/5.0 (compatible; RSiteAuditor)
  • [ ] Mozilla/5.0 (compatible; WPSec/1.3; +https://wpsec.com)
  • [ ] Mozilla/5.0 (compatible; Go-KI; +https://www.gosign.de/)
  • [ ] Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Google-AMPHTML)
  • [ ] Google-AMPHTML
  • [ ] Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome-prerendercloud/66.0.3359.139 Safari/537.36
  • [ ] DIGMATO.com web tester
  • [ ] Mozilla/5.0 (X11; Linux x86_64; Rigor) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.87 Safari/537.36
  • [ ] Mozilla/5.0 Windows NT 10.0; Win64; x64 AppleWebKit/537.36 KHTML, like Gecko Chrome/65.0.3286.0 Safari/537.36 Rigor
  • [ ] Mozilla/5.0 (X11; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0 (Research project: Visit PrivacyScore.org for details)
  • [ ] veu/1.0 (+http://www.veu.cat)
  • [ ] Google-Cloud-ML-Vision
  • [ ] FirmoGraph (+https://firmograph.io)
  • [ ] Mozilla/5.0 (compatible; 2GDPR/1.2; https://2gdpr.com)
  • [ ] CityGridMedia/1.0 (compatible; http://url-validation.citygrid.com/)
  • [ ] Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/537.11 (KHTML, like Gecko)(compatible; http://url-validation.citygrid.com/) Chrome/23.0.1271.95 Safari/537.11
  • [ ] https://gdnplus.com:Gather Analyze Provide.
  • [ ] northcutt.com SEO tools
  • [ ] Burf.co
  • [ ] Mozilla/5.0 (compatible; WPSec/1.3; +https://wpsec.com)
  • [ ] gensun.org

Jan 21 '20 20:01 JayBizzle

Is this merged?

Jun 05 '20 16:06 Abhirup-99

> Is this merged?

The user agents marked with ✅ have been added; the others still need adding 👍🏻

Jun 05 '20 20:06 JayBizzle

This is the user agent of the Google-Weblight bot:

  • [ ] Mozilla/5.0 (Linux; Android 4.2.1; en-us; Nexus 5 Build/JOP40D) AppleWebKit/535.19 (KHTML, like Gecko; googleweblight) Chrome/38.0.1025.166 Mobile Safari/535.19

It should be detectable by "googleweblight".
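A minimal sketch of that suggestion, written in Python purely for illustration (it is not the package's own code). The rule name `WEBLIGHT` and the helper `is_weblight` are hypothetical; only the token "googleweblight" and the sample user agent come from the comment above.

```python
import re

# Hypothetical single-token rule; the token itself is taken from the comment above.
WEBLIGHT = re.compile(r"googleweblight", re.IGNORECASE)

# User agent quoted in the comment above.
ua = (
    "Mozilla/5.0 (Linux; Android 4.2.1; en-us; Nexus 5 Build/JOP40D) "
    "AppleWebKit/535.19 (KHTML, like Gecko; googleweblight) "
    "Chrome/38.0.1025.166 Mobile Safari/535.19"
)


def is_weblight(user_agent: str) -> bool:
    """Return True if the token appears anywhere in the user agent."""
    return WEBLIGHT.search(user_agent) is not None
```

A plain Chrome user agent without the token would not match, so the rule should not affect ordinary browsers.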

Jul 01 '20 10:07 newHagen

There's also:

Mozilla/5.0 AppleWebKit/537.36 Chrome/114.0.5735.179 Safari/537.36 Google-Ads-Conversions

Should these 2 existing rules be replaced:

  • Google-Ads-Creatives-Assistant
  • Google-Ads-Overview

with a simple "Google-Ads" detection?
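To illustrate the trade-off, here is a rough Python sketch (not the package's actual rule set) comparing the two specific rules with a single "Google-Ads" rule. The names `SPECIFIC`, `GENERIC`, and `missed_by` are made up for this example; the sample strings are taken from this thread.

```python
import re

# The two existing rules named above, combined into one alternation.
SPECIFIC = re.compile(
    r"Google-Ads-Creatives-Assistant|Google-Ads-Overview", re.IGNORECASE
)
# The proposed replacement: one shorter prefix rule.
GENERIC = re.compile(r"Google-Ads", re.IGNORECASE)

samples = [
    "Mozilla/5.0 AppleWebKit/537.36 Chrome/114.0.5735.179 Safari/537.36 "
    "Google-Ads-Conversions",
    "Google-Ads-Overview Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36",
    "Google-Ads-Creatives-Assistant",
]


def missed_by(pattern: re.Pattern, agents: list[str]) -> list[str]:
    """Return the user agents that the given pattern fails to match."""
    return [ua for ua in agents if not pattern.search(ua)]
```

With these samples, the specific rules miss only the Google-Ads-Conversions agent, while the generic rule misses none. The cost of the shorter rule is that it would also match any future user agent containing "Google-Ads", which seems acceptable here.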

Jul 18 '23 01:07 clementmas

> There's also:
>
> Mozilla/5.0 AppleWebKit/537.36 Chrome/114.0.5735.179 Safari/537.36 Google-Ads-Conversions
>
> Should these 2 existing rules be replaced:
>
>   • Google-Ads-Creatives-Assistant
>   • Google-Ads-Overview
>
> with a simple "Google-Ads" detection?

Yeah, go for it 👍

Jul 21 '23 15:07 JayBizzle

There is probably no way to detect them, but these two visit my entirely Danish site every day: the first twice a day from the US, the second once a day from China. These are their complete User-Agent headers, and every part of them seems to be removed by the exclusions.

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36

Jul 24 '23 11:07 SoranDK

> There is probably no way to detect them, but these two visit my entirely Danish site every day: the first twice a day from the US, the second once a day from China. These are their complete User-Agent headers, and every part of them seems to be removed by the exclusions.
>
> Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36
>
> Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36

Yep, bots like this are pretty annoying. There is nothing this package can do about them 🤔
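The reason can be sketched in a few lines of Python, assuming the strip-then-match approach described in the comment above: known browser tokens are removed first, and only what is left is checked against crawler patterns. `EXCLUSIONS` and `CRAWLERS` below are hypothetical, heavily simplified stand-ins, not the package's real lists.

```python
import re

# Hypothetical, simplified exclusions: tokens that ordinary browsers send.
EXCLUSIONS = re.compile(
    r"Mozilla/\d+\.\d+|AppleWebKit/[\d.]+|Chrome/[\d.]+|Safari/[\d.]+"
    r"|\(KHTML, like Gecko\)|\(X11; Linux x86_64\)"
    r"|\(Windows NT [\d.]+; Win64; x64\)",
    re.IGNORECASE,
)
# Hypothetical, simplified crawler patterns.
CRAWLERS = re.compile(r"bot|crawl|spider", re.IGNORECASE)


def residue(user_agent: str) -> str:
    """Return what is left of the user agent once browser tokens are stripped."""
    return EXCLUSIONS.sub("", user_agent).strip()


def looks_like_crawler(user_agent: str) -> bool:
    """Match crawler patterns only against the stripped remainder."""
    rest = residue(user_agent)
    return bool(rest) and CRAWLERS.search(rest) is not None


# The two user agents reported above.
us_visitor = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"
)
cn_visitor = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36"
)
```

Under these assumptions both reported user agents reduce to an empty string, so there is nothing left to match. Telling such visitors apart from real browsers would need signals other than the User-Agent header, which is outside what a user-agent matcher can see.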

Jul 26 '23 14:07 JayBizzle

I found this list if anyone's interested in going through it ;-P https://user-agents.net/bots

Sadly, I don't have enough experience with regex to do it myself, as my original post showed (I hadn't noticed that the bot I mentioned would already be caught by the "bot" in the regex).

Aug 17 '23 15:08 SoranDK