Crawler-Detect

33 new possibilities

Open • rentalhost opened this issue 4 years ago • 9 comments

I have found some new possibilities, but I'm not sure which ones should be added, or even whether they are really crawlers. So I am creating this issue with checkboxes for you to decide. I will make a PR after the decision.

Dangerous possibilities (e.g. crackers):

  • [ ] Mozlila/5.0 (Linux; Android 7.0; SM-G892A Bulid/NRD90M; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/60.0.3112.107 Moblie Safari/537.36
  • [ ] Mozilla/5.0 (Windows NT x.y; Win64; x64; rv:10.0) Gecko/20100101 Firefox/10.0
  • [ ] Chrome
  • [ ] "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322)"
  • [ ] "Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1"
  • [ ] "Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko"
  • [ ] ''

The quoted user agents are exactly as my logger received them, quotes included.
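These entries look suspicious for different reasons rather than because they match a known crawler name: an empty value, quote characters that are part of the header value itself, an unfilled `x.y` placeholder, a bare product name such as `Chrome`, or typos such as `Mozlila`, `Bulid` and `Moblie`. A minimal Python sketch of such heuristics, assuming a hypothetical `looks_suspicious` helper that is not part of Crawler-Detect; the rules are illustrative assumptions, not a vetted detector:

```python
import re

# Hypothetical heuristics (not part of Crawler-Detect): flag raw
# User-Agent header values that look scripted or spoofed.
MISSPELLINGS = ("Mozlila", "Bulid", "Moblie")


def looks_suspicious(ua: str) -> bool:
    """Return True for empty, quoted, placeholder, bare or misspelled agents."""
    value = ua.strip()
    if value in ("", "''", '""'):
        # Missing header, or an empty quoted string sent as the value.
        return True
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
        # The quote characters are part of the header value itself.
        return True
    if "Windows NT x.y" in value:
        # Template placeholder that a script never filled in.
        return True
    if re.fullmatch(r"[A-Za-z]+", value):
        # A bare product name such as "Chrome", with no version or platform.
        return True
    # Typos that often appear in hand-written fake browser strings.
    return any(word in value for word in MISSPELLINGS)
```

The unquoted IE 11 string is not flagged, only the copy that arrived wrapped in literal quotes. Rules like these can also hit odd but legitimate clients, so they may fit better as a logging signal than as a blocking rule.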

High probability:

  • [x] webscraper
  • [x] PuppeteerAgent
  • [x] Microsoft.Data.Mashup (https://go.microsoft.com/fwlink/?LinkID=304225)
  • [ ] Microsoft Office/15.0 (Windows NT 6.1; MAPI 15.0.5172; Pro)
  • [x] Microsoft Office/16.0 (Windows NT 10.0; Microsoft Excel 16.0.12026; Pro)
  • [x] Microsoft Office/16.0 (Windows NT 10.0; Microsoft Excel 16.0.12130; Pro)
  • [x] Microsoft Office/16.0 (Windows NT 10.0; Microsoft Excel 16.0.12228; Pro)
  • [x] Microsoft Office/16.0 (Windows NT 10.0; Microsoft PowerPoint 16.0.12026; Pro)
  • [x] Microsoft Office/16.0 (Windows NT 10.0; Microsoft PowerPoint 16.0.12130; Pro)
  • [x] Microsoft Office/16.0 (Windows NT 10.0; Microsoft Word 16.0.12026; Pro)
  • [x] Microsoft Office/16.0 (Windows NT 10.0; Microsoft Word 16.0.12130; Pro)
  • [x] Microsoft Office/16.0 (Windows NT 10.0; Microsoft Word 16.0.12228; Pro)
  • [x] Microsoft Office/16.0 (Windows NT 6.1; Microsoft Excel 16.0.12130; Pro)

The Microsoft Office agents appear when you copy content from the web into an Office application; in some cases Office then downloads content from the original page. The same goes for Mashup.
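One way to cover the ticked high-probability agents is a single case-insensitive alternation of patterns, checked with one regex search per request. The sketch below is Python for illustration only; the pattern strings and `is_ticked_agent` are hypothetical, and Crawler-Detect's actual pattern list (in PHP) may be written differently:

```python
import re

# Hypothetical patterns for the ticked high-probability agents.
TICKED_PATTERNS = [
    r"webscraper",
    r"PuppeteerAgent",
    r"Microsoft\.Data\.Mashup",
    # Office 16.0 apps fetching pages pasted into Excel, PowerPoint or Word.
    r"Microsoft Office/16\.0 \(.*Microsoft (?:Excel|PowerPoint|Word) 16\.0",
]

# Combine everything into one case-insensitive alternation so each
# request needs a single regex search.
TICKED_RE = re.compile(
    "|".join(f"(?:{p})" for p in TICKED_PATTERNS), re.IGNORECASE
)


def is_ticked_agent(ua: str) -> bool:
    """Return True if the user agent matches any of the ticked patterns."""
    return TICKED_RE.search(ua) is not None
```

Anchoring on the `Microsoft Office/16.0` prefix plus the application name leaves the unticked Office 15.0 (MAPI) agent unmatched.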

Medium probability:

  • [x] rome/1.12.1

I couldn't find out what it means, but it doesn't seem to be a real browser.

Low probability:

  • [ ] Safari/12606.3.4.1.4 CFNetwork/811.11 Darwin/16.7.0 (x86_64)
  • [ ] Safari/13605.3.8 CFNetwork/902.1 Darwin/17.7.0 (x86_64)
  • [ ] Safari/13608.3.10.10.1 CFNetwork/902.4 Darwin/17.7.0 (x86_64)
  • [ ] Safari/15608.3.10.1.4 CFNetwork/1120 Darwin/19.0.0 (x86_64)
  • [ ] Outlook/15.0 (15.0.5125.1000; MSI; x64)
  • [ ] OC/16.0.12026.20334 (Skype for Business)
  • [ ] OC/16.0.12026.20344 (Skype for Business)
  • [ ] OC/16.0.12130.20272 (Skype for Business)
  • [ ] OC/16.0.12130.20344 (Skype for Business)
  • [ ] OC/16.0.12130.20390 (Skype for Business)
  • [ ] OC/16.0.12130.20410 (Skype for Business)
  • [ ] OC/16.0.12228.20332 (Skype for Business)

I don't know what CFNetwork means, but it is related to Apple. The Outlook and OC agents are very similar to the Microsoft Office case.

rentalhost • Dec 13 '19 02:12

Thanks for this. Go ahead and create a PR for the agents I have ticked. I will look into the others at a later date 👍

JayBizzle • Dec 13 '19 21:12

Please also add the "LieBaoFast" Chinese scrapers.

Take a look at this: https://www.johnlarge.co.uk/blocking-aggressive-chinese-crawlers-scrapers-bots/

mtshare • Jan 07 '20 17:01

@mtshare would you like to submit a PR to add that bot?

JayBizzle • Jan 07 '20 21:01

Is the corresponding PR merged?

Abhirup-99 • May 21 '20 10:05

> Is the corresponding PR merged?

No PR was ever submitted 😔

JayBizzle • Jun 04 '20 20:06

Sorry, I ended up waiting for the other user agents to be analyzed before sending the PR. Anyway, if someone else can send a PR, I would appreciate it (I'm in a bit of trouble right now).

rentalhost • Jun 04 '20 20:06

If no one is assigned, can I open a PR?

Abhirup-99 • Jun 04 '20 20:06

Should the PR take into account all the user agents listed here?

Abhirup-99 • Jun 04 '20 20:06

Just the ones that have been ticked 👍🏻

JayBizzle • Jun 04 '20 20:06
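Collecting "just the ones that have been ticked" from the checklist in the issue body can be scripted. A small Python sketch, assuming the issue text was copied with `•`, `-` or `*` bullets and `[x]`/`[ ]` boxes; `ticked_agents` is a hypothetical helper:

```python
import re

# One checklist line such as "  • [x] webscraper"; the bullet may be
# "•", "*" or "-" depending on how the issue text was copied (assumption).
CHECKBOX_RE = re.compile(r"^\s*[•*-]\s*\[([ xX])\]\s*(.+?)\s*$")


def ticked_agents(issue_text: str) -> list[str]:
    """Return the user agents whose checkbox is ticked, in document order."""
    ticked = []
    for line in issue_text.splitlines():
        match = CHECKBOX_RE.match(line)
        # group(1) is the box contents: "x"/"X" when ticked, " " otherwise.
        if match and match.group(1) in ("x", "X"):
            ticked.append(match.group(2))
    return ticked
```

Applied to the full issue body, this would return the 13 ticked entries, from `webscraper` through `rome/1.12.1`.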