Crawler-Detect
33 new possibilities
I have found some new possible user agents, but I am not sure which ones should be added, or even whether each one is really a crawler. So I am creating this issue with checkboxes for you to decide. I will make a PR after the decision.
Dangerous possibility (e.g. cracker):
- [ ] Mozlila/5.0 (Linux; Android 7.0; SM-G892A Bulid/NRD90M; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/60.0.3112.107 Moblie Safari/537.36
- [ ] Mozilla/5.0 (Windows NT x.y; Win64; x64; rv:10.0) Gecko/20100101 Firefox/10.0
- [ ] Chrome
- [ ] "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322)"
- [ ] "Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1"
- [ ] "Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko"
- [ ] ''
The quoted user agents are shown exactly as my logger received them.
High probability:
- [x] webscraper
- [x] PuppeteerAgent
- [x] Microsoft.Data.Mashup (https://go.microsoft.com/fwlink/?LinkID=304225)
- [ ] Microsoft Office/15.0 (Windows NT 6.1; MAPI 15.0.5172; Pro)
- [x] Microsoft Office/16.0 (Windows NT 10.0; Microsoft Excel 16.0.12026; Pro)
- [x] Microsoft Office/16.0 (Windows NT 10.0; Microsoft Excel 16.0.12130; Pro)
- [x] Microsoft Office/16.0 (Windows NT 10.0; Microsoft Excel 16.0.12228; Pro)
- [x] Microsoft Office/16.0 (Windows NT 10.0; Microsoft PowerPoint 16.0.12026; Pro)
- [x] Microsoft Office/16.0 (Windows NT 10.0; Microsoft PowerPoint 16.0.12130; Pro)
- [x] Microsoft Office/16.0 (Windows NT 10.0; Microsoft Word 16.0.12026; Pro)
- [x] Microsoft Office/16.0 (Windows NT 10.0; Microsoft Word 16.0.12130; Pro)
- [x] Microsoft Office/16.0 (Windows NT 10.0; Microsoft Word 16.0.12228; Pro)
- [x] Microsoft Office/16.0 (Windows NT 6.1; Microsoft Excel 16.0.12130; Pro)
The Microsoft Office agents show up when you copy content from a web page into an Office application; in some cases Office then downloads content from the original page. The same applies to Mashup.
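To make the proposal concrete, here is a minimal sketch of how the ticked high-probability entries could be matched. It is only an illustration in Python, not the patterns or the API that Crawler-Detect actually uses; the names `TICKED_PATTERNS` and `is_ticked_agent` are hypothetical, and case-insensitive matching is my assumption.

```python
import re

# Hypothetical pattern fragments covering only the ticked high-probability
# entries above. Illustrative only: not taken from Crawler-Detect itself.
TICKED_PATTERNS = [
    r"webscraper",
    r"PuppeteerAgent",
    r"Microsoft\.Data\.Mashup",
    r"Microsoft Office/16\.",  # Office/15.0 is unticked, so it is left out
]

# Assumption: match case-insensitively anywhere in the user agent string.
CRAWLER_RE = re.compile("|".join(TICKED_PATTERNS), re.IGNORECASE)


def is_ticked_agent(user_agent: str) -> bool:
    """Return True if the user agent matches one of the ticked entries."""
    return CRAWLER_RE.search(user_agent) is not None
```

With these fragments, the `Microsoft Office/16.0 (...)` agents match regardless of the Windows or application version, while the unticked `Microsoft Office/15.0 (...)` agent does not.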
Medium probability:
- [x] rome/1.12.1
I couldn't find out what it is, but it doesn't seem to be a real browser.
Low probability:
- [ ] Safari/12606.3.4.1.4 CFNetwork/811.11 Darwin/16.7.0 (x86_64)
- [ ] Safari/13605.3.8 CFNetwork/902.1 Darwin/17.7.0 (x86_64)
- [ ] Safari/13608.3.10.10.1 CFNetwork/902.4 Darwin/17.7.0 (x86_64)
- [ ] Safari/15608.3.10.1.4 CFNetwork/1120 Darwin/19.0.0 (x86_64)
- [ ] Outlook/15.0 (15.0.5125.1000; MSI; x64)
- [ ] OC/16.0.12026.20334 (Skype for Business)
- [ ] OC/16.0.12026.20344 (Skype for Business)
- [ ] OC/16.0.12130.20272 (Skype for Business)
- [ ] OC/16.0.12130.20344 (Skype for Business)
- [ ] OC/16.0.12130.20390 (Skype for Business)
- [ ] OC/16.0.12130.20410 (Skype for Business)
- [ ] OC/16.0.12228.20332 (Skype for Business)
I don't know what CFNetwork means, but it is related to Apple. Outlook and OC are very similar to the Microsoft Office case.
Thanks for this. Go ahead and create a PR for the agents I have ticked. I will look into the others at a later date 👍
Please also add the "LieBaoFast" Chinese scraper.
Take a look at this: https://www.johnlarge.co.uk/blocking-aggressive-chinese-crawlers-scrapers-bots/
@mtshare would you like to submit a PR to add that bot?
Has the corresponding PR been merged?
No PR was ever submitted 😔
Sorry, I ended up waiting for the analysis of the other user agents before sending the PR. Anyway, if someone else can send the PR, I would appreciate it (I'm a little bit in trouble now).
If no one is assigned, can I submit a PR?
Should the PR take into consideration all the user agents listed here?
Just the ones that have been ticked 👍🏻