device-detector
Bot types
The `$categories` variable for bots has some ambiguous types:
- `Feed Fetcher`, `Feed Parser`, `Feed Reader` - what's the difference, really?
- `Read-it-later Service` is used for only 2 items, both for the same thing: https://getpocket.com/pocketparser_ua. At the same time, the description on that page clearly says `crawling`, so should this not be `Crawler`?
- `Search tools` is used for only 1 item: http://www.shopwiki.com/w/Help:Bot. Again, the description clearly states that this is a crawler, so should this not be `Crawler`?
- How does `Security search bot` differ from `Security Checker`?
- How does `Service bot` differ from `Service Agent`?
- And probably the biggest one of all: what's the difference between `Search bot` and `Crawler`? I mean, crawling is done by search bots, so this seems to be the same thing.
I am fine with creating a PR to harmonize these things a bit, but I think this warrants a proper discussion first.
https://github.com/matomo-org/device-detector/issues/5727
Hm, that one did not cover the questions above, in the end, although it did mention multiple feed bots, and it resulted in code for validating categories. I am, essentially, talking about cleaning up the types.
I guess we don't have a "clean" definition of categories to use. Feel free to create a PR to clean them up a bit.
I can add this to #7490. Or would a separate PR be better?
@Simbiat It's better to have a separate PR, as that makes reviewing easier.
I've come across https://radar.cloudflare.com/traffic/verified-bots, which has a nice classification. Thoughts?
What that page suggests:
- `Academic Research` - used only for Internet Archive, and I am not sure it's the correct category. To me it would probably be a regular `Crawler`.
- `Accessibility` - 3 entries, and it does make sense for those bots. Probably a valid category, which we can adopt.
- `Advertising & Marketing` - based on my knowledge of how these bots work and what they do (which is limited to my short time at Smartly.io), I'd say these could be treated similarly to the `Monitoring & Analytics` category below.
- `Aggregator` - again, looks like a regular `Crawler` to me; I am not sure it's worth having this as a separate category.
- `AI Crawler` - probably a valid category nowadays, although there are only 3 entries there. On the other hand, "AI" only implies the technology used by the company, not necessarily the purpose of the bot, so a regular `Crawler` could still be fine.
- `Feed Fetcher` - the same as what we have in 3 categories.
- `Monitoring & Analytics` - looks similar to our `Site Monitor`.
- `Other` - has 2 items which could be considered `Webhooks` (category below).
- `Page Preview` - essentially search bots, and some app-specific ones.
- `Search Engine Crawler` - same as our `Search bot`.
- `Search Engine Optimization` - same as our `Search tools`, or maybe `Site Monitor` in some cases.
- `Security` - same as our `Security Checker` and `Security search bot`.
- `Social Media Marketing` - just Brandwatch in the list, which I would consider a regular crawler.
- `Webhooks` - this feels a bit generic. I would even say that some `Page Preview` items could be considered `Webhooks` as well.
Personally this is what I would do:
- Add an `Assistant` category, update the bots based on Cloudflare's `Accessibility` bots
- `Benchmark` -> move to `Inspector`
- `Crawler` -> keep as is
- `Feed Fetcher` -> rename to `Aggregator`
- `Feed Parser` -> move to `Aggregator`
- `Feed Reader` -> move to `Aggregator`
- `Network Monitor` -> move to `Inspector`
- `Read-it-later Service` -> move to `Crawler`
- `Search bot` -> rename to `Searcher`
- `Search tools` -> move to `Crawler`
- `Security Checker` -> move to `Inspector`
- `Security search bot` -> move to `Inspector`
- `Service Agent` -> some can be moved to `Inspector`, some to `Crawler`, from a quick glance
- `Service bot` -> I'd say `Grammarly` can probably be treated as `Assistant`, `Vercel` as `Inspector`, and `ADmantX` probably, too
- `Site Monitor` -> move to `Inspector`
- `Social Media Agent` -> mostly image fetchers, essentially, so either `Searcher` or `Crawler`
- `Validator` -> move to `Inspector`
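To make the review easier, the unambiguous moves from the list above could be written down as a lookup table. This is only a rough sketch in Python, not code from the repository: the names `CATEGORY_MIGRATION`, `NEEDS_MANUAL_REVIEW` and `migrate_category` are made up for illustration, and the three categories that need a per-bot decision are deliberately left out of the table.

```python
# Hypothetical migration table built from the proposal above.
# Keys are current category names, values are the proposed ones.
CATEGORY_MIGRATION = {
    "Benchmark": "Inspector",
    "Crawler": "Crawler",  # kept as is
    "Feed Fetcher": "Aggregator",
    "Feed Parser": "Aggregator",
    "Feed Reader": "Aggregator",
    "Network Monitor": "Inspector",
    "Read-it-later Service": "Crawler",
    "Search bot": "Searcher",
    "Search tools": "Crawler",
    "Security Checker": "Inspector",
    "Security search bot": "Inspector",
    "Site Monitor": "Inspector",
    "Validator": "Inspector",
}

# Categories whose bots have to be checked one by one,
# because the proposal splits them between several new categories.
NEEDS_MANUAL_REVIEW = {"Service Agent", "Service bot", "Social Media Agent"}


def migrate_category(old_category):
    """Return the proposed category, or None if the bot needs manual review."""
    if old_category in NEEDS_MANUAL_REVIEW:
        return None
    try:
        return CATEGORY_MIGRATION[old_category]
    except KeyError:
        # Fail loudly on typos or categories the proposal does not cover.
        raise ValueError(f"Unknown category: {old_category!r}") from None
```

Returning `None` for the split categories keeps the automatic part of the cleanup separate from the part that needs a human to look at each bot.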
So this would leave these categories:
- `Supporter` - bots used by various assistive technologies, including, but not limited to, text-to-voice, voice-to-text, image-to-text services, translators and editorial tools.
- `Aggregator` - bots used by tools aimed at collection and potential summarization of information from pages, including, but not limited to, feed readers, link or page collectors and summarization tools.
- `Crawler` - bots not falling under other categories or related to generic or multi-purpose services.
- `Inspector` - bots used by various tools and services aimed at monitoring, inspecting, validating and/or analyzing content or behavior of websites and users' interactions with them, including for security and/or SEO purposes.
- `Searcher` - bots used for services related to search, including, but not limited to, search engines and social networks.
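Once the list is reduced to these five names, checking bot entries against it becomes trivial. A minimal sketch in Python, assuming each bot entry is a record with `name` and `category` fields; the record shape, the sample bots and the function name are assumptions for illustration, not the project's actual validation code.

```python
# The five proposed categories, as listed above.
PROPOSED_CATEGORIES = frozenset(
    {"Supporter", "Aggregator", "Crawler", "Inspector", "Searcher"}
)


def find_invalid_categories(bots):
    """Return {bot name: category} for bots outside the proposed set."""
    return {
        bot["name"]: bot.get("category")
        for bot in bots
        if bot.get("category") not in PROPOSED_CATEGORIES
    }


# Made-up sample entries, not real bots from the project.
sample_bots = [
    {"name": "ExampleFeedBot", "category": "Aggregator"},
    {"name": "ExampleOldBot", "category": "Feed Parser"},
]

print(find_invalid_categories(sample_bots))
```

Here only `ExampleOldBot` is reported, since `Feed Parser` would no longer exist after the cleanup.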
I also tried thinking of an acronym, but the best GPT and I came up with was SCAIS, because it can be pronounced "skies". Not that we need an acronym or these specific names, of course. But I think they strike a good balance between precise and generic.
Any update would require a review of all the bots. I do hope that by the end of the year I will finish going through all brands (and submit a PR to correct quite a few things there) and start working on bots, and when I do, I can adjust their categories as well, of course.