device-detector
Bot types
The `$categories` variable for bots has some ambiguous types:
- `Feed Fetcher`, `Feed Parser`, `Feed Reader` - what's the difference, really?
- `Read-it-later Service` is used for only 2 items, both for the same thing: https://getpocket.com/pocketparser_ua. At the same time, the description on this page clearly says `crawling`, so should this not be `Crawler`?
- `Search tools` is used for only 1 item: http://www.shopwiki.com/w/Help:Bot. Again, the description clearly states that this is a crawler, so should this not be `Crawler`?
- How does `Security search bot` differ from `Security Checker`?
- How does `Service bot` differ from `Service Agent`?
- And probably the biggest of them all: what's the difference between `Search bot` and `Crawler`? I mean, crawling is done by search bots, so this seems to be the same thing.
I am fine with creating a PR to harmonize these things a bit, but I think this warrants a proper discussion first.
https://github.com/matomo-org/device-detector/issues/5727
Hm, in the end that one did not cover the questions above, although it did mention multiple feed bots and resulted in code for validating categories. I am essentially talking about cleaning up the types.
I guess we don't have a "clean" definition of categories to use. Feel free to create a PR to clean them up a bit.
I can add this to #7490. Or would a separate PR be better?
@Simbiat It's better to have a separate PR, as that makes reviewing easier.
I've come across https://radar.cloudflare.com/traffic/verified-bots, which has a nice classification. Thoughts?
What that page suggests:
- `Academic Research` - used only for Internet Archive, and I am not sure it's the correct category. To me it would probably be a regular `Crawler`.
- `Accessibility` - 3 entries, and it does make sense for those bots. Probably a valid category, which we can adopt.
- `Advertising & Marketing` - based on my knowledge of how these bots work and what they do (which is limited to my short time at Smartly.io), I'd say these could be treated similarly to the `Monitoring & Analytics` category below.
- `Aggregator` - again, looks like a regular `Crawler` to me; not sure it's worth having this as a separate category.
- `AI Crawler` - probably a valid category nowadays, although there are only 3 entries. On the other hand, "AI" only implies the technology used by the company, not necessarily the purpose of the bot, so a regular `Crawler` could still be fine.
- `Feed Fetcher` - the same as what we have spread across 3 categories.
- `Monitoring & Analytics` - looks similar to our `Site Monitor`.
- `Other` - has 2 items which could be considered `Webhooks` (category below).
- `Page Preview` - essentially search bots, plus some app-specific ones.
- `Search Engine Crawler` - the same as our `Search bot`.
- `Search Engine Optimization` - the same as our `Search tools`, or maybe `Site Monitor` in some cases.
- `Security` - the same as our `Security Checker` and `Security search bot`.
- `Social Media Marketing` - just Brandwatch in the list, which I would consider a regular crawler.
- `Webhooks` - this feels a bit generic. I would even say that some `Page Preview` items could be considered `Webhooks` as well.
Personally this is what I would do:
- Add an `Assistant` category and update the bots from Cloudflare's `Accessibility` bots
- `Benchmark` -> move to `Inspector`
- `Crawler` -> keep as is
- `Feed Fetcher` -> rename to `Aggregator`
- `Feed Parser` -> move to `Aggregator`
- `Feed Reader` -> move to `Aggregator`
- `Network Monitor` -> move to `Inspector`
- `Read-it-later Service` -> move to `Crawler`
- `Search bot` -> rename to `Searcher`
- `Search tools` -> move to `Crawler`
- `Security Checker` -> move to `Inspector`
- `Security search bot` -> move to `Inspector`
- `Service Agent` -> from a quick glance, some can be moved to `Inspector`, some to `Crawler`
- `Service bot` -> I'd say `Grammarly` can probably be treated as `Assistant`, `Vercel` as `Inspector`, and probably `ADmantX` too
- `Site Monitor` -> move to `Inspector`
- `Social Media Agent` -> mostly image fetchers, essentially, so either `Searcher` or `Crawler`
- `Validator` -> move to `Inspector`
So this would leave these categories:
- `Supporter` - bots used by various assistive technologies, including, but not limited to, text-to-voice, voice-to-text and image-to-text services, translators and editorial tools.
- `Aggregator` - bots used by tools aimed at collecting and potentially summarizing information from pages, including, but not limited to, feed readers, link or page collectors and summarization tools.
- `Crawler` - bots that do not fall under the other categories or that belong to generic or multi-purpose services.
- `Inspector` - bots used by various tools and services aimed at monitoring, inspecting, validating and/or analyzing the content or behavior of websites and users' interactions with them, including for security and/or SEO purposes.
- `Searcher` - bots used for search-related services, including, but not limited to, search engines and social networks.
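As a minimal sketch of the proposed remapping, here it is written as a lookup table from the current bot categories to the five proposed ones. It is in Python purely for illustration; device-detector itself is PHP and this is not its code. The names `CATEGORY_MAP`, `PROPOSED_CATEGORIES` and `remap` are hypothetical. Categories the proposal splits per bot (`Service Agent`, `Service bot`, `Social Media Agent`) map to `None`. The sketch assumes the new assistive category is the `Supporter` from the final list, since the mapping list calls it `Assistant`.

```python
from typing import Dict, Optional

# Hypothetical sketch of the proposed bot-category remapping (not device-detector code).
# Final category set proposed in this thread; the mapping list calls the
# assistive category "Assistant", the final list names it "Supporter".
PROPOSED_CATEGORIES = {"Supporter", "Aggregator", "Crawler", "Inspector", "Searcher"}

# Current category -> proposed category; None means "decide per bot".
CATEGORY_MAP: Dict[str, Optional[str]] = {
    "Benchmark": "Inspector",
    "Crawler": "Crawler",
    "Feed Fetcher": "Aggregator",
    "Feed Parser": "Aggregator",
    "Feed Reader": "Aggregator",
    "Network Monitor": "Inspector",
    "Read-it-later Service": "Crawler",
    "Search bot": "Searcher",
    "Search tools": "Crawler",
    "Security Checker": "Inspector",
    "Security search bot": "Inspector",
    "Service Agent": None,  # some to Inspector, some to Crawler
    "Service bot": None,  # e.g. Grammarly -> Supporter, Vercel -> Inspector
    "Site Monitor": "Inspector",
    "Social Media Agent": None,  # either Searcher or Crawler
    "Validator": "Inspector",
}


def remap(category: str) -> Optional[str]:
    """Return the proposed category, or None if the bot needs a manual review.

    Raises KeyError for a category the proposal does not mention.
    """
    return CATEGORY_MAP[category]


# Sanity check: every fixed target must belong to the proposed final set.
unknown = {t for t in CATEGORY_MAP.values() if t is not None} - PROPOSED_CATEGORIES
assert not unknown, f"targets outside the proposed set: {unknown}"

# Categories whose bots have to be reviewed one by one.
needs_review = sorted(k for k, v in CATEGORY_MAP.items() if v is None)
print(needs_review)  # prints ['Service Agent', 'Service bot', 'Social Media Agent']
```

Keeping the unresolved categories as `None` rather than guessing a target makes the bots that need an individual decision easy to list when all bots are reviewed.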
I also tried to think of an acronym, but the best GPT and I came up with was SCAIS, because it can be pronounced "skies". Not that we need an acronym or these specific names, of course, but I think they strike a good balance between precise and generic.
Any update would require reviewing all the bots. I hope that by the end of the year I will finish going through all the brands (and submit a PR to correct quite a few things there) and start working on the bots; when I do, I can adjust their categories as well, of course.