When using `--ignore-crawlers`, shouldn't the "Operating Systems" section have no crawlers?
I'm using --ignore-crawlers (together with --unknowns-as-crawlers) to remove bots from the report, but in the "Operating Systems" section the GoAccess UI still lists multiple unknown crawlers.
No known crawlers are listed.
I would have expected to see no crawlers at all.
From the man page:
--ignore-crawlers
Ignore crawlers from being counted.
--unknowns-as-crawlers
Classify unknown OS and browsers as crawlers.
goaccess version:
GoAccess - 1.9.3.
For more details visit: https://goaccess.io/
Copyright (C) 2009-2024 by Gerardo Orellana
Build configure arguments:
--enable-utf8
--enable-geoip=mmdb
--with-openssl
Did I misunderstand the documentation, or is this a bug?
Is there a way to find out which entries are still recognized as crawlers but not filtered out?
I could not find this information in the report generated with -o /tmp/report.csv.
Have you tried using --unknowns-log=<filename>? It should generate a list of all the ones it didn't recognize.
Could you please share a few lines from your access log where it's recognizing them but not filtering them out? Thanks
Sorry for the late reply.
Some lines from --unknowns-log=<filename>:
[OS] Feedbin feed-id:3488499 - 1 subscribers
[OS] Feedbin feed-id:3488499 - 1 subscribers
[BR] Brightbot 1.0
[BR] Brightbot 1.0
[OS] Mozilla/5.0 (compatible; Thinkbot/0.5.8; In_the_test_phase,_if_the_Thinkbot_brings_you_trouble,_please_block_its_IP_address._Thank_you.)
[OS] Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ShapBot/0.1.0
[OS] Feedbin feed-id:3488499 - 1 subscribers
[OS] Feedbin feed-id:3488499 - 1 subscribers
[BR] ReaderDesktop/0.1.1996 macos 15.0.0 macos_x86_64
[BR] ReaderDesktop/0.1.1996 macos 15.0.0 macos_x86_64
[OS] Feedbin feed-id:3488499 - 1 subscribers
[OS] Chrome Privacy Preserving Prefetch Proxy
[BR] Brightbot 1.0
[BR] Brightbot 1.0
[OS] Feedbin feed-id:3488499 - 1 subscribers
[OS] Feedbin feed-id:3488499 - 1 subscribers
[OS] Feedbin feed-id:3488499 - 1 subscribers
[BR] Brightbot 1.0
[BR] Brightbot 1.0
[OS] Feedbin feed-id:3488499 - 1 subscribers
[BR] Brightbot 1.0
[OS] Feedbin feed-id:3488499 - 1 subscribers
[BR] Vienna/8414 (Macintosh; Intel macOS 15_7_0)
[BR] Brightbot 1.0
[BR] Vienna/8414 (Macintosh; Intel macOS 15_7_0)
[BR] Brightbot 1.0
[BR] Brightbot 1.0
[BR] Brightbot 1.0
[OS] Feedbin feed-id:3488499 - 1 subscribers
[BR] Brightbot 1.0
[OS] Feedbin feed-id:3488499 - 1 subscribers
[BR] Aggregator/2.22.000 (Android/8.0.0; universal8890)
[BR] Aggregator/2.22.000 (Android/8.0.0; universal8890)
[BR] Aggregator/2.22.000 (Android/8.0.0; universal8890)
[BR] Aggregator/2.22.000 (Android/8.0.0; universal8890)
[OS] Mozilla/4.0 (compatible; ms-office; MSOffice 16)
[BR] Brightbot 1.0
[OS] Mozilla/4.0 (compatible; ms-office; MSOffice 16)
[BR] Brightbot 1.0
[BR] Brightbot 1.0
[BR] Brightbot 1.0
[OS] Feedbin feed-id:3488499 - 1 subscribers
[OS] Mozilla/4.0 (compatible; ms-office; MSOffice 16)
[OS] Mozilla/4.0 (compatible; ms-office; MSOffice 16)
[OS] Feedbin feed-id:3488499 - 1 subscribers
[OS] Feedbin feed-id:3488499 - 1 subscribers
[OS] Chrome Privacy Preserving Prefetch Proxy
[OS] Feedbin feed-id:3488499 - 1 subscribers
[OS] Feedbin feed-id:3488499 - 1 subscribers
[OS] Feedbin feed-id:3488499 - 1 subscribers
[BR] Vienna/8414 (Macintosh; Intel macOS 15_7_0)
[OS] Feedbin feed-id:3488499 - 1 subscribers
[BR] Vienna/8414 (Macintosh; Intel macOS 15_7_0)
[OS] Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ShapBot/0.1.0
The whole command I used is:
cat * | goaccess - --with-mouse --log-format='%h %^ %^ [%d:%t %^] "%m %U %^" %s %b "%R" "%u"' --date-format='%d/%b/%Y' --time-format='%H:%M:%S' --unknowns-as-crawlers --jobs=10 --color-scheme=1 --ignore-panel=HOSTS --enable-panel=KEYPHRASES --ignore-crawlers --unknowns-as-crawlers --unknowns-log=/tmp/unknown.txt
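Since the unknowns log repeats the same few user agents many times, it may be easier to read once grouped. The following is a minimal sketch using only standard tools (sort, uniq, cut); the function names `summarize_unknowns` and `count_by_kind` are made up for this example and are not part of GoAccess. It assumes the log has one entry per line, each starting with an `[OS]` or `[BR]` tag as in the excerpt above.

```shell
#!/bin/sh
# Hypothetical helpers for reading a GoAccess --unknowns-log file.

# Count how often each distinct entry appears, most frequent first.
summarize_unknowns() {
    sort "$1" | uniq -c | sort -rn
}

# Count entries per leading tag ([OS] or [BR]), i.e. the first
# space-separated field of each line.
count_by_kind() {
    cut -d' ' -f1 "$1" | sort | uniq -c
}

# /tmp/unknown.txt is the path used in the command above; adjust as needed.
if [ -f /tmp/unknown.txt ]; then
    summarize_unknowns /tmp/unknown.txt
    count_by_kind /tmp/unknown.txt
fi
```

The grouped list makes it easier to pick a handful of distinct user agents and look up the matching lines in the access log.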