
When using `--ignore-crawlers`, shouldn't the "Operating Systems" section have no crawlers?

Open fekir opened this issue 4 weeks ago • 2 comments

I'm using --ignore-crawlers (together with --unknowns-as-crawlers) to remove bots from the report, but the "Operating Systems" section of the UI still lists multiple unknown crawlers. No known crawlers are listed.

I would have expected to see no crawlers at all.

From the man page:

       --ignore-crawlers
              Ignore crawlers from being counted.

       --unknowns-as-crawlers
              Classify unknown OS and browsers as crawlers.

goaccess version:

GoAccess - 1.9.3.
For more details visit: https://goaccess.io/
Copyright (C) 2009-2024 by Gerardo Orellana

Build configure arguments:
  --enable-utf8
  --enable-geoip=mmdb
  --with-openssl

Did I misunderstand the documentation, or is this a bug?

Is there a way to find out which entries are still recognized as crawlers but not filtered out? I could not find that information in the report generated with -o /tmp/report.csv

fekir avatar Dec 03 '25 09:12 fekir

Have you tried using --unknowns-log=<filename>? It should generate a list of all the user agents it didn't recognize.

Could you please share a few lines from your access log where it's recognizing them but not filtering them out? Thanks

allinurl avatar Dec 04 '25 04:12 allinurl

Sorry for the late reply.

Some lines from --unknowns-log=<filename>:

[OS]   Feedbin feed-id:3488499 - 1 subscribers
[OS]   Feedbin feed-id:3488499 - 1 subscribers
[BR]   Brightbot 1.0
[BR]   Brightbot 1.0
[OS]   Mozilla/5.0 (compatible; Thinkbot/0.5.8;  In_the_test_phase,_if_the_Thinkbot_brings_you_trouble,_please_block_its_IP_address._Thank_you.)
[OS]   Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ShapBot/0.1.0
[OS]   Feedbin feed-id:3488499 - 1 subscribers
[OS]   Feedbin feed-id:3488499 - 1 subscribers
[BR]   ReaderDesktop/0.1.1996 macos 15.0.0 macos_x86_64
[BR]   ReaderDesktop/0.1.1996 macos 15.0.0 macos_x86_64
[OS]   Feedbin feed-id:3488499 - 1 subscribers
[OS]   Chrome Privacy Preserving Prefetch Proxy
[BR]   Brightbot 1.0
[BR]   Brightbot 1.0
[OS]   Feedbin feed-id:3488499 - 1 subscribers
[OS]   Feedbin feed-id:3488499 - 1 subscribers
[OS]   Feedbin feed-id:3488499 - 1 subscribers
[BR]   Brightbot 1.0
[BR]   Brightbot 1.0
[OS]   Feedbin feed-id:3488499 - 1 subscribers
[BR]   Brightbot 1.0
[OS]   Feedbin feed-id:3488499 - 1 subscribers
[BR]   Vienna/8414 (Macintosh; Intel macOS 15_7_0)
[BR]   Brightbot 1.0
[BR]   Vienna/8414 (Macintosh; Intel macOS 15_7_0)
[BR]   Brightbot 1.0
[BR]   Brightbot 1.0
[BR]   Brightbot 1.0
[OS]   Feedbin feed-id:3488499 - 1 subscribers
[BR]   Brightbot 1.0
[OS]   Feedbin feed-id:3488499 - 1 subscribers
[BR]   Aggregator/2.22.000 (Android/8.0.0; universal8890)
[BR]   Aggregator/2.22.000 (Android/8.0.0; universal8890)
[BR]   Aggregator/2.22.000 (Android/8.0.0; universal8890)
[BR]   Aggregator/2.22.000 (Android/8.0.0; universal8890)
[OS]   Mozilla/4.0 (compatible; ms-office; MSOffice 16)
[BR]   Brightbot 1.0
[OS]   Mozilla/4.0 (compatible; ms-office; MSOffice 16)
[BR]   Brightbot 1.0
[BR]   Brightbot 1.0
[BR]   Brightbot 1.0
[OS]   Feedbin feed-id:3488499 - 1 subscribers
[OS]   Mozilla/4.0 (compatible; ms-office; MSOffice 16)
[OS]   Mozilla/4.0 (compatible; ms-office; MSOffice 16)
[OS]   Feedbin feed-id:3488499 - 1 subscribers
[OS]   Feedbin feed-id:3488499 - 1 subscribers
[OS]   Chrome Privacy Preserving Prefetch Proxy
[OS]   Feedbin feed-id:3488499 - 1 subscribers
[OS]   Feedbin feed-id:3488499 - 1 subscribers
[OS]   Feedbin feed-id:3488499 - 1 subscribers
[BR]   Vienna/8414 (Macintosh; Intel macOS 15_7_0)
[OS]   Feedbin feed-id:3488499 - 1 subscribers
[BR]   Vienna/8414 (Macintosh; Intel macOS 15_7_0)
[OS]   Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ShapBot/0.1.0
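Since the same agents repeat many times, a tally may be easier to read than the raw log. Below is a minimal sketch, assuming every line has the shape shown above (a bracketed `[OS]` or `[BR]` tag followed by the user agent). The function name `tally_unknowns` and the sample list are made up for illustration and are not part of goaccess.

```python
from collections import Counter


def tally_unknowns(lines):
    """Count (tag, user_agent) pairs from lines such as '[BR]   Brightbot 1.0'."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        # Skip blank lines and anything that does not start with a [TAG].
        if not line.startswith("[") or "]" not in line:
            continue
        tag, _, agent = line.partition("]")
        counts[(tag.lstrip("["), agent.strip())] += 1
    return counts


# Hypothetical sample taken from the excerpt above; in practice the lines
# would come from the file passed to --unknowns-log.
sample = [
    "[OS]   Feedbin feed-id:3488499 - 1 subscribers",
    "[BR]   Brightbot 1.0",
    "[BR]   Brightbot 1.0",
    "",
]

for (tag, agent), n in tally_unknowns(sample).most_common():
    print(f"{n:6d}  [{tag}]  {agent}")
```

To run it on the real file, pass an open file object instead of `sample`, for example `tally_unknowns(open("/tmp/unknown.txt", encoding="utf-8", errors="replace"))`.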

The whole command I used is:

cat * | goaccess - --with-mouse --log-format='%h %^ %^ [%d:%t %^] "%m %U %^" %s %b "%R" "%u"' --date-format='%d/%b/%Y' --time-format='%H:%M:%S' --unknowns-as-crawlers --jobs=10 --color-scheme=1 --ignore-panel=HOSTS --enable-panel=KEYPHRASES --ignore-crawlers --unknowns-as-crawlers --unknowns-log=/tmp/unknown.txt
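To collect the access-log lines that were asked for, something like the following could pull out the entries whose user agent matches one of the names from the unknowns log. It is only a sketch: it assumes, per the --log-format above, that the user agent (`%u`) is the last double-quoted field on each line and contains no embedded double quotes. The helper name `lines_matching_agents` and the two sample log lines are invented for illustration.

```python
import re

# Last double-quoted field on the line; with the --log-format above this is "%u".
# Assumption: the user agent itself contains no double quotes.
LAST_QUOTED = re.compile(r'"([^"]*)"\s*$')


def lines_matching_agents(log_lines, needles):
    """Yield log lines whose user-agent field contains any needle (case-insensitive)."""
    lowered = [n.lower() for n in needles]
    for line in log_lines:
        match = LAST_QUOTED.search(line)
        if match and any(n in match.group(1).lower() for n in lowered):
            yield line.rstrip("\n")


# Made-up example lines in the combined log format, not taken from a real log.
sample_log = [
    '203.0.113.7 - - [06/Dec/2025:14:12:00 +0000] "GET /feed.xml HTTP/1.1" 200 512 "-" "Brightbot 1.0"',
    '198.51.100.4 - - [06/Dec/2025:14:13:00 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"',
]

for hit in lines_matching_agents(sample_log, ["Brightbot", "Feedbin"]):
    print(hit)
```

Matching is done on the user-agent field only, so a needle such as "Feedbin" would not accidentally match a request path or referrer.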

fekir avatar Dec 06 '25 14:12 fekir