Suggestion: requests with an empty user-agent should be identified as crawlers
I use the --no-crawler option and noticed that requests with an empty user-agent are not marked as crawlers:
<ip address> - - [28/Feb/2025:17:13:33 +0100] "GET /404.php HTTP/1.1" 200 11729 "-" "-"
<ip address> - - [28/Feb/2025:17:13:33 +0100] "GET /wp.php HTTP/1.1" 200 11728 "-" "-"
<ip address> - - [28/Feb/2025:17:13:33 +0100] "GET /wp-head.php HTTP/1.1" 200 11733 "-" "-"
<ip address> - - [28/Feb/2025:17:13:33 +0100] "GET /images/uploader.php HTTP/1.1" 200 11743 "-" "-"
<ip address> - - [28/Feb/2025:17:13:33 +0100] "GET /upload/upload.php HTTP/1.1" 200 11741 "-" "-"
I'm not aware of any browser that does not send a user-agent; the most probable cause is a crawler that does not bother to set one. I think it would make sense to mark such requests as crawlers by default.
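For reference, since those lines are in the combined log format (the user-agent is the last quoted field), a quick way to count such requests outside of GoAccess is something like the following, splitting on double quotes so $6 holds the user-agent (access.log is just a placeholder file name):
awk -F'"' '$6 == "-"' access.log | wc -l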
That's a good point. Have you tried --unknowns-as-crawlers to see if that helps?
Just tested, it helps.
Is there any way to save the filtered logs to a file?
Something like
cat * | goaccess - --no-crawler --unknowns-as-crawlers --print-logs > file.txt
Are you referring to the actual report values? You can export them as JSON or CSV.
cat * | goaccess - --no-crawler --unknowns-as-crawlers -o report.json
or
cat * | goaccess - --no-crawler --unknowns-as-crawlers -o report.csv
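If JSON output works for you, the exported report can then be fed to other tools; for example, assuming jq is available, this lists the panels contained in the report generated by the command above:
jq 'keys' report.json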
Yes, I was hoping I could tell goaccess to print them as is, without converting them to JSON or CSV, so that I could (ab)use the filter functionality of goaccess and then process the data with different tools, goaccess included.
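As a rough workaround, and assuming the goal is simply to drop the empty user-agent lines, the filtering can be done with standard tools so that the raw log lines are preserved for any other program (file names are placeholders; $6 is again the user-agent field when splitting on double quotes):
awk -F'"' '$6 != "-"' access.log > filtered.log
cat filtered.log | goaccess - --no-crawler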