goaccess

Suggestion: empty user-agent should be identified as crawler

fekir opened this issue 9 months ago • 4 comments

I use the option --no-crawler and noticed that requests with an empty user agent are not marked as crawlers:

<ip address> - - [28/Feb/2025:17:13:33 +0100] "GET /404.php HTTP/1.1" 200 11729 "-" "-"
<ip address> - - [28/Feb/2025:17:13:33 +0100] "GET /wp.php HTTP/1.1" 200 11728 "-" "-"
<ip address> - - [28/Feb/2025:17:13:33 +0100] "GET /wp-head.php HTTP/1.1" 200 11733 "-" "-"
<ip address> - - [28/Feb/2025:17:13:33 +0100] "GET /images/uploader.php HTTP/1.1" 200 11743 "-" "-"
<ip address> - - [28/Feb/2025:17:13:33 +0100] "GET /upload/upload.php HTTP/1.1" 200 11741 "-" "-"

I'm not aware of any browser that doesn't send a user agent; the most probable cause is crawlers that don't bother to set one. I think it would make sense to mark such requests as crawlers by default.
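
For what it's worth, these entries can be spotted with standard tools before deciding how goaccess should treat them; a minimal sketch, assuming the combined log format shown above, where the user agent is the sixth double-quoted field:

# print only requests whose user-agent field is "-"
awk -F'"' '$6 == "-"' access.log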

fekir avatar Mar 07 '25 15:03 fekir

That's a good point. Have you tried --unknowns-as-crawlers to see if that helps?

allinurl avatar Mar 08 '25 01:03 allinurl

have you tried --unknowns-as-crawlers to see if that helps?

Just tested; it helps.

Is there any way to save the filtered logs to a file?

Something like

cat * | goaccess - --no-crawler --unknowns-as-crawlers --print-logs > file.txt

fekir avatar Mar 08 '25 06:03 fekir

Are you referring to the actual report values? You can export them as JSON or CSV.

cat * | goaccess - --no-crawler --unknowns-as-crawlers -o report.json 

or

cat * | goaccess - --no-crawler --unknowns-as-crawlers -o report.csv
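
The JSON report can then be post-processed with other tools; a minimal sketch, assuming jq is installed and that the log format is already set in the goaccess config (as in the commands above). The exact panel names depend on the goaccess version and enabled panels:

cat * | goaccess - --no-crawler --unknowns-as-crawlers -o report.json
# list the top-level report panels in the exported JSON
jq 'keys' report.json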

allinurl avatar Mar 10 '25 22:03 allinurl

Are you referring to the actual report values? You can export them as JSON or CSV.

Yes, I was hoping I could tell goaccess to print them as is, without converting them to JSON or CSV, so that I could (ab)use goaccess's filtering functionality and then process the data with different tools, goaccess included.
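
One possible workaround in the meantime, sketched under the assumption that filtering outside of goaccess is acceptable: drop the empty user-agent lines with awk first, then reuse the filtered file with goaccess or any other tool.

# keep only lines whose user-agent field is not "-", assuming the combined log format
cat * | awk -F'"' '$6 != "-"' > filtered.log
cat filtered.log | goaccess - --unknowns-as-crawlers -o report.html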

fekir avatar Mar 11 '25 05:03 fekir