goaccess
Fix definitions of combined/common log formats
These formats have two initial fields to ignore (the RFC 1413 identity and the HTTP authenticated user).
I'm actually pretty surprised a bug like this would remain in place for such a long time, so maybe I'm just missing something.
I'm using the combined log format as documented at https://httpd.apache.org/docs/2.4/logs.html#combined. I noticed that my transmit stats were way off; for example, downloading a multi-GB file wouldn't increase the Tx. Amount value at all. This change fixes that for me.
There's no need for the extra %^. One ignore will look for the next character, so %h %^[ will skip everything until it finds [. Can you post your access log? I can take a look at what format you will need.
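The skipping behaviour described above can be illustrated with a toy matcher. This is only a sketch and not GoAccess's actual parser: the function name parse_toy and its rule of stopping at the next single literal character are assumptions made for illustration, and it does not handle two specifiers placed directly next to each other.

```python
def parse_toy(fmt, line):
    """Toy format matcher (hypothetical, not GoAccess code).

    %^ skips input up to the next literal character in fmt; any other
    %x specifier captures input up to that character under the key x.
    """
    fields = {}
    i = 0    # index into fmt
    pos = 0  # index into line
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt):
            spec = fmt[i + 1]
            i += 2
            if i < len(fmt):
                # Stop at the literal character that follows the specifier.
                end = line.find(fmt[i], pos)
                if end == -1:
                    raise ValueError(f"delimiter {fmt[i]!r} not found")
            else:
                end = len(line)
            if spec != "^":
                fields[spec] = line[pos:end]
            pos = end
        else:
            # Literal characters in fmt must match the input exactly.
            if line[pos:pos + 1] != fmt[i]:
                raise ValueError(f"expected {fmt[i]!r} at offset {pos}")
            pos += 1
            i += 1
    return fields


sample = ('1.2.3.4 - - [28/Jun/2020:14:40:14 -0700] '
          '"GET /centos/7/os/x86_64/repodata/repomd.xml HTTP/1.1" 200 3736 '
          '"-" "urlgrabber/3.10 yum/3.4.3"')
one_ignore = parse_toy("%h %^[%d]", sample)
two_ignores = parse_toy("%h %^ %^ [%d]", sample)
print(one_ignore)
print(one_ignore == two_ignores)
```

In this toy model a single %^ followed by [ and two separate %^ fields end up at the same position in the line, so both formats capture the same host and date.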
Ah, I see. The description of %^ in https://goaccess.io/man#custom-log doesn't seem to mention what constitutes a "field", and there are various other predefined log formats in goaccess that use adjacent %^ specifiers. I think being explicit about the fact that there are actually two fields would make things clearer.
In any case, my format is the combined log format as I mentioned. Here's an example I pulled out (with the IP address modified).
1.2.3.4 - - [28/Jun/2020:14:40:14 -0700] "GET /centos/7/os/x86_64/repodata/repomd.xml HTTP/1.1" 200 3736 "-" "urlgrabber/3.10 yum/3.4.3"
I use these definitions in httpd.conf:
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
CustomLog "|bin/rotatelogs -f -n 30 -L logs/access_log logs/daily/access_log 86400" combined
(Note that LogFormat is the combined format documented at https://httpd.apache.org/docs/2.4/logs.html#combined.)
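To make the field positions in this format concrete, here is a minimal Python sketch that splits a combined-format line into named fields. The regular expression and the name parse_combined are illustrative assumptions, not part of Apache or GoAccess, and the pattern does not handle escaped quotes inside quoted fields.

```python
import re

# One named group per LogFormat directive:
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_combined(line):
    """Return a dict of fields, or None if the line does not match."""
    m = COMBINED.match(line.strip())
    return m.groupdict() if m else None


SAMPLE = ('1.2.3.4 - - [28/Jun/2020:14:40:14 -0700] '
          '"GET /centos/7/os/x86_64/repodata/repomd.xml HTTP/1.1" 200 3736 '
          '"-" "urlgrabber/3.10 yum/3.4.3"')

record = parse_combined(SAMPLE)
print(record["ident"], record["user"], record["bytes"])
```

The second and third fields (%l and %u) are both "-" in the sample line, and the response size (%b) is the field after the status code. A line that does not fit the pattern yields None, which gives a simple way to spot entries that do not follow the format.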
Does goaccess have any way to log entries that it thinks are malformed?