
Fix definitions of combined/common log formats

Open • jjlin opened this issue 4 years ago • 3 comments

After the host, these formats have two fields that must be ignored: the RFC 1413 identity and the HTTP-authenticated user, which Apache logs as %l and %u.

jjlin, Jun 23 '20 03:06

I'm actually pretty surprised that a bug like this has stayed in place for so long, so maybe I'm just missing something.

I'm using the combined log format as documented at https://httpd.apache.org/docs/2.4/logs.html#combined. I noticed that my transmit stats were way off; for example, downloading a multi-GB file wouldn't increase the Tx. Amount value at all. This change fixes that for me.

jjlin, Jun 23 '20 03:06

There's no need for the extra %^. A single %^ skips input up to the next character that follows it in the format string, so %h %^[ will skip everything until it finds [. Can you post a sample of your access log? I can take a look at which format you need.
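To illustrate the behavior described above, here is a minimal Python sketch of a format interpreter in which each specifier consumes input up to the next literal character of the format. The rules are an assumption modeled on this comment's description, not GoAccess's actual C parser; parse_line, LINE, SINGLE, and DOUBLE are hypothetical names, and the sketch does not handle two specifiers with no literal between them. Under these rules, a single %^ before [ and an explicit two-field variant extract the same fields from the example line quoted later in this thread.

```python
from __future__ import annotations


def parse_line(fmt: str, line: str) -> dict[str, str] | None:
    """Parse `line` with a simplified, GoAccess-style format string.

    Assumed rules (a sketch, not GoAccess's real parser):
      * a specifier such as %h or %^ consumes input up to the next
        occurrence of the literal character that follows it in `fmt`
        (or to end of line if it is the last item in `fmt`);
      * %^ discards what it consumed; other specifiers keep it;
      * every other format character must match the input exactly.
    Returns None when the line does not fit the format.
    """
    fields: dict[str, str] = {}
    i = pos = 0  # i indexes fmt, pos indexes line
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt):
            spec = "%" + fmt[i + 1]
            i += 2
            if i < len(fmt):
                # Run up to the next occurrence of the delimiter literal.
                stop = line.find(fmt[i], pos)
                if stop == -1:
                    return None  # delimiter never appears: malformed line
            else:
                stop = len(line)
            if spec != "%^":
                fields[spec] = line[pos:stop]
            pos = stop
        elif pos < len(line) and line[pos] == fmt[i]:
            pos += 1  # literal character matched
            i += 1
        else:
            return None  # literal mismatch: malformed line
    return fields


LINE = ('1.2.3.4 - - [28/Jun/2020:14:40:14 -0700] '
        '"GET /centos/7/os/x86_64/repodata/repomd.xml HTTP/1.1" 200 3736 '
        '"-" "urlgrabber/3.10 yum/3.4.3"')

# One %^ skips "- - " in a single step, because it runs up to the "[".
SINGLE = '%h %^[%d:%t %^] "%r" %s %b "%R" "%u"'
# Skipping the identity and user fields one at a time gives the same result.
DOUBLE = '%h %^ %^ [%d:%t %^] "%r" %s %b "%R" "%u"'

for fmt in (SINGLE, DOUBLE):
    print(parse_line(fmt, LINE)["%b"])  # prints 3736 for both formats
```

Because the ignore specifier is delimited by a character rather than by a field count, both spellings land on the same bytes value; what matters is that the literals around each specifier line up with the log.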

allinurl, Jun 28 '20 01:06

Ah, I see. The description of %^ in https://goaccess.io/man#custom-log doesn't seem to mention what constitutes a "field", and there are various other predefined log formats in goaccess that use adjacent %^ specifiers. I think being explicit about the fact that there are actually two fields would make things clearer.

In any case, my format is the combined log format, as I mentioned. Here's an example line from my log (with the IP address modified):

1.2.3.4 - - [28/Jun/2020:14:40:14 -0700] "GET /centos/7/os/x86_64/repodata/repomd.xml HTTP/1.1" 200 3736 "-" "urlgrabber/3.10 yum/3.4.3"

I use these definitions in httpd.conf:

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
CustomLog "|bin/rotatelogs -f -n 30 -L logs/access_log logs/daily/access_log 86400" combined

(Note that this LogFormat is the combined format documented at https://httpd.apache.org/docs/2.4/logs.html#combined.)
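For reference, here is how each directive in that LogFormat lines up with the example entry, including the two fields (%l and %u) that a GoAccess format has to skip. This is a minimal Python sketch for illustration only, not anything shipped with GoAccess or Apache; COMBINED_RE, parse_combined, and bytes_sent are hypothetical names, and the simple [^"]* patterns assume there are no escaped quotes inside the request or user agent.

```python
from __future__ import annotations

import re

# One named group per directive of the Apache combined LogFormat:
#   "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\""
COMBINED_RE = re.compile(
    r'(?P<host>\S+) '         # %h   remote host
    r'(?P<ident>\S+) '        # %l   RFC 1413 identity, "-" if unavailable
    r'(?P<user>\S+) '         # %u   HTTP-authenticated user, "-" if none
    r'\[(?P<time>[^\]]+)\] '  # %t   time the request was received
    r'"(?P<request>[^"]*)" '  # %r   first line of the request
    r'(?P<status>\d{3}) '     # %>s  final status code
    r'(?P<bytes>\d+|-) '      # %b   response size excluding headers, "-" if none
    r'"(?P<referer>[^"]*)" '  # %{Referer}i
    r'"(?P<agent>[^"]*)"$'    # %{User-Agent}i
)


def parse_combined(line: str) -> dict[str, str] | None:
    """Split one combined-format line into named fields, or return None."""
    m = COMBINED_RE.match(line)
    return m.groupdict() if m else None


def bytes_sent(entry: dict[str, str]) -> int:
    """%b logs "-" instead of 0, so map it to 0 before summing."""
    return 0 if entry["bytes"] == "-" else int(entry["bytes"])


LINE = ('1.2.3.4 - - [28/Jun/2020:14:40:14 -0700] '
        '"GET /centos/7/os/x86_64/repodata/repomd.xml HTTP/1.1" 200 3736 '
        '"-" "urlgrabber/3.10 yum/3.4.3"')

entry = parse_combined(LINE)
print(entry["ident"], entry["user"], bytes_sent(entry))  # prints: - - 3736
```

A regex like this returns None for a line that does not fit the format, which is one way to spot malformed entries when checking a log by hand.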

Does goaccess have any way to log entries that it thinks are malformed?

jjlin, Jun 28 '20 21:06