
Duplicate counts in real-time HTML report monitoring 2 web servers with lsynced log files

Open naris opened this issue 3 months ago • 5 comments

I am using goaccess to monitor 2 load-balanced Apache web servers. I use lsync to copy the logs from the 2 servers to a 3rd server that hosts the goaccess real-time stats page. Unfortunately, the counts are continuously incrementing by large amounts, far more than the actual traffic.

The log files are standard Apache access_log files: the current access_log plus a week's worth of rotated access_log-yyyymmdd files. Since there is no way to configure goaccess to read log files that do not have a static name, such as those with datestamps, I have had to resort to a script that runs:

cat /local/home/icomadm/logs/pc1uicomweb11/access_log-* /local/home/icomadm/logs/pc1uicomweb12/access_log-* | /usr/local/bin/goaccess -p /local/home/icomadm/conf/goaccess-prd.conf -l /local/home/icomadm/logs/goaccess-prd.log -o /local/home/icomadm/www/webstats.html -

I have also tried using tail:

tail -F /local/home/icomadm/logs/pc1uicomweb11/access_log /local/home/icomadm/logs/pc1uicomweb12/access_log | /usr/local/bin/goaccess -p /local/home/icomadm/conf/goaccess-prd.conf -l /local/home/icomadm/logs/goaccess-prd.log -o /local/home/icomadm/www/webstats.html -

and specifying (or not specifying) access_log in the config file:

log-file /local/home/icomadm/logs/pc1uicomweb11/access_log
log-file /local/home/icomadm/logs/pc1uicomweb12/access_log

No matter what I do, the counts just keep rapidly increasing, obviously duplicating data :(

I also have had to disable persist and restore as that was really duplicating records, even when there is no traffic at all.

naris, Sep 12 '25 15:09

This appears to be a setup issue rather than something related to GoAccess. What's happening:

cat access_log-* | goaccess ...: every time you run that, you re-ingest an entire week of history. GoAccess counts whatever it's given.

tail -F ... | goaccess ... combined with lsync/rsync: many file sync setups replace the destination file (new inode) or resend the whole file. When that happens, tail -F treats it as a new file and starts reading from the beginning, so GoAccess sees the whole file again and re-counts it.
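To make the inode point concrete, here is a minimal local sketch. It does not reproduce lsync or rsync; it only imitates their two delivery styles (appending in place versus writing a temporary file and renaming it over the destination). The directory, the file names, and the helper inode_of are made up for illustration.

```shell
#!/bin/sh
# Print the inode number of a file (hypothetical helper for this demo).
inode_of() {
    ls -i "$1" | awk '{print $1}'
}

demo_dir=$(mktemp -d)
log="$demo_dir/access_log"

printf 'line1\n' > "$log"
inode_before=$(inode_of "$log")

# Append in place: same file, same inode, so a follower keeps its position.
printf 'line2\n' >> "$log"
inode_after_append=$(inode_of "$log")

# Replace via temp file + rename, which is how many sync tools deliver
# updates by default: the path now points at a different inode.
printf 'line1\nline2\nline3\n' > "$demo_dir/.access_log.tmp"
mv "$demo_dir/.access_log.tmp" "$log"
inode_after_replace=$(inode_of "$log")

echo "before=$inode_before append=$inode_after_append replace=$inode_after_replace"
```

If the replace value differs from the other two on the stats server after a sync, a follower that tracks the file by name will likely reopen it and read it from the start.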

I'd run GoAccess as a long-lived process and feed it only new lines, or keep using lsync/rsync but make the destination append-only.
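One possible way to "feed only new lines" is to remember a byte offset per log file and emit only what lies past it. The sketch below is an assumption about how that could be done, not something GoAccess provides: the function emit_new_lines, the state file, and the demo paths are all hypothetical. It also assumes the log ends on a complete line whenever it is read, and it relies on head -c, which is widely available but not required by POSIX.

```shell
#!/bin/sh
# Emit only the bytes added to a log since the last call, tracking the
# position in a small state file (hypothetical helper, not a GoAccess feature).
emit_new_lines() {
    log=$1
    state=$2
    offset=0
    if [ -f "$state" ]; then
        offset=$(cat "$state")
    fi
    # Arithmetic expansion strips any padding some wc implementations add.
    size=$(($(wc -c < "$log")))
    # If the file shrank (rotation or truncation), start over from byte 0.
    if [ "$size" -lt "$offset" ]; then
        offset=0
    fi
    if [ "$size" -gt "$offset" ]; then
        # tail -c +N starts at byte N (1-based); head caps the output at the
        # size measured above so bytes written meanwhile wait for the next call.
        tail -c "+$((offset + 1))" "$log" | head -c "$((size - offset))"
    fi
    printf '%s\n' "$size" > "$state"
}

demo_dir=$(mktemp -d)
printf 'req1\nreq2\n' > "$demo_dir/access_log"
first=$(emit_new_lines "$demo_dir/access_log" "$demo_dir/offset")
second=$(emit_new_lines "$demo_dir/access_log" "$demo_dir/offset")
printf 'req3\n' >> "$demo_dir/access_log"
third=$(emit_new_lines "$demo_dir/access_log" "$demo_dir/offset")
```

In this demo the first call returns both existing lines, the second returns nothing, and the third returns only the appended line. Because the position is tracked by byte offset rather than by inode, a destination file that gets replaced with a longer copy of itself would not be re-read from the top.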

allinurl, Sep 12 '25 15:09

When I tried using tail, I did NOT include the older log files, only the 2 access_log files (and I commented out the log-file settings in the conf file), and it still kept duplicating counts.

naris, Sep 12 '25 15:09

Would you be able to share those logs? I'd like to replicate the issue and take a closer look...

Just to clarify, GoAccess doesn't automatically deduplicate entries, so if the same log line appears more than once (which is common in access logs), it will be counted multiple times.

allinurl, Sep 12 '25 16:09

Sure, here are just the access_log files (all the log files together are too big): logs.zip

and the config file and the script to run it with tail: conf.zip

Running it with tail and changing lsync to include --append-verify in the rsync parameters seems to be better, although now I don't have history.

Do you think it would be safe to turn persist and restore back on?

naris, Sep 12 '25 16:09

Looks like the answer is no: turning on persist and restore duplicates everything, so I turned it back off.

naris, Sep 12 '25 18:09