
Can unknown traffic be excluded from the report?

Open: cdrx opened this issue 3 years ago • 3 comments

I'm analysing some web traffic, but I'm trying to limit the report to traffic that is likely genuine user activity (i.e. not a bot, a sensible-looking user agent, etc.).

Using --ignore-crawlers gets me most of the way there, which is great.

If I run with --unknowns-log, I can see from the file that there is a lot of long-tail junk activity I'm not interested in (log4j attacks, curl, weird bots, etc.).

Is it possible to skip / filter out all "unknown" traffic?

cdrx, Jan 08 '22 13:01

Thanks for suggesting this. There's no option now to ignore those. Are you looking to ignore them from being counted completely or simply not showing that data?

allinurl, Jan 09 '22 23:01

I'm not sure I understand the difference between not counting or not showing the data.

Ideally, I would want the unknowns either not imported at all or excluded from the "visits" metric on the "unique visitors per day" panel.

I guess what I'm looking for is "unique likely-human visitors per day" (as best we can tell from the logs).

cdrx, Jan 10 '22 09:01
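Since there is no built-in option for this, one workaround is to pre-filter the log before GoAccess ever sees it. The sketch below assumes Apache/NGINX "combined" format lines and uses a hypothetical user-agent heuristic (`looks_like_browser`, `BROWSER_TOKENS`, `JUNK_TOKENS` are illustrative names, not GoAccess's own classification). It also counts distinct (IP, date, user agent) triples per day, which as far as I know is how GoAccess defines a unique visitor. As noted later in this thread, any such heuristic will mislabel some traffic.

```python
import re
from collections import defaultdict

# Apache/NGINX "combined" log format (an assumption about your logs).
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical heuristic: tokens found in mainstream browser user agents.
BROWSER_TOKENS = ("Chrome/", "Firefox/", "Safari/", "Edg/")
# Hypothetical list of substrings that suggest automated or malicious clients.
JUNK_TOKENS = ("bot", "crawl", "spider", "curl", "wget", "python-requests", "${jndi")


def looks_like_browser(agent: str) -> bool:
    """Rough guess at whether a user agent belongs to a real browser."""
    lowered = agent.lower()
    if any(token in lowered for token in JUNK_TOKENS):
        return False
    return agent.startswith("Mozilla/5.0") and any(t in agent for t in BROWSER_TOKENS)


def unique_human_visitors_per_day(lines):
    """Count distinct (IP, user agent) pairs per day, browser-like agents only."""
    seen = defaultdict(set)
    for line in lines:
        m = COMBINED.match(line)
        if m is None or not looks_like_browser(m["agent"]):
            continue  # unparseable or non-browser line: skip it
        # "08/Jan/2022:13:01:00 +0000" -> "08/Jan/2022"
        day = m["time"].split(":", 1)[0]
        seen[day].add((m["ip"], m["agent"]))
    return {day: len(visitors) for day, visitors in seen.items()}
```

To get the filtered report itself, you could write the lines that pass `looks_like_browser` to a new file and point GoAccess at that file instead of the original log.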

Hmm... that seems complicated.

In the same way that these requests may seem "unknown" to you, they may also be incorrectly labeled. Yeah... I know... A lot of web traffic is just trash. But I would be careful about this.

Sometimes I have had DDoS attacks or extraordinary bandwidth consumption from these "unknown" sources. I advise you to stay aware of this traffic as well, so as not to get any unpleasant surprises. Or keep a separate report for it.

0bi-w6n-K3nobi, Jan 11 '22 14:01
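Following the suggestion to keep a separate report rather than discard the traffic, a minimal sketch is to partition the raw log into two lists and run GoAccess on each one separately. It assumes the user agent is the last double-quoted field on each line (true for the "combined" format); `split_log` and the `keep` predicate are hypothetical names. Lines whose user agent cannot be found go to the "other" bucket, so nothing silently disappears from both reports.

```python
import re
from typing import Callable, Iterable, List, Tuple

# Assumption: the user agent is the last double-quoted field on the line.
LAST_QUOTED = re.compile(r'"([^"]*)"\s*$')


def split_log(
    lines: Iterable[str], keep: Callable[[str], bool]
) -> Tuple[List[str], List[str]]:
    """Partition raw log lines into (kept, other) by testing each user agent."""
    kept: List[str] = []
    other: List[str] = []
    for line in lines:
        m = LAST_QUOTED.search(line)
        agent = m.group(1) if m else ""  # no agent found: treated as ""
        (kept if keep(agent) else other).append(line)
    return kept, other
```

Each list can then be written to its own file, giving one report for likely-human traffic and another for everything else, including potential attack or bandwidth-heavy sources.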