Add option to exclude request from report

Open wavexx opened this issue 7 years ago • 17 comments

As with exclude-ip, it's often convenient to exclude some known path roots from the statistics. For instance, I'd like to exclude requests for /favicon.ico or known protected areas (say, /admin/).

wavexx avatar Mar 21 '17 11:03 wavexx

I could add an --exclude-request option but I'm not sure if it's necessary since you could actually do some pre-processing such as:

cat access.log | grep -v -f exclude_list.txt | goaccess -

or

cat access.log | grep -Ev '/favicon.ico|/admin/' | goaccess -

Does that address the use case you mentioned above?

allinurl avatar Mar 21 '17 19:03 allinurl

On Tue, Mar 21 2017, Gerardo O. wrote:

I could add an --exclude-request option but I'm not sure if it's necessary since you could actually do some pre-processing such as:

cat access.log | grep -v -f exclude_list.txt | goaccess -

or

cat access.log | grep -Ev '/favicon.ico|/admin/' | goaccess -

Does that address the use case you mentioned above?

I already do that, but it's a hack without parsing the actual log file. You need to anchor the paths to the request field to be somewhat more precise, such as '(GET|POST|HEAD) /path..', but it would be much more polished to do it after parsing.
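For illustration, here is a minimal sketch of that idea done outside goaccess: isolate the request field first, then compare the path against a literal prefix instead of running a regex over the whole line. It assumes the common/combined log format, where the request is the first double-quoted field. `exclude_prefix` is a hypothetical helper name, not a goaccess feature.

```shell
# Hypothetical helper: drop log lines whose request path starts with the given prefix.
# Assumes the request ("GET /path HTTP/1.1") is the first double-quoted field.
exclude_prefix() {
    awk -F'"' -v prefix="$1" '
    {
        # $2 is the request field; split it into method, path, protocol.
        n = split($2, req, " ")
        # index() == 1 means the path begins with the literal prefix (no regex).
        if (n >= 2 && index(req[2], prefix) == 1) next
        print
    }'
}

# Usage: exclude_prefix /admin/ < access.log | goaccess -
```

This is still pre-processing, so it does not replace filtering inside goaccess after parsing; it only removes the dependence on where the text happens to appear in the line.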

wavexx avatar Mar 22 '17 11:03 wavexx

Got it. I can certainly add this option. Please keep this open so I can look into it. Thanks!

allinurl avatar Mar 22 '17 22:03 allinurl

I am interested in this too.

kaworu avatar Jul 20 '17 07:07 kaworu

I agree: requests for static files (images, JavaScript, CSS, and so on) should be excluded from the report, at least in the "REQUESTED FILES (URLS)" panel; otherwise it's really difficult to analyse the result. It would be great to have an "--exclude-request-list" option followed by an exclude_list file.

lvbeck avatar Feb 14 '18 21:02 lvbeck

+1, it would be really handy if static files could be excluded without having to play manually with grep.

Jonuz avatar Mar 05 '18 21:03 Jonuz

Any news on this? The "--static-file" flag is a bit pointless for me, as I mostly run goaccess on entirely statically generated websites. But even on dynamically generated websites, I still want to factor in the size of the static files anyway.

The files I'd like to ignore are identified by paths, not just extensions.

Just to give an idea, I'd like to exclude requests for the prefix "/phpmyadmin". These come from bots that generate hundreds of useless requests, which show up in and pollute the "failed requests" section, making it useless for traffic analysis.
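As a rough sketch of what prefix-based exclusion could look like until goaccess supports it, the filter below reads literal path prefixes from a file, one per line, and drops any request whose path starts with one of them. The helper name `exclude_prefix_list` and the file layout are assumptions made for this example, and the request is assumed to be the first double-quoted field of each log line.

```shell
# Hypothetical helper: drop lines whose request path starts with any prefix
# listed (one per line) in the file given as the first argument.
exclude_prefix_list() {
    awk -F'"' -v list="$1" '
    BEGIN {
        # Load the literal prefixes once, skipping blank lines.
        while ((getline line < list) > 0)
            if (line != "") prefixes[line] = 1
        close(list)
    }
    {
        # $2 is the request field; req[2] is the requested path.
        split($2, req, " ")
        for (p in prefixes)
            if (index(req[2], p) == 1) next
        print
    }'
}

# Usage: exclude_prefix_list exclude_prefixes.txt < access.log | goaccess -
```

Because the comparison uses index() rather than a pattern match, an entry such as /phpmyadmin is treated as plain text and only matches at the start of the path.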

wavexx avatar Jun 26 '18 10:06 wavexx

@allinurl, thanks,

cat access.log | grep -v -f exclude_list.txt | goaccess -

works fine to generate a static report, but doesn't work well with --real-time-html. I agree that it would be great to have --exclude-request-list.

pozitron57 avatar Jul 15 '18 18:07 pozitron57

@pozitron57 have you tried using --line-buffered with grep? e.g.,

tail -f -n +0 access.log | grep --line-buffered -v -f exclude_list.txt | goaccess -
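Buffering matters here because a filter whose output is a pipe usually writes in blocks rather than line by line, so goaccess would only see new entries once the block fills up. The same applies to an awk-based filter, where the rough equivalent of --line-buffered is calling fflush() after each line. A small sketch, assuming the request is the first double-quoted field and using a hypothetical helper name `exclude_prefix_stream`:

```shell
# Hypothetical helper: streaming variant of a path-prefix filter.
exclude_prefix_stream() {
    awk -F'"' -v prefix="$1" '
    {
        split($2, req, " ")
        # Skip requests whose path begins with the literal prefix.
        if (index(req[2], prefix) == 1) next
        print
        fflush()   # push each kept line down the pipe immediately
    }'
}

# Usage: tail -f -n +0 access.log | exclude_prefix_stream /admin/ | goaccess -
```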

allinurl avatar Jul 15 '18 22:07 allinurl

@allinurl thanks, that works!

pozitron57 avatar Jul 16 '18 11:07 pozitron57

I already do that, but it's a hack without parsing the actual log file. You need to anchor the paths to the request field to be somewhat more precise, such as '(GET|POST|HEAD) /path..', but it would be much more polished to do it after parsing.

Here's a way that should catch anything from GET to DELETE for a precise path:

cat access.log | grep -Ev ' "[A-Z]{3,6} /path' | goaccess -

groovenectar avatar Oct 02 '18 19:10 groovenectar

Except when "YO /path" is in the user agent. Please don't suggest grep again, or anything regex related. My request is to have proper prefix matching on the path, within goaccess.
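To make the false positive concrete, here is a small self-contained check. The log line is made up for the example: it is a request for /index.html whose user agent happens to begin with a method-like token followed by /path. The anchored pattern from this thread (written without the backslash before the slash) is compared with a check that looks only at the path inside the request field, assumed to be the first double-quoted field.

```shell
# Made-up combined-format line: the request is /index.html, but the
# user agent starts with "GET /path".
sample='203.0.113.7 - - [02/Oct/2018:22:10:00 +0000] "GET /index.html HTTP/1.1" 200 1024 "-" "GET /path scanner"'

# Whole-line regex: the user agent matches, so grep -v drops the line.
# "|| true" absorbs grep's non-zero exit status when nothing is printed.
regex_out=$(printf '%s\n' "$sample" | grep -Ev ' "[A-Z]{3,6} /path' || true)

# Field-based check: only the path in the request field is compared.
field_out=$(printf '%s\n' "$sample" |
    awk -F'"' '{ split($2, r, " "); if (index(r[2], "/path") != 1) print }')

[ -z "$regex_out" ] && echo "regex filter dropped the line"
[ -n "$field_out" ] && echo "field-based filter kept the line"
```

The line is a legitimate hit on /index.html, so a filter meant to exclude only /path should keep it; the whole-line regex does not.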

wavexx avatar Oct 02 '18 22:10 wavexx

What is a YO HTTP method? Edit: I see what you're saying about that now, like if someone has a malformed UA.. still easy to remedy if we really need to dig that deep, which is what it seems we're expecting the author to do in this thread..

The author would be having to figure out an approach to implement this entirely new functionality for you.... Possibly with regex......

I'm really glad that regex was suggested as some kind of solution, so I wanted to contribute what I did with it today... I came across it only in this thread. Sorry to offend

groovenectar avatar Oct 02 '18 23:10 groovenectar

Besides being buggy, none of the regex solutions here work after goaccess has parsed the logs already. If I have a year's worth of data in a big on-disk database and I realize that it would be better if I filtered out some path, I can't unless I've also saved the original request logs.

johntyree avatar Dec 01 '19 07:12 johntyree

I too would love an option to exclude certain paths from the results. For instance, being able to exclude stuff like /admin would be great.

michieloosterling avatar Jun 24 '20 16:06 michieloosterling

@pozitron57 have you tried using --line-buffered with grep? e.g.,

tail -f -n +0 access.log | grep --line-buffered -v -f exclude_list.txt | goaccess -

How would this be used in a service, or with --daemon? I may be missing something.

CryDeTaan avatar Aug 07 '20 16:08 CryDeTaan

Just an FYI. This request will be addressed by #117. Working on it as we speak — stay tuned!

allinurl avatar Jul 12 '22 22:07 allinurl