Ability to filter dataset by fields or regex

Open allinurl opened this issue 10 years ago • 48 comments

Add the ability to filter the results within the UI (Terminal & HTML) - e.g. filter by fields such as host, request, etc. then display only data matching that filter criteria, or enter a regex to match in the request and restrict display to only those matching entries.

Ideally this would spin up a new thread so multiple datasets can be analyzed at the same time. Each dataset should live on its own dashboard.

allinurl avatar May 21 '14 18:05 allinurl

Out of curiosity - is this functionality and those referenced / related intended for the TUI only?

aphorise avatar Oct 20 '15 23:10 aphorise

Good question. The original thought was to make these filters for the terminal; I didn't think much about having them available in the HTML output.

Not sure yet how this would work. Perhaps the user could set initial filters in the config file or, since there are plans to make the HTML output real-time, there could be some sort of subset filtering on the client side. Any thoughts?

allinurl avatar Oct 21 '15 00:10 allinurl

A small related note: I think most of the requests now coming in for additional functionality, aggregation and related UI would be better grouped under a separate argument. For example:

--rui 'regex,average_files,average_hits,host_servers...'

for Rich-User-Interface. Later on it could be included as part of the standard views if it's common to most users' expectations, or perhaps adaptively enabled based on the log file and whether its scheme matches the RUI options.

Regarding HTML - if you don't mind using jQuery & DataTables, then for the specific purposes of sort / filter I'd recommend: http://datatables.net/examples/api/regex.html

It's a 160 KB addition in JavaScript, but worth it for what it does. This would also give us a footing into other light / efficient jQuery-based libraries for additional UI and eye-candy as required.

If however you do not wish to have such dependencies - then we have our work cut out :-D

aphorise avatar Oct 21 '15 10:10 aphorise

Is there a way today to see which IP visited which pages? I know this ticket might help achieve that in the future, but until this feature is added, is there a workaround? BTW, thanks heaps for this tool.

gitanupam avatar Aug 10 '17 05:08 gitanupam

@gitanupam until this is implemented, grep or any other filtering tool would be your friend here, e.g.:

For real-time filtering:

# tail -f -n +0 access.log | grep --line-buffered '192.168.3.1' | goaccess --log-format=COMBINED -

or for static filtering, simply

# grep '192.168.3.1' access.log | goaccess --log-format=COMBINED -
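
One caveat with the pattern above: grep treats each `.` as a wildcard and matches anywhere in the line, so `192.168.3.1` also matches `192.168.3.10` or an address that shows up in a referrer. A minimal sketch of a stricter filter, assuming the COMBINED format where the client address is the first whitespace-separated field (`filter_by_host` is a hypothetical helper name, not part of GoAccess):

```shell
# filter_by_host IP: read log lines on stdin and print only those whose
# first field (the client address in COMBINED format) equals IP exactly.
# Assumption: fields are whitespace-separated, as in the COMBINED format.
filter_by_host() {
    awk -v ip="$1" '$1 == ip'
}
```

It would be used the same way as the grep version, e.g. `filter_by_host 192.168.3.1 < access.log | goaccess --log-format=COMBINED -`.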

allinurl avatar Aug 10 '17 11:08 allinurl

It’ll be great to have this! It would allow one to dig into a single day (to check how traffic varies during that day).
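
In the meantime, a single day can be carved out before the log reaches GoAccess. A rough sketch, assuming the COMBINED format, where the fourth whitespace-separated field looks like `[11/Feb/2018:10:02:00` (`filter_by_day` is a hypothetical helper name, not a GoAccess feature):

```shell
# filter_by_day DD/Mon/YYYY: read log lines on stdin and print only those
# whose timestamp field starts with the given day.
# Assumption: field 4 is the "[day/month/year:time" part of the timestamp.
filter_by_day() {
    awk -v day="$1" 'index($4, "[" day ":") == 1'
}
```

For example, `filter_by_day 11/Feb/2018 < access.log | goaccess --log-format=COMBINED -` would produce a report for that day only.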

rauschma avatar Feb 11 '18 10:02 rauschma

+1 for date/time range filtering

Dadibom avatar Feb 26 '18 13:02 Dadibom

+1

BirkhoffLee avatar Jul 13 '18 22:07 BirkhoffLee

+1

hiaselhans avatar Nov 25 '18 15:11 hiaselhans

Not sure if this feature was deprioritized, but I would love to have it. Seeing as this was opened 5 years ago, I wanted to check whether it is still being worked on.

Shagon94 avatar Apr 12 '19 08:04 Shagon94

@Shagon94 Certainly still at the top of the list. However, there's an outstanding issue with the on-disk storage that needs to be worked on before this. Stay tuned though.

allinurl avatar Apr 13 '19 14:04 allinurl

FWIW I'd stick the data into sqlite3 table(s) and use SQL to do the filtering. Better yet, add a mini abstraction layer so people with huge amounts of logs could use a grown-up SQL engine (Postgres) for storage too.

dmaziuk avatar Aug 20 '19 15:08 dmaziuk

@dmaziuk agree on that. Stay tuned for the upcoming storage change.

allinurl avatar Aug 20 '19 15:08 allinurl

Any updates regarding this? It would already be good if there were a way to toggle between daily, weekly and monthly stats. Going back to AWStats hurts.

nkvname avatar Sep 24 '19 12:09 nkvname

awstats? Luxury! I'm thinking of wrapping our analog plus report magic setup in a docker container...

dmaziuk avatar Sep 24 '19 15:09 dmaziuk

@nkvname I need to finish the on-disk storage replacement first; this is second on the list, though.

allinurl avatar Sep 24 '19 17:09 allinurl

This would be a fantastic enhancement! Is there an issue for the on-disk storage replacement? If so, can you please link it?

domo84 avatar Dec 14 '19 15:12 domo84

@domolicious I agree, this would be of great value! See issue #1274. Stay tuned!

allinurl avatar Dec 14 '19 15:12 allinurl

@allinurl awesome work so far. excited about the new release.

Will the new release help us do the following?

  • filter the data in one table using data from another table (e.g. can we filter the list of requested URLs by IP address or date/time?)

Right now, the information is there but the contextual information is missing.

It would be useful to have all the fields in a drop-down to filter all tables in different contexts.

insidesmart avatar May 13 '20 10:05 insidesmart

@insidesmart This issue will add that feature, but it won't be part of the upcoming release (v1.4). I've addressed the storage issue mentioned before, so after deploying 1.4 I can focus on this issue.

allinurl avatar May 17 '20 15:05 allinurl

+1

manix avatar Jun 02 '20 09:06 manix

@allinurl When will the time range filter function be released?

We could make a donation or sponsor that if it would help.

pstranghoener avatar Sep 30 '20 10:09 pstranghoener

+1

I know the original intention was to create a real-time tool, but adding filter support (GUI) would greatly expand the possibilities for data analysis.

nagyalex avatar Sep 09 '21 18:09 nagyalex

I would like to see URLs being requested and the IP associated with each request.
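
As a stopgap outside GoAccess, that pairing can be pulled straight from the log. A sketch assuming the COMBINED format, where field 1 is the client address and field 7 is the request path (`hits_by_host_and_url` is a hypothetical helper name):

```shell
# hits_by_host_and_url: read log lines on stdin and print one
# "COUNT IP PATH" line per distinct (IP, path) pair, busiest first.
# Assumption: whitespace-separated COMBINED format, so $1 is the client
# address and $7 is the requested path.
hits_by_host_and_url() {
    awk '{ print $1, $7 }' | sort | uniq -c | sort -rn
}
```

Running `hits_by_host_and_url < access.log | head` would then list the most frequent IP/URL combinations.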

mitchross avatar Nov 11 '21 22:11 mitchross

How might this affect existing datasets and charts used for incremental parsing?

If at some point we start using this feature, will existing datasets be updated somehow, and what will the existing charts show for data from before the feature was used?

We currently run GoAccess using Docker without pinning a specific version - docker run allinurl/goaccess.

air3ijai avatar Dec 29 '21 07:12 air3ijai

@air3ijai This is being implemented so that it works against currently parsed logs. This feature won't be enabled if a dataset was persisted and the log doesn't exist anymore.

allinurl avatar Dec 29 '21 16:12 allinurl

@allinurl, do you mean that if we plan to enable this option once it is implemented, we will need to re-parse all the logs from scratch? If so, can we then use it with the next incremental parsing?

air3ijai avatar Dec 29 '21 18:12 air3ijai

Folks, I just wanted to give a positive and early update on this. I'm working on it as we speak. Early tests are already working as they should, though I'm still working on the details. I'm very excited about what it can achieve so far. Stay tuned!

allinurl avatar Jan 18 '22 23:01 allinurl

From all the comments it looks as if the design is still being decided. Here is a practical use case.

www.tcpdump.org has a download directory (/release/), which generates a significant fraction of the "tx amount" and hits/requests. There is also a man pages directory (/manpages/). It would be useful to be able to say: "let's look at the report without the downloads" or "let's look at the report for the downloads only" or "let's look at the report for the man pages only". Or, as mentioned earlier, to look at the IPv4/IPv6 subset only.

With the current implementation this could be done by filtering the access log with grep and generating a separate new report from stdin. As an idea, goaccess could allow applying different filters post-generation in the viewer. From my point of view this would be useful enough and there would be no need to display multiple reports in the same viewer.
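
The pre-filtering approach just described can be sketched as a pair of helpers, assuming the COMBINED format where field 7 is the request path (`only_prefix` and `without_prefix` are hypothetical names, not GoAccess options):

```shell
# only_prefix PREFIX: read log lines on stdin and keep requests whose
# path starts with PREFIX.
# Assumption: whitespace-separated COMBINED format, so $7 is the path.
only_prefix() {
    awk -v p="$1" 'index($7, p) == 1'
}

# without_prefix PREFIX: read log lines on stdin and drop requests whose
# path starts with PREFIX.
without_prefix() {
    awk -v p="$1" 'index($7, p) != 1'
}
```

A report without the downloads would then be `without_prefix /release/ < access.log | goaccess --log-format=COMBINED -`, and a man-pages-only report would use `only_prefix /manpages/` instead.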

Feel free to use this input in your design if you wish.

infrastation avatar Feb 11 '22 11:02 infrastation

> With the current implementation this could be done by filtering the access log with grep and generating a separate new report from stdin. As an idea, goaccess could allow applying different filters post-generation in the viewer. From my point of view this would be useful enough and there would be no need to display multiple reports in the same viewer.

I wouldn't want to have any kind of post-generation. Anything that manipulates the logfile should be avoided. That way we're able to apply upcoming filters / interests as-we-go on raw logfiles - and even combine them to get more insights. That's what I love about GoAccess, it's very flexible.

CodeAlDente avatar Feb 11 '22 11:02 CodeAlDente