goaccess

Question: Merge/import html, json or csvs with an html report?

Open szorg opened this issue 8 years ago • 4 comments

Hello, thanks for taking a look. The question is: is there a way to take an already processed report (in HTML, JSON, or CSV) and combine it with another report, either one being generated or one that has already been completed? This excludes on-disk databases with TokyoCabinet; reasoning below.

An example might be:

$ goaccess -f full_log.01.20.2017.log.bz2  --combine-file=./full_log.01.19.2017.html --log-format=COMBINED -o ./full_log.01.19-20.2017.html

(process the 20th and add already aggregated data from the 19th to the final report)

The reasoning is: we have ~30 million lines per day, which on our current system takes 60-90 minutes using goaccess with TokyoCabinet, so if we wanted to generate a week's data we would be looking at 7-12 hours of CPU time, give or take.

As for why TokyoCabinet isn't the best option: ideally, we'd like to be able to do it on demand. Let's say we wanted to look at last Tuesday and Thursday together in one report, but it wasn't generated at the time. Keeping the db data doesn't save us any work then.

szorg avatar Jan 21 '17 00:01 szorg

Interesting point. It's not possible right now to merge a report with another report. However, I do see the advantage of having this feature, especially for large data sets.

Though, I see a few things that seem to make this a bit challenging, for instance,

  1. How should goaccess handle reports that contain some metrics such as time served or bandwidth and reports without them?
  2. Should it be able to merge the same report twice, e.g., --combine-file=./full_log.01.19.2017.html --combine-file=./full_log.01.19.2017.html?
  3. Should there be only one type of source format? e.g., JSON -> JSON/HTML/CSV/terminal?

allinurl avatar Jan 22 '17 01:01 allinurl

Thanks for your response! BTW I and my team do love this project, from what we've experienced of it so far.

My thoughts (from a sysadmin, non-developer, little programming experience perspective) on the challenges:

  1. Default to only include data that exists in both, but give an option to override. Perhaps default option could exist in the config file.
  2. There are two situations I can immediately see that being useful, listed below. I would say maybe default to adding values together, but add options for a FIFO/LIFO type of thing: the first file supplies all the values that both files have plus the ones only it has, and the second file supplies only the values that only it has, or the inverse.
     2.1. If you have multiple systems running the same website and wanted to run reports on the servers locally and then combine them
     2.2. If you have multiple sites and wanted to get an overall metric
  3. JSON would make sense, because if you have an HTML you can generate JSON. That would make JSON and HTML reports easily converted between the two.
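To make the options above concrete, here is a minimal Python sketch of the merge rules being discussed. It is only an illustration, not anything goaccess provides: the function name `merge_panels`, the `policy` and `common_metrics_only` parameters, and the `{item: {metric: value}}` layout are all assumptions made up for this example and do not reflect goaccess's actual JSON schema.

```python
def merge_panels(first, second, policy="sum", common_metrics_only=True):
    """Merge two hypothetical {item: {metric: number}} mappings.

    policy: "sum" adds values present in both inputs, "first" keeps the
    value from `first`, "last" keeps the value from `second`.
    common_metrics_only: if True, keep only metrics that appear in both
    inputs (the suggested default); if False, keep every metric.
    """
    if policy not in ("sum", "first", "last"):
        raise ValueError(f"unknown policy: {policy!r}")

    # Collect the metric names used anywhere in each input.
    metrics_a = set().union(*(row.keys() for row in first.values()))
    metrics_b = set().union(*(row.keys() for row in second.values()))
    keep = metrics_a & metrics_b if common_metrics_only else metrics_a | metrics_b

    merged = {}
    for item in sorted(set(first) | set(second)):
        a = first.get(item, {})
        b = second.get(item, {})
        row = {}
        for metric in sorted(keep):
            if metric in a and metric in b:
                if policy == "sum":
                    row[metric] = a[metric] + b[metric]
                elif policy == "first":
                    row[metric] = a[metric]
                else:  # "last"
                    row[metric] = b[metric]
            elif metric in a:
                row[metric] = a[metric]
            elif metric in b:
                row[metric] = b[metric]
        merged[item] = row
    return merged


if __name__ == "__main__":
    # Made-up sample data: the second day has no bandwidth ("bytes") metric.
    day19 = {"/index.html": {"hits": 10, "bytes": 2048}, "/about": {"hits": 3, "bytes": 512}}
    day20 = {"/index.html": {"hits": 7}, "/contact": {"hits": 1}}
    print(merge_panels(day19, day20))
    print(merge_panels(day19, day20, policy="first", common_metrics_only=False))
```

Plain addition is only reasonable for additive counters such as hits or bytes; something like unique visitors could not be combined this way without the underlying visitor sets.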

szorg avatar Jan 23 '17 15:01 szorg

It would be really great if we could get this. I was also searching for a solution to this and landed here :)

scysys avatar Jan 05 '22 16:01 scysys

I'm trying to parse logs from 3 different load-balancers over a year (~4TB each, so that's a lot of data), and it would really be great to have a way to combine reports. The logs are a bit too big to have them all on the same machine at once, and without a merge feature I have no way of displaying all the data in the same report.

AnomalRoil avatar Oct 19 '22 15:10 AnomalRoil