
Requested files vs new visitors a day, discrepancy in hits and visitors

Open nadermx opened this issue 3 years ago • 8 comments

Hi, I am unsure if this is a bug, but the panel of unique visitors per day (including spiders) shows around 1400 hits and 220 visitors, whereas the Requested URLs panel shows what I believe to be more accurate: almost the same number of visitors as hits. [image] Since the domain ends in .to, I am not sure whether there is a regex issue.

The command I use to generate this report is: cd /var/log/nginx && zcat access.log.*.gz | goaccess --4xx-to-unique-count access.log access.log.1 - -o report.html --log-format=COMBINED

Any clarification on this would help.

nadermx avatar Jun 27 '22 03:06 nadermx

Can I ask what the chart looks like without --4xx-to-unique-count?

allinurl avatar Jun 29 '22 01:06 allinurl

It seems about the same without --4xx-to-unique-count.

nadermx avatar Jun 29 '22 18:06 nadermx

Hard to know exactly what's going on without having the logs. However, to me it looks like you're getting an average of 223 unique visitors every day, while the first request (which seems to be / GET) has gotten a total of 1579 unique visitors over the whole range of parsed dates (not per day).

Let me know if that helps answer your question.

allinurl avatar Jun 29 '22 18:06 allinurl

For some reason I feel the visitors should be closer to 1,000 a day.

nadermx avatar Jun 29 '22 19:06 nadermx

To test this out, you could randomly grab multiple chunks of lines from the access log, e.g., 25, 50, 150 and run them against goaccess and see if the results match a manual inspection of those lines. Note that goaccess considers a unique visitor an HTTP request with the exact same IP, date, and user agent.
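A minimal sketch of that manual check, assuming the logs are in the standard combined format and that the regular expression below is a good enough approximation of it. The helper names (count_hits_and_visitors, random_chunk) are hypothetical and are not part of goaccess; the visitor key follows the (IP, date, user agent) definition given above.

```python
import random
import re

# Simplified pattern for the "combined" log format. This is an assumption;
# adjust it if your nginx log_format differs.
COMBINED_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<date>[^:]+):[^\]]+\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def count_hits_and_visitors(lines):
    """Return (hits, visitors); a visitor is a distinct (IP, date, user agent)."""
    hits = 0
    visitors = set()
    for line in lines:
        m = COMBINED_RE.match(line)
        if m is None:
            continue  # skip lines that do not parse
        hits += 1
        visitors.add((m["ip"], m["date"], m["agent"]))
    return hits, len(visitors)


def random_chunk(lines, size, rng=random):
    """Pick a contiguous chunk of `size` lines starting at a random offset."""
    if size >= len(lines):
        return list(lines)
    start = rng.randrange(len(lines) - size + 1)
    return lines[start:start + size]
```

To use it, read the access log into a list of lines, call random_chunk with sizes such as 25, 50 and 150, and compare the output of count_hits_and_visitors with what goaccess reports for the same lines.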

Please feel free to share those findings.

allinurl avatar Jun 29 '22 19:06 allinurl

I will do that, but I am just curious why the total unique visitors shown on the left is significantly different from the one on the right. One shows 22k hits and 3k visitors, but the graph on the right shows 20k hits and 18k visitors.

nadermx avatar Jun 30 '22 03:06 nadermx

Great question! So the left-hand side panel (Visitors) counts unique visitors as an HTTP request with the exact same IP, user agent + date. However, the right-hand side panel and actually pretty much any other panel will consider a unique visitor an HTTP request with the exact same IP, date, user agent + the data field (column) on that panel.

For instance, a client can visit your website today, Jun 30 2022, with IP 192.168.0.1 and Chrome v100 as user agent. That will be one visitor regardless of how many times it reloads the page or navigates to other pages on your site. However, for every page or file the client visits, it will count as one visitor for that page (file). So if it visits the home page (index.html), the Requested Files panel will show 1 unique visitor (regardless of how many times it reloads that specific page), but if it also visits the contact page (contact.html), that will be another unique visit, for a total of 2 unique visits on that panel but 1 on the Visitors panel. So technically, 1 unique visitor (Visitors panel) could account for 20 unique visits across 20 different pages on the Requested Files panel.
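The example above can be sketched roughly as follows. This only illustrates the counting rule as described in this thread and is not goaccess's actual implementation; the panel function and the sample records are made up for the illustration.

```python
from collections import defaultdict

# (ip, date, user_agent, requested_file); values are invented to mirror
# the example in the text.
requests = [
    ("192.168.0.1", "30/Jun/2022", "Chrome/100", "/index.html"),
    ("192.168.0.1", "30/Jun/2022", "Chrome/100", "/index.html"),  # a reload
    ("192.168.0.1", "30/Jun/2022", "Chrome/100", "/contact.html"),
]


def panel(records, key):
    """Group records by the panel's data field ("date" or "file").

    Returns {row: (hits, visitors)}, where a visitor is a distinct
    (ip, date, user agent) seen within that row.
    """
    hits = defaultdict(int)
    seen = defaultdict(set)
    for ip, date, agent, path in records:
        row = {"date": date, "file": path}[key]
        hits[row] += 1
        seen[row].add((ip, date, agent))
    return {row: (hits[row], len(seen[row])) for row in hits}


visitors_panel = panel(requests, "date")  # one row per day
files_panel = panel(requests, "file")     # one row per requested file
print(visitors_panel)
print(files_panel)
```

Summing the visitor column of the file-keyed result gives 2, while the date-keyed result shows a single visitor. That is the same kind of gap as the one between the two panels in the report.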

As far as hits, they're pretty straightforward on all panels, it's a function of how many times it loads the data field. e.g., 30/Jun/2022 got 100 hits, index.html got 220 hits (over the range of parsed dates).

Let me know if that helps clarify your question.

allinurl avatar Jun 30 '22 15:06 allinurl

Any updates on this? Thanks

allinurl avatar Jul 13 '22 15:07 allinurl

Trying to understand the specs here.

I have been collecting logs for 16 days and am now trying to see what the difference is between what we observe in Google Analytics and what we get from running GoAccess.

I have a 9.3 GB log file from our edge; wc -l reports 13,856,118 lines, which matches Total Requests in GoAccess.

GoAccess: Total Visitors
  Unique Visitors              443,173
  Visitor Hostnames and IPs    102,937
  Referring Sites               88,314

Google Analytics: Active Users
  28-Day Active Users           68,140

I would expect the number of users to be higher than what I observe in Google Analytics, but which number is the most accurate one to compare against? I would also expect every user to have an IP, and hence Visitor Hostnames and IPs to be equal to Unique Visitors.

jonasdk avatar Sep 07 '22 19:09 jonasdk

It seems it does add up.

nadermx avatar Sep 07 '22 20:09 nadermx

@jonasdk Are you filtering out bots? i.e., --ignore-crawlers. Also, since it looks like you are comparing data against Google, which doesn't support unique visit per minute, I'd run your goaccess instance with --date-spec=date or simply don't enable/pass this option if currently set.

It depends on what exactly you are looking for. For instance, hostnames/IPs are unique, meaning you had 102,937 unique IPs visiting your server over that specific range of time. That's an accurate number of hosts (people and/or bots) that visited your server. However, if a specific visitor/person visited on Sep/01, Sep/02 and then on Sep/05 using the same IP, it will show as only 1 unique hostname/IP, while the unique visitors count will show that as 3 visits. So again, it depends on how you are utilizing the data. If comparing it to Google, then unique visitors is what you are after.
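A rough sketch of that distinction, using made-up records. It assumes, as the discussion above suggests, that a finer date specificity (e.g. --date-spec=min) makes the date part of the visitor key finer-grained; the function names are hypothetical and this is not how goaccess is implemented internally.

```python
# (ip, date, minute, user_agent); one client returning on three days.
visits = [
    ("203.0.113.5", "01/Sep/2022", "09:15", "Chrome/100"),
    ("203.0.113.5", "01/Sep/2022", "09:16", "Chrome/100"),
    ("203.0.113.5", "02/Sep/2022", "14:00", "Chrome/100"),
    ("203.0.113.5", "05/Sep/2022", "08:30", "Chrome/100"),
]


def unique_hosts(records):
    """Distinct IPs over the whole range (Visitor Hostnames and IPs)."""
    return len({ip for ip, _date, _minute, _agent in records})


def unique_visitors(records, per_minute=False):
    """Distinct (ip, date, agent); optionally keyed per minute instead."""
    keys = set()
    for ip, date, minute, agent in records:
        when = (date, minute) if per_minute else (date,)
        keys.add((ip, when, agent))
    return len(keys)


print(unique_hosts(visits))                      # hosts over the range
print(unique_visitors(visits))                   # visitors keyed by day
print(unique_visitors(visits, per_minute=True))  # visitors keyed by minute
```

With these records there is one host, three visitors when keyed by day, and four when keyed by minute, which is why per-minute specificity can inflate the Unique Visitors total relative to a tool that counts per day or longer.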

allinurl avatar Sep 07 '22 23:09 allinurl

Thanks for the explanation, that makes perfect sense. Not using --date-spec=min made the result a lot more plausible. The report on crawlers (from the Browsers stats) attributes some 18.28% of the visitors to crawlers. Is that a common trend?

jonasdk avatar Sep 08 '22 07:09 jonasdk

@jonasdk it depends. Some sites get hammered by bots and see few people, others are the opposite, and there is everything in between. 18% is healthy in my opinion.

allinurl avatar Sep 09 '22 22:09 allinurl

Thanks for the follow up. Do you have a command I can run against the log files to test this? They are nginx.

nadermx avatar Oct 11 '22 08:10 nadermx

@nadermx not sure I follow, are you looking to have minute specificity?

allinurl avatar Oct 30 '22 21:10 allinurl