goaccess
Requested files vs new visitors a day, discrepancy in hits and visitors
Hi, I am unsure if this is a bug, but the Unique Visitors per Day panel (including spiders) shows around 1400 hits and 220 visitors, while the Requested URLs panel shows what I believe is more accurate: almost the same number of visitors as hits.
So not sure since the domain ends in .to if there is a regex issue?
The command I use to make this report is cd /var/log/nginx && zcat access.log.*.gz | goaccess --4xx-to-unique-count access.log access.log.1 - -o report.html --log-format=COMBINED
Any clarification on this would help.
Can I ask, what does the chart look like without --4xx-to-unique-count?
It seems about the same

Hard to know exactly what's going on without having the logs. However, to me it looks like you're getting an average of 223 unique visitors every day, while the first request (which seems to be / GET) has gotten a total of 1579 unique visitors over the range of parsed dates (not per day).
Let me know if that helps answer your question.
For some reason I feel the visitors should be closer to 1,000 a day.
To test this out, you could randomly grab multiple chunks of lines from the access log, e.g., 25, 50, 150 and run them against goaccess and see if the results match a manual inspection of those lines. Note that goaccess considers a unique visitor an HTTP request with the exact same IP, date, and user agent.
Please feel free to share those findings.
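For the manual inspection, a small script can help cross-check a sampled chunk. The following is only a rough sketch, not goaccess's actual implementation: it assumes the standard COMBINED log layout, uses a hypothetical helper named `daily_counts`, and approximates a unique visitor as a distinct (IP, date, user agent) combination as described above. Lines that don't match the assumed pattern are silently skipped.

```python
import re
from collections import Counter

# Approximate pattern for the NCSA combined log format (an assumption;
# unusual lines, e.g. with escaped quotes, may not match).
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<date>[^:]+):[^\]]+\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def daily_counts(lines):
    """Return {date: (hits, unique_visitors)} for the given log lines."""
    hits = Counter()
    visitors = {}
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue  # skip lines that don't look like COMBINED entries
        day = m.group("date")
        hits[day] += 1
        # Same IP + same user agent on the same date counts once.
        visitors.setdefault(day, set()).add((m.group("ip"), m.group("agent")))
    return {day: (hits[day], len(visitors[day])) for day in hits}

# Made-up sample lines for illustration only.
sample = [
    '192.168.0.1 - - [30/Jun/2022:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Chrome/100"',
    '192.168.0.1 - - [30/Jun/2022:10:00:05 +0000] "GET /contact.html HTTP/1.1" 200 512 "-" "Chrome/100"',
    '192.168.0.2 - - [30/Jun/2022:11:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Firefox/101"',
]
print(daily_counts(sample))
```

Feeding it the same 25, 50, or 150 lines you pass to goaccess should give numbers you can compare against the Visitors panel, keeping in mind this is an approximation.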
I will do that, but I am curious why the total unique visitors on the left is significantly different from the one on the right. One shows 22k hits and 3k visitors, but the graph on the right shows 20k hits and 18k visitors.
Great question! So the left-hand side panel (Visitors) counts unique visitors as an HTTP request with the exact same IP, user agent + date. However, the right-hand side panel and actually pretty much any other panel will consider a unique visitor an HTTP request with the exact same IP, date, user agent + the data field (column) on that panel.
For instance, a client can visit your website today, Jun 30 2022, with IP 192.168.0.1 and Chrome v100 as user agent. That will be one visitor regardless of how many times it reloads the page or navigates multiple pages on your site. However, for every page or file the client visits, it will count as one visitor for that page (file). So if it visits the home page (index.html), the Requested Files panel will show 1 unique visitor (regardless of how many times it reloads that specific page), but if it visits the contact page (contact.html), then that will be another unique visit, for a total of 2 unique visits on that panel but 1 on the Visitors panel. So technically, 1 unique visitor (Visitors panel) could visit 20 unique pages on the Requested Files panel.
As far as hits, they're pretty straightforward on all panels, it's a function of how many times it loads the data field. e.g., 30/Jun/2022 got 100 hits, index.html got 220 hits (over the range of parsed dates).
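To make the difference between the two panels concrete, here is a minimal sketch of the counting rule as described above. It is an illustration, not goaccess's code: `panel_counts` and the tuple layout are hypothetical, and the sample requests are made up to mirror the index.html/contact.html example.

```python
from collections import defaultdict

def panel_counts(requests):
    """requests: iterable of (ip, date, agent, path) tuples.

    Returns (visitors_panel_total, {path: (hits, unique_visitors)}).
    """
    visitors_panel = set()
    files_panel = defaultdict(lambda: {"hits": 0, "visitors": set()})
    for ip, date, agent, path in requests:
        # Visitors panel key: IP + date + user agent.
        visitors_panel.add((ip, date, agent))
        # Requested Files key: the same triple, tracked per path.
        row = files_panel[path]
        row["hits"] += 1
        row["visitors"].add((ip, date, agent))
    per_file = {p: (r["hits"], len(r["visitors"])) for p, r in files_panel.items()}
    return len(visitors_panel), per_file

# One client loads index.html three times and contact.html once.
client = ("192.168.0.1", "30/Jun/2022", "Chrome/100")
requests = [client + ("/index.html",)] * 3 + [client + ("/contact.html",)]
total_visitors, per_file = panel_counts(requests)
print(total_visitors, per_file)
```

Here the Visitors panel would count 1 visitor, while the per-file visitor counts add up to 2, which is why the right-hand totals can be much larger than the left-hand ones.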
Let me know if that helps clarify your question.
Any updates on this? Thanks
Trying to understand the specs here.
I have been collecting logs for 16 days and am now trying to see what the difference is between what we observe in Google Analytics and what we get by running GoAccess.
I have a 9.3 GB log file from our edge; wc -l reports 13,856,118 lines, which is the same as our Total Requests in GoAccess.
| GoAccess | Total Visitors |
|---|---|
| Unique Visitors | 443,173 |
| Visitor Hostnames and IPs | 102,937 |
| Referring Sites | 88,314 |

| Google Analytics | Active Users |
|---|---|
| 28-Day Active Users | 68,140 |
I would expect the number of users to be higher than what I observe in Google Analytics, but which number is the most accurate for comparison? I would also expect every user to have an IP, and hence expect Visitor Hostnames and IPs to be equal to Unique Visitors.
It seems it does add up.
@jonasdk Are you filtering out bots? i.e., --ignore-crawlers. Also, since it looks like you are comparing data against Google, which doesn't support unique visit per minute, I'd run your goaccess instance with --date-spec=date or simply don't enable/pass this option if currently set.
It depends on what exactly you are looking for. For instance, hostnames/IPs are unique, meaning you had 102,937 unique IPs visiting your server over that specific range of time. That's an accurate number of hosts (people and/or bots) that visited your server. However, if a specific visitor/person visited on Sep/01, Sep/02 and then on Sep/05 using the same IP, it will show only as 1 unique hostname/IP, but the unique visitors count will show that as 3 visits. So again, it depends on how you are utilizing the data. If comparing it to Google, then unique visitors is what you are after.
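The Sep/01, Sep/02, Sep/05 case can be sketched in a few lines. This is a simplified illustration under the definitions given in this thread, not goaccess's implementation; `hosts_vs_visitors` is a hypothetical helper and the IP and user agent are made up.

```python
def hosts_vs_visitors(requests):
    """requests: iterable of (ip, date, agent) tuples.

    Returns (unique_hosts, unique_visitors).
    """
    requests = list(requests)
    # Hosts panel: each IP counts once over the whole date range.
    hosts = {ip for ip, _date, _agent in requests}
    # Visitors: each distinct IP + date + user agent combination counts once.
    visitors = set(requests)
    return len(hosts), len(visitors)

# The same client returns on three different days (two requests on the first day).
requests = [
    ("203.0.113.7", "01/Sep/2022", "Firefox/104"),
    ("203.0.113.7", "01/Sep/2022", "Firefox/104"),
    ("203.0.113.7", "02/Sep/2022", "Firefox/104"),
    ("203.0.113.7", "05/Sep/2022", "Firefox/104"),
]
print(hosts_vs_visitors(requests))
```

With this input the host count is 1 and the visitor count is 3, which is the gap between Visitor Hostnames and IPs and Unique Visitors when people return on multiple days.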
Thanks for the explanation, that does make perfect sense. Not using --date-spec=min made the result a lot more plausible.
The report on crawlers (from the Browser stats) says they account for some 18.28% of the visitors. Is that a common trend?
@jonasdk it depends. Some sites will get hammered with bots and not many people. Others will be the opposite, and everything in between. 18% is healthy in my opinion.
Thanks for the follow up. Do you have a command I can run against the log files to test this? They are nginx.
@nadermx not sure I follow, are you looking to have minute specificity?