Apachetop output headers garbled on busy server
We run a public OSS server that is quite busy. We are using the standard combined log format:
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined
When I run apachetop -f /var/log/httpd/access_log I get weird garbled output headers:
last hit: 01:51:00 atop runtime: 0 days, 00:14:10 01:51:01
All: 0 reqs 49209.0/sec) 057.9( 0.0B/s 90.7G ( 109.2M 1931.9K
2xx: 41828 (85.0%) 3xx: 7 ( 0.0%) 4xx: 7374 (15.0%) 5xx: 0 ( 0.0%)
R ( 30s): 0 r1768( 0.0/s58.90.0B ( 0.5220.1M ( 174.0M 3023.4K
2xx: 1521 (86.0%) 3xx: 0 ( 0.0%) 4xx: 247 (14.0%) 5xx: 0 ( 0.0%)
REQS REQ/S KB KB/S URL
88 2.93 16.8 0.6*/centos/7.9.2009/updates/x86_64/repodata/repomd.xml
80 2.67 383.8 12.8 /rocky/8.10/AppStream/x86_64/os/repodata/repomd.xml
67 2.23 207.2 6.9 /rocky/8.10/extras/x86_64/os/repodata/repomd.xml
63 2.10 256.2 8.5 /rocky/9.6/BaseOS/x86_64/os/repodata/repomd.xml
63 2.10 283.3 9.4 /rocky/9.6/AppStream/x86_64/os/repodata/repomd.xml
61 2.03 264.8 8.8 /rocky/8.10/BaseOS/x86_64/os/repodata/repomd.xml
52 1.73 10.0 0.3 /centos/7.9.2009/os/x86_64/repodata/repomd.xml
44 1.47 8.4 0.3 /centos/7.9.2009/extras/x86_64/repodata/repomd.xml
The lists of recent requests appears OK but the headers don't show the correct counts. Any guidance on how to troubleshoot this further?
Hmm, I haven't looked at the code for quite some time (I'm the current maintainer, not the original dev), but I do remember that there are hardcoded values (e.g. the ring buffer). I have to check it out. Maybe I get to it next week.
I suspect you have so many transactions that they don't fit into the buffer.
If there is something I can to test or troubleshoot I'm happy to help.
I think there is also debug output in /tmp. I can't recall whether it is written by default or not. I really have to go through the code and familiarize myself with it again. I haven't touched it for quite some time.
I might need this debug file from your machine. I'll let you know how to collect it. Maybe my memory is failing me and there's no debug. I am sorry, but if you don't work with a code base all the time, it's hard to not mix things up with other projects.
Btw, interesting that apachetop is still used. After goaccess was created I thought people would switch over without hesitation.
I wasn't aware of goaccess so I just checked it out. It's great, has lots of cool reports, but it doesn't have the "last 30 seconds" that Apachetop has.
We're monitoring in real-time the 2xx, 3xx, and 5xx codes. Knowing the total percentage is great, but I really need to know the last 30 seconds so I can catch a problem in real time.
I really like goaccess, it's in my mind the next evolution of apachetop.
but it doesn't have the "last 30 seconds"
Hmm, I thought they provided real time data access info. But you can always open a feature request with goaccess. Your use case makes perfect sense and if I were the author, I'd add it. (I'd add 5, 10, 30, 60s - or at least a few, similar to the load avgs in uptime/top/...)
I'll try to look into apachetop's code base next week. In the mean time, can you please tell me a bit more about your load. How many log lines are created by Apache per second? What would be the correct counts for the header section, do you know?
We're monitoring in real-time the 2xx, 3xx, and 5xx codes. Knowing the total percentage is great, but I really need to know the last 30 seconds so I can catch a problem in real time.
TBH, having used a lot of o11y the past decade, I'd seriously consider a different approach. apachetop requires you to monitor interactively and manually. Using something like open telemetry, loki, prometheus, dynatrace, vector, elk (or whatever tickles your fancy) would allow you to automatically monitor your response code, plus react to them accordingly (send notifications, run scripts, spawn a new container/node, ...).
Apachetop and goaccess are probably rather used for ad-hoc monitoring, which is still nice and important at times, but you don't want to pay a person for manually monitoring 24/7. ;-)
We're doing about 15 hits per second. Here is a sample of the logs:
199.255.45.4 - - [08/Jun/2025:04:17:59 -0700] "GET /rocky/9.6/BaseOS/x86_64/os/repodata/888528d8-5feb-4d11-99dd-2c7dcea36852-PRIMARY.xml.gz HTTP/1.1" 200 1642450 "-" "libdnf (Rocky Linux 9.3; generic; Linux.x86_64)"
199.255.45.4 - - [08/Jun/2025:04:17:59 -0700] "GET /rocky/9.6/BaseOS/x86_64/os/repodata/7a4bbdb1-d938-4ac2-a5d3-20fd9cf6d73f-FILELISTS.xml.gz HTTP/1.1" 200 690421 "-" "libdnf (Rocky Linux 9.3; generic; Linux.x86_64)"
20.171.207.232 - - [08/Jun/2025:04:17:59 -0700] "GET /rocky/8.10/Devel/aarch64/os/Packages/l/libsrtp-1.5.4-8.el8.aarch64.rpm HTTP/1.1" 200 286480 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)"
34.75.193.86 - - [08/Jun/2025:04:17:59 -0700] "GET /centos/7.9.2009/extras/x86_64/repodata/repomd.xml HTTP/1.1" 404 196 "-" "urlgrabber/3.10 yum/3.4.3"
199.255.45.4 - - [08/Jun/2025:04:17:59 -0700] "GET /rocky/9.6/BaseOS/x86_64/os/repodata/7a4bbdb1-d938-4ac2-a5d3-20fd9cf6d73f-GROUPS.xml HTTP/1.1" 200 289652 "-" "libdnf (Rocky Linux 9.3; generic; Linux.x86_64)"
This should be pretty standard "combined" format.
I've looked into it, but couldn't find the problem. I'm still on it. I have not forgotten. It's just that during these hot days my brain doesn't work properly.
I appreciate you investigating