Strange access log lines
We're using Unit on a server with a couple of WordPress sites. Recently, Unit sometimes gets unresponsive and a restart is necessary. Unfortunately, the log didn't indicate what's wrong, so I decided to activate the access log again and see if it's a specific request that's responsible for the problems. And there I found some strange lines (to me at least :-)):
40.77.167.77 - - [12/Jan/2022:09:47:49 +0000] "GET /osnd-9015zotenzasw.htm HTTP/1.1" 200 65883 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
<Masked IP-address> - - [12/Jan/2022:09:47:50 +0000] "POST //xmlrpc.php HTTP/1.1" 200 415 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4240.193 Safari/537.36"
As the host field is defined as "-", I take it the requests were made against the IP address of our server, so no host information is available. But what I don't understand is the status 200 that Unit returns in both cases. Or am I missing something? Is there a default location in Unit for requests without host information?
Hello, Unit uses Routes for request dispatching.
The simplest route configuration dispatches all requests for a given listener directly to an application:
"127.0.0.1:8300": {
    "pass": "applications/myapp"
}
If this is the case, all requests are passed to myapp without any Host header analysis.
You can pass requests to routes and add host matching if required:
{
    "listeners": {
        "*:8300": {
            "pass": "routes"
        }
    },
    "routes": [
        {
            "match": { "host": "my.host.pattern.com" },
            "action": { "pass": "applications/myapp" }
        }
    ]
}
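To make the matching behaviour concrete, here is a minimal Python sketch of first-match dispatch over a routes array shaped like the one above. It is a simplified model, not Unit's implementation: the `dispatch` function is hypothetical, and it only does an exact, case-insensitive comparison on the host value, whereas real Unit patterns support more than that. A return value of `None` stands for "no step matched", which Unit answers with 404.

```python
def dispatch(routes, host):
    """Return the pass target of the first matching step, or None (404)."""
    for step in routes:
        pattern = step.get("match", {}).get("host")
        # A step without a host condition matches every request.
        if pattern is None or (host or "").lower() == pattern.lower():
            return step["action"]["pass"]
    # No step matched: Unit would answer 404 "Not Found" here.
    return None


routes = [
    {
        "match": {"host": "my.host.pattern.com"},
        "action": {"pass": "applications/myapp"},
    }
]

print(dispatch(routes, "my.host.pattern.com"))  # applications/myapp
print(dispatch(routes, None))                   # None
```

If a request without a matching Host still comes back with 200, that suggests some listener or route step reaches an application without a host condition.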
Thanks for the info, but all our routes have a match for a specific host (and all listeners direct to a route). The question perhaps is, what does Unit do when there's no match in the routes? The status code of 200 implies Unit found the requested resources. But where are they? In the first example, the request is for /osnd-9015zotenzasw.htm. I couldn't find it anywhere on our server.
What does Unit do when there's no match in the routes?
It responds with a 404 "Not Found" status.
But where are they?
Consider sending such a request manually (for instance, with nc or curl) and inspecting the response.
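If nc or curl isn't at hand, the same check can be sketched in Python with a plain socket. The helper names `build_request` and `send_raw` are made up for this example. When no host is given, the request is sent as HTTP/1.0, because HTTP/1.1 requires a Host header.

```python
import socket


def build_request(path, host=None):
    """Build a raw GET request; the Host header is omitted when host is None."""
    version = "HTTP/1.1" if host else "HTTP/1.0"
    lines = [f"GET {path} {version}"]
    if host:
        lines.append(f"Host: {host}")
    lines.append("Connection: close")
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def send_raw(address, port, payload, timeout=5.0):
    """Send payload to address:port and return the raw response bytes."""
    with socket.create_connection((address, port), timeout=timeout) as sock:
        sock.sendall(payload)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

For example, `send_raw("127.0.0.1", 8300, build_request("/osnd-9015zotenzasw.htm"))` would send the request from the access log to the example listener without any Host header. The first line of the returned bytes is the status line, and the body shows what was actually served.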
@FC-IT
but all our routes have a match for a specific host (and all listeners direct to a route)
Could you provide your configuration?
See below. I was a bit reluctant at first to show everything, but as all sites are public, it shouldn't be a problem :-).
I finally have a log with more details:
2022/01/22 05:02:23 [info] 177528#177595 no available connections, close idle connection
2022/01/22 05:02:23 [alert] 177528#177593 accept4(27) failed (24: Too many open files)
2022/01/22 05:02:23 [alert] 177528#177593 new connections are not accepted within 100ms
2022/01/22 05:02:23 [info] 177528#177593 no available connections, close idle connection
2022/01/22 05:02:25 [alert] 177528#177596 accept4(16) failed (24: Too many open files)
2022/01/22 05:02:25 [alert] 177528#177596 new connections are not accepted within 100ms
2022/01/22 05:02:25 [info] 177528#177596 no available connections, close idle connection
2022/01/22 05:02:25 [alert] 177528#177595 accept4(27) failed (24: Too many open files)
2022/01/22 05:02:25 [alert] 177528#177595 new connections are not accepted within 100ms
2022/01/22 05:02:25 [info] 177528#177595 no available connections, close idle connection
A search on the internet suggests tweaking some parameters like the following:
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_rfc1337 = 1
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_probes = 5
net.ipv4.tcp_keepalive_intvl = 15
I'll also try to tweak the max open files (per user). Hopefully all this will help.
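Before changing limits, it may help to see how close the process actually is to its open-files limit. The following is a rough Linux-only sketch that reads /proc; the helper names `parse_max_open_files` and `fd_usage` are made up for this example, and reading another user's /proc/<pid>/fd usually requires root.

```python
import os


def parse_max_open_files(limits_text):
    """Return (soft, hard) for 'Max open files' from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            # fields: Max, open, files, <soft>, <hard>, files
            # Values stay strings because a limit can read "unlimited".
            return fields[3], fields[4]
    return None


def fd_usage(pid):
    """Return (open_fd_count, (soft, hard)) for a running Linux process."""
    with open(f"/proc/{pid}/limits") as fh:
        limits = parse_max_open_files(fh.read())
    # Each entry in /proc/<pid>/fd is one open file descriptor.
    return len(os.listdir(f"/proc/{pid}/fd")), limits
```

Calling `fd_usage(177528)` with the PID from the log above (or whatever PID the process has after a restart) shows the current descriptor count next to the soft and hard limits that apply to that process, which may differ from the per-user limits seen in a login shell.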
Our access log still shows strange values like:
Again, no host information. Why is this? The replies above do not give (enough) information. mar0x talks about no Host header analysis, but I don't understand what that has to do with the missing info. Does "-" as host mean that no host information was sent with the request, and that the request was therefore made against the IP address? Our config apparently didn't help, as there was no response unfortunately.
@FC-IT Actually, there's no Host header field in the Combined access log format that is commonly used by web servers (Unit included): https://en.wikipedia.org/wiki/Common_Log_Format
The missing information in your log line isn't the requested Host but the client IP address (or remote host, as it's sometimes called). It looks like a bug: in some cases Unit apparently doesn't get the client address from the kernel, but I have no idea how to reproduce it. We need to investigate further.
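To illustrate which field is which, here is a small Python sketch that splits a Combined-format line into its named parts, using the first line from the original post. The regular expression and the `parse_combined` helper are written for this example and assume well-formed lines without escaped quotes.

```python
import re

# Combined Log Format: remote host, ident, user, time, request line,
# status, response size, referer, user agent. There is no Host field.
COMBINED = re.compile(
    r'(?P<remote>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_combined(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


line = (
    '40.77.167.77 - - [12/Jan/2022:09:47:49 +0000] '
    '"GET /osnd-9015zotenzasw.htm HTTP/1.1" 200 65883 "-" '
    '"Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"'
)
fields = parse_combined(line)
print(fields["remote"], fields["ident"], fields["user"], fields["referer"])
# 40.77.167.77 - - -
```

The two bare dashes after the address are the ident and authenticated-user fields, and the quoted "-" before the user agent is the Referer. None of them says anything about the Host header.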
oh... it seems you wrote <ip masked> in place of the client IP address, but GitHub removed it. That means you do have the client IP address in your access log line, so there's no bug.
What made you think that a Host header field should be somewhere in the log, then?
No, you're right. It's my mistake. I saw something about this in another question, but now realize it hasn't been implemented (yet; probably in 1.27).
To be clear: in (for example) IIS, you can have a log for every site separately. In Unit you've got one big log. So how do I know against which site the request was made? I thought the host header would help us out here :-).