unit icon indicating copy to clipboard operation
unit copied to clipboard

Strange access log lines

Open FC-IT opened this issue 3 years ago • 10 comments

We're using Unit on a server with a couple of WordPress sites. Recently, Unit sometimes gets unresponsive and a restart is necessary. Unfortunately, the log didn't indicate what's wrong, so I decided to activate the access log again and see if it's a specific request that's responsible for the problems. And there I found some strange lines (to me at least :-)):

40.77.167.77 - - [12/Jan/2022:09:47:49 +0000] "GET /osnd-9015zotenzasw.htm HTTP/1.1" 200 65883 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

<Masked IP-address> - - [12/Jan/2022:09:47:50 +0000] "POST //xmlrpc.php HTTP/1.1" 200 415 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4240.193 Safari/537.36"

As the host field is defines as "-", I take it the requests were made against the IP-address of our server, so no host information is available. But what I don't understand is the status 200 that Unit returns in both cases. Or am I missing something? Is there a default location in Unit for requests without host information?

FC-IT avatar Jan 12 '22 10:01 FC-IT

Hello, Unit uses Routes for request dispatching.

The simplest route configuration is dispatching all request for given listener directly to application:

    "127.0.0.1:8300": {
        "pass": "applications/myapp"
    }

If this is the case, all requests passed to myapp without Host header analysis.

You may pass requests to routes and add host field matching if it is required:

{
     "listeners": {
         "*:8300": {
             "pass": "routes"
         }
     },

     "routes": [
         {
            "match": { "host": "my.host.pattern.com" },
            "action": { "pass": "applications/myapp" }
         }
     ]
}

mar0x avatar Jan 12 '22 12:01 mar0x

Thanks for the info, but all our routes have a match for a specific host (and all listeners direct to a route) . The question perhaps is, what does Unit do when there's no match in the routes? The status code of 200 implies Unit found the requested resources. But where are they? In the first example, the request is for /osnd-9015zotenzasw.htm. I couldn't find it anywhere on our server.

FC-IT avatar Jan 12 '22 13:01 FC-IT

What does Unit do when there's no match in the routes?

It respond with 404 "Not Found" status.

But where are they?

Consider to send such request manually (for instance with nc or curl) and inspect the response.

mar0x avatar Jan 12 '22 16:01 mar0x

@FC-IT

but all our routes have a match for a specific host (and all listeners direct to a route)

Could you provide your configuration?

VBart avatar Jan 13 '22 02:01 VBart

See below. I was a bit reluctant at first to show everything, but as all sites are public, it shouldn't be a problem :-).

config.txt

FC-IT avatar Jan 13 '22 12:01 FC-IT

I finally have a log with more details:

2022/01/22 05:02:23 [info] 177528#177595 no available connections, close idle connection 2022/01/22 05:02:23 [alert] 177528#177593 accept4(27) failed (24: Too many open files) 2022/01/22 05:02:23 [alert] 177528#177593 new connections are not accepted within 100ms 2022/01/22 05:02:23 [info] 177528#177593 no available connections, close idle connection 2022/01/22 05:02:25 [alert] 177528#177596 accept4(16) failed (24: Too many open files) 2022/01/22 05:02:25 [alert] 177528#177596 new connections are not accepted within 100ms 2022/01/22 05:02:25 [info] 177528#177596 no available connections, close idle connection 2022/01/22 05:02:25 [alert] 177528#177595 accept4(27) failed (24: Too many open files) 2022/01/22 05:02:25 [alert] 177528#177595 new connections are not accepted within 100ms 2022/01/22 05:02:25 [info] 177528#177595 no available connections, close idle connection

A search on the internet suggests tweaking some parameters like the following:

net.ipv4.tcp_synack_retries = 2 net.ipv4.tcp_rfc1337 = 1 net.ipv4.tcp_syncookies = 1 net.ipv4.tcp_fin_timeout = 15 net.ipv4.tcp_keepalive_time = 300 net.ipv4.tcp_keepalive_probes = 5 net.ipv4.tcp_keepalive_intvl = 15

I'll also try to tweak the max open files (per user). Hopefully all this will help.

FC-IT avatar Jan 22 '22 05:01 FC-IT

Our access log still shows strange values like:

- - [30/Jan/2022:08:49:21 +0000] "POST /upload HTTP/1.1" 200 14 "-" "tgr/1.1"

Again, no host information. Why is this? The replies above do not give (enough) information. mar0x talks about no host header analysis, but I don't understand what that has to do with the missing info. Does "-" as host means that no host information was sent during the request? And was the request therefore made on ip? Our config apparently didn't help, as there was no response unfortunately.

FC-IT avatar Jan 30 '22 09:01 FC-IT

@FC-IT Actually there's no Host header field in the Combined access log format, that is commonly used by web-servers (and Unit as well): https://en.wikipedia.org/wiki/Common_Log_Format

The missing information in your log line isn't the requested Host, but the client IP address (or remote host, how it's sometimes called). It looks like a bug. It seems in some cases Unit doesn't get client address from the kernel, but I have no idea how to reproduce it. We need to investigate further.

VBart avatar Jan 31 '22 01:01 VBart

oh... it seems you wrote <ip masked> in place of client IP address, but GitHub removed it. Then it means that you have client IP address in your access log line, so there's no bug.

What made you think, that a Host header field should be somewhere in the log, then?

VBart avatar Jan 31 '22 01:01 VBart

No, you're right. It's my mistake. I saw something about this in an other question, but now realize it hasn't been implemented (yet, probably 1.27).

To be clear: in (for example) IIS, you can have a log for every site separately. In Unit you've got one big log. So how do I know against which site the request was made? I thought the host header would help us out here :-).

FC-IT avatar Jan 31 '22 07:01 FC-IT