sozu
sozu copied to clipboard
Improve metrics and access logs
Hello there :wave:,
After some discussions with @StevenLeRoux, @BlackYoup and @Wonshtrum. We found out that Sōzu need to provides a little more information about request and response. So, I use this issue to make a summary of what we need.
Firstly, I will remember that we have to tackle the issue #641, #892 and #804.
What we already have
Access logs
Currently, when we handle request (TCP, HTTP[s] or WS[s]) will write following access logs.
01H0FXZVQK7RK3W55X7V26M259 TcpTest TcpTest-0-127.0.0.1:4000 127.0.0.1:52378 -> 127.0.0.1:4000 19ms 168μs 79 68522 owner=John, uuid=3f740af1-45fd-45ce-b61f-17bf1a51505f TCP -
01H0FXX3VTR8T3B1K0ZHXQZZNQ MyCluster the-backend-to-my-app 127.0.0.1:50444 -> 127.0.0.1:1026 61ms 4959μs 79 68535 key=value HTTPS-TLS1.3 lolcatho.st:8443 200 GET /
Below, an explanation of the above access logs:
request id | cluster id | backend id | client socket address -> backend socket address | response time | service time | bytes in | bytes out | metadata | protocol | hostname | status | verb | path |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
01H0FXX3VTR8T3B1K0ZHXQZZNQ |
MyCluster |
the-backend-to-my-app |
127.0.0.1:50444 -> 127.0.0.1:1026 |
61ms |
4959μs |
79 |
68535 |
key=value |
HTTPS-TLS1.3 |
lolcatho.st:8443 |
200 |
GET |
/ |
request id | cluster id | backend id | client socket address -> backend socket address | response time | service time | bytes in | bytes out | metadata | protocol | websocket context |
---|---|---|---|---|---|---|---|---|---|---|
01H0FXZVQK7RK3W55X7V26M259 |
TcpTest |
TcpTest-0-127.0.0.1:4000 |
127.0.0.1:52378 -> 127.0.0.1:4000 |
19ms |
168μs |
79 |
68522 |
owner=John, uuid=3f740af1-45fd-45ce-b61f-17bf1a51505f |
TCP |
- |
response time
: the total duration of the requestservice time
: the total duration that Sozu took to treat the requestwebsocket context
: the value is-
if the protocol isTCP
You can find the code source that output those logs here:
TCP or WS
https://github.com/sozu-proxy/sozu/blob/3aea15ccff60bd8ba680b19b602ac9114fb2370e/lib/src/protocol/pipe.rs#L275-L287 https://github.com/sozu-proxy/sozu/blob/3aea15ccff60bd8ba680b19b602ac9114fb2370e/lib/src/protocol/pipe.rs#L328-L340
HTTP
https://github.com/sozu-proxy/sozu/blob/3aea15ccff60bd8ba680b19b602ac9114fb2370e/lib/src/protocol/http/mod.rs#L614-L628 https://github.com/sozu-proxy/sozu/blob/3aea15ccff60bd8ba680b19b602ac9114fb2370e/lib/src/protocol/http/mod.rs#L682-L696 https://github.com/sozu-proxy/sozu/blob/3aea15ccff60bd8ba680b19b602ac9114fb2370e/lib/src/protocol/http/mod.rs#L758-L773
Metrics
Today, we have a bunch of metrics and we do not use all of them. We should investigate. To do so, we have to:
- [ ] make a proper list of metrics (classname+labels+kind+description).
- [ ] Then, establish the cardinality of each one and segregate them in two parts:
- [ ] The one that are Sōzu-related (event loop, worker, etc..)
- [ ] The one that are cluster-related (status code, response time, service time, cluster response time, data transfer, etc..)
- [ ] The cluster-related have to be labeled with the cluster identifier and optionaly the backend identifier.
What is missing
Access logs
To us, it miss some informations in access logs as :
- client rtt: the time in millisecond to establish a connection between the client and Sōzu
- server rtt: the time in millisecond to establish a connection between Sōzu and the backend
- cluster response time: the time in millisecond between the end of the transfer of the request body and the response header betweend Sōzu and the backend
- data transfer time: the total duration in millisecond of transfering data between the client to the backend and vice-versa
- user-agent
Metrics
We should investigate first the item above.
Registry:
- Reference around logs: https://www.haproxy.com/blog/introduction-to-haproxy-logging/
- TCP log format: https://www.haproxy.com/documentation/hapee/1-8r1/onepage/#8.2.2
- HTTP log format: https://www.haproxy.com/documentation/hapee/1-8r1/onepage/#8.2.3
- What is round trip time: https://www.cloudflare.com/learning/cdn/glossary/round-trip-time-rtt/