sozu icon indicating copy to clipboard operation
sozu copied to clipboard

Improve metrics and access logs

Open FlorentinDUBOIS opened this issue 1 year ago • 3 comments

Hello there :wave:,

After some discussions with @StevenLeRoux, @BlackYoup and @Wonshtrum. We found out that Sōzu need to provides a little more information about request and response. So, I use this issue to make a summary of what we need.

Firstly, I will remember that we have to tackle the issue #641, #892 and #804.

What we already have

Access logs

Currently, when we handle request (TCP, HTTP[s] or WS[s]) will write following access logs.

01H0FXZVQK7RK3W55X7V26M259 TcpTest TcpTest-0-127.0.0.1:4000	127.0.0.1:52378 -> 127.0.0.1:4000	19ms 168μs 79 68522	owner=John, uuid=3f740af1-45fd-45ce-b61f-17bf1a51505f	TCP -
01H0FXX3VTR8T3B1K0ZHXQZZNQ MyCluster the-backend-to-my-app	127.0.0.1:50444 -> 127.0.0.1:1026	61ms 4959μs 79 68535	key=value	HTTPS-TLS1.3 lolcatho.st:8443 200	GET /

Below, an explanation of the above access logs:

request id cluster id backend id client socket address -> backend socket address response time service time bytes in bytes out metadata protocol hostname status verb path
01H0FXX3VTR8T3B1K0ZHXQZZNQ MyCluster the-backend-to-my-app 127.0.0.1:50444 -> 127.0.0.1:1026 61ms 4959μs 79 68535 key=value HTTPS-TLS1.3 lolcatho.st:8443 200 GET /
request id cluster id backend id client socket address -> backend socket address response time service time bytes in bytes out metadata protocol websocket context
01H0FXZVQK7RK3W55X7V26M259 TcpTest TcpTest-0-127.0.0.1:4000 127.0.0.1:52378 -> 127.0.0.1:4000 19ms 168μs 79 68522 owner=John, uuid=3f740af1-45fd-45ce-b61f-17bf1a51505f TCP -

response time : the total duration of the request service time : the total duration that Sozu took to treat the request websocket context: the value is - if the protocol is TCP

You can find the code source that output those logs here:

TCP or WS

https://github.com/sozu-proxy/sozu/blob/3aea15ccff60bd8ba680b19b602ac9114fb2370e/lib/src/protocol/pipe.rs#L275-L287 https://github.com/sozu-proxy/sozu/blob/3aea15ccff60bd8ba680b19b602ac9114fb2370e/lib/src/protocol/pipe.rs#L328-L340

HTTP

https://github.com/sozu-proxy/sozu/blob/3aea15ccff60bd8ba680b19b602ac9114fb2370e/lib/src/protocol/http/mod.rs#L614-L628 https://github.com/sozu-proxy/sozu/blob/3aea15ccff60bd8ba680b19b602ac9114fb2370e/lib/src/protocol/http/mod.rs#L682-L696 https://github.com/sozu-proxy/sozu/blob/3aea15ccff60bd8ba680b19b602ac9114fb2370e/lib/src/protocol/http/mod.rs#L758-L773

Metrics

Today, we have a bunch of metrics and we do not use all of them. We should investigate. To do so, we have to:

  • [ ] make a proper list of metrics (classname+labels+kind+description).
  • [ ] Then, establish the cardinality of each one and segregate them in two parts:
    • [ ] The one that are Sōzu-related (event loop, worker, etc..)
    • [ ] The one that are cluster-related (status code, response time, service time, cluster response time, data transfer, etc..)
      • [ ] The cluster-related have to be labeled with the cluster identifier and optionaly the backend identifier.

What is missing

Access logs

To us, it miss some informations in access logs as :

  • client rtt: the time in millisecond to establish a connection between the client and Sōzu
  • server rtt: the time in millisecond to establish a connection between Sōzu and the backend
  • cluster response time: the time in millisecond between the end of the transfer of the request body and the response header betweend Sōzu and the backend
  • data transfer time: the total duration in millisecond of transfering data between the client to the backend and vice-versa
  • user-agent

Metrics

We should investigate first the item above.

Registry:

  • Reference around logs: https://www.haproxy.com/blog/introduction-to-haproxy-logging/
  • TCP log format: https://www.haproxy.com/documentation/hapee/1-8r1/onepage/#8.2.2
  • HTTP log format: https://www.haproxy.com/documentation/hapee/1-8r1/onepage/#8.2.3
  • What is round trip time: https://www.cloudflare.com/learning/cdn/glossary/round-trip-time-rtt/

FlorentinDUBOIS avatar May 16 '23 14:05 FlorentinDUBOIS