netobserv-ebpf-agent

Invalid src and dst IPs when RTT is enabled for gRPC traffic

Open matijavizintin opened this issue 1 year ago • 1 comments

Here is an example of invalid IPs for gRPC traffic:

ipv4: 13:59:40.537606 eth0 IP 128.120.33.77:32888 > 138.28.0.0:8525: dscp: 0x14 protocol:tcp type: 0 code: 0 dir:0 bytes:32 packets:1 flags:16 ends: 13:59:40.537606 dnsId: 0 dnsFlags: 0x0000 dnsLatency(ms): 0 rtt(ns) 9899000 DropPkts: 0 DropBytes: 0 DropCause 0
ipv4: 13:59:37.698857 eth1 IP 214.144.33.77:54928 > 233.230.0.0:8525: dscp: 0x14 protocol:tcp type: 0 code: 0 dir:0 bytes:32 packets:1 flags:16 ends: 13:59:37.698857 dnsId: 0 dnsFlags: 0x0000 dnsLatency(ms): 0 rtt(ns) 2717000 DropPkts: 0 DropBytes: 0 DropCause 0

You can see that the last two octets of the two source addresses are the same (33.77), and both destination addresses end in 0.0. The IPs should be 10.9.x.x. Here is an example of non-gRPC traffic where the IPs are correct:

ipv4: 14:08:42.804832 eth1 IP 10.9.69.17:3000 > 10.9.37.63:39946: dscp: 0x0 protocol:tcp type: 0 code: 0 dir:0 bytes:66 packets:1 flags:16 ends: 14:08:42.804832 dnsId: 0 dnsFlags: 0x0000 dnsLatency(ms): 0 rtt(ns) 10000 DropPkts: 0 DropBytes: 0 DropCause 0
ipv4: 14:08:42.804849 eth1 IP 10.9.76.28:3000 > 10.9.37.63:37434: dscp: 0x0 protocol:tcp type: 0 code: 0 dir:0 bytes:66 packets:1 flags:16 ends: 14:08:42.804849 dnsId: 0 dnsFlags: 0x0000 dnsLatency(ms): 0 rtt(ns) 10000 DropPkts: 0 DropBytes: 0 DropCause 0

kernel version: 5.15.0-112-generic
tested version: build from main branch and 1.6.1-community

This happens only when ENABLE_RTT=true and only for gRPC traffic. IPs for other traffic look OK, and if RTT is disabled, IPs for gRPC traffic look OK as well.
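One way to pick the affected records out of a larger dump is to check both addresses of every flow against the expected subnet. The following is a minimal Go sketch, not part of the agent or the collector. It assumes that the expected range is 10.9.0.0/16 (inferred from "10.9.x.x" above) and that every record has the same "IP src:port > dst:port:" layout as the samples in this issue. The names `flowRe`, `expected` and `suspicious` are made up for illustration.

```go
package main

import (
	"fmt"
	"net/netip"
	"regexp"
)

// flowRe matches the "IP src:port > dst:port:" part of a dump line.
// The layout is copied from the samples in this issue; that other
// records look the same is an assumption.
var flowRe = regexp.MustCompile(`IP (\d+\.\d+\.\d+\.\d+):(\d+) > (\d+\.\d+\.\d+\.\d+):(\d+):`)

// expected is the subnet the hosts are assumed to live in (10.9.x.x).
var expected = netip.MustParsePrefix("10.9.0.0/16")

// suspicious reports whether a dump line carries a source or destination
// address outside the expected subnet. ok is false when the line does not
// contain a parsable flow.
func suspicious(line string) (bad, ok bool) {
	m := flowRe.FindStringSubmatch(line)
	if m == nil {
		return false, false
	}
	src, errSrc := netip.ParseAddr(m[1])
	dst, errDst := netip.ParseAddr(m[3])
	if errSrc != nil || errDst != nil {
		return false, false
	}
	return !expected.Contains(src) || !expected.Contains(dst), true
}

func main() {
	// Shortened copies of one bad and one good record from this issue.
	lines := []string{
		"ipv4: 13:59:40.537606 eth0 IP 128.120.33.77:32888 > 138.28.0.0:8525: dscp: 0x14 protocol:tcp",
		"ipv4: 14:08:42.804832 eth1 IP 10.9.69.17:3000 > 10.9.37.63:39946: dscp: 0x0 protocol:tcp",
	}
	for _, l := range lines {
		bad, ok := suspicious(l)
		fmt.Printf("parsed=%v outsideSubnet=%v\n", ok, bad)
	}
}
```

With the two sample records, the first is flagged as outside the subnet and the second is not. The check only detects the symptom; it says nothing about why the addresses are wrong.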

matijavizintin avatar Jul 30 '24 18:07 matijavizintin

Hi @matijavizintin, thanks for opening this issue. I would like to have more details. You said you were testing with gRPC, but I don't see port 443 anywhere in your traces. Can you tell me what the 138.28.0.0 IP address belongs to, and the same for the good IPs 10.9.x.x? If you can share a way to reproduce this, I can look at it more. I assume you are running the agent locally on your machine, not in a k8s cluster, right?

msherif1234 avatar Aug 02 '24 10:08 msherif1234

In this case we have two Go apps talking to each other via gRPC on port 8525. Each app sits on its own (well, hundreds of them) physical machine. So no k8s in this case.

In this example, 10.9.37.63 is the machine with the gRPC client and 10.9.69.17 is the machine with the gRPC server. 128.120.33.77 and 138.28.0.0 are invalid IPs. They don't exist in our DC, and the app doesn't communicate directly with the internet.

Here is what I'm running.

On the machine running a Go app:

METRICS_ENABLE=true CACHE_ACTIVE_TIMEOUT=15s CACHE_MAX_FLOWS=200000 LOG_LEVEL=debug ENABLE_DNS_TRACKING=true ENABLE_RTT=true TARGET_HOST=10.9.26.14 TARGET_PORT=9999 ./netobserv-ebpf-agent-main

(This is built from the main branch; I also tried with version 1.6.1.)

On another machine collecting the data:

./bin/flowlogs-dump-collector -listen_port=9999 2>&1 | grep 8525

Then send some gRPC traffic to that machine, and that should be it. Other "normal" HTTP calls (see the example on port 3000) work well, and this happens only when RTT is enabled. It looks like the IPs get messed up somehow.

matijavizintin avatar Aug 02 '24 11:08 matijavizintin

Hey, while the purpose of #286 is not to fix this issue, I'm wondering if it would change anything, since it changes how the headers are read for RTT extraction... It would be good to check.

jotak avatar Aug 12 '24 07:08 jotak

@matijavizintin, can you please confirm whether you still see this issue, or whether it's gone and we can close this issue?

msherif1234 avatar Jun 16 '25 13:06 msherif1234

Hey @msherif1234, I didn't have time (well, it was never prioritized) to work on that. We kept the RTT feature off and it's working fine. So let's close it for now, and I'll evaluate again when I do the version upgrade. Thanks.

matijavizintin avatar Jun 17 '25 18:06 matijavizintin