
downstream_rx_datagram_dropped shows big number with big jump when running UDP traffic.

Open nev888 opened this issue 1 year ago • 24 comments

listener.IP:PORT.udp.downstream_rx_datagram_dropped shows a big number and jumps up by really big amounts when running UDP traffic. Collecting the stats over a few minutes:

listener.IP:PORT.udp.downstream_rx_datagram_dropped: 68763983128
listener.IP:PORT.udp.downstream_rx_datagram_dropped: 210235537380
listener.IP:PORT.udp.downstream_rx_datagram_dropped: 210235537380
listener.IP:PORT.udp.downstream_rx_datagram_dropped: 210235537380
listener.IP:PORT.udp.downstream_rx_datagram_dropped: 211333341100
listener.IP:PORT.udp.downstream_rx_datagram_dropped: 212764708258
listener.IP:PORT.udp.downstream_rx_datagram_dropped: 215293879136
listener.IP:PORT.udp.downstream_rx_datagram_dropped: 215973672978 (last one after stopping the UDP traffic)
listener.IP:PORT.udp.downstream_rx_datagram_dropped: 215973672978
listener.IP:PORT.udp.downstream_rx_datagram_dropped: 215973672978
listener.IP:PORT.udp.downstream_rx_datagram_dropped: 215973672978

Does this counter display the number of datagrams dropped for a specific listener? These numbers don't look realistic from a traffic perspective; there isn't that much traffic at all.

We are using Envoy as an L7 load balancer for SIP traffic. On the client side (downstream), traffic is received over TCP/UDP; it is load balanced to the application Pod (upstream) over gRPC.

stats.txt server_info.txt clsuters.txt

Envoy code is extended with our own for the specific use case we have.

nev888 avatar Jul 10 '24 17:07 nev888
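Since the counter is cumulative, the successive samples above translate into implied per-interval drop rates. A rough sketch of that arithmetic (values copied from the first four samples above; the 60-second scrape interval is an assumption, not stated in the report):

```python
# Successive cumulative downstream_rx_datagram_dropped samples (from the report above).
samples = [68763983128, 210235537380, 210235537380, 211333341100]
interval_s = 60  # assumed seconds between scrapes

# Per-interval increments and implied drop rates.
deltas = [b - a for a, b in zip(samples, samples[1:])]
rates = [d / interval_s for d in deltas]
print(deltas)  # [141471554252, 0, 1097803720]
print(rates)
```

The first increment alone would mean billions of datagrams dropped per second, which is what makes the counter look implausible against the actual traffic rate.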

cc @mattklein123 @danzh2010

nezdolik avatar Jul 11 '24 08:07 nezdolik

Please adjust your listen socket's receive buffer size; see https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/http/http3#downstream-stats

danzh2010 avatar Jul 11 '24 14:07 danzh2010
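One way to adjust this in Envoy is the listener's `socket_options` field, which applies setsockopt() to the listen socket. A minimal sketch in the same JSON config style used in this thread; on Linux, level 1 is SOL_SOCKET and name 8 is SO_RCVBUF, and the 16 MiB value is purely illustrative:

```json
"socket_options": [
  {
    "description": "bump SO_RCVBUF on the UDP listen socket (illustrative size)",
    "level": 1,
    "name": 8,
    "int_value": 16777216
  }
]
```

Note the kernel silently caps the requested size at net.core.rmem_max, so that sysctl may need raising as well.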

Thanks @danzh2010. If the kernel's UDP listen socket's receive buffer isn't large enough, could it cause other issues? We came across this counter value while investigating a memory leak.

nev888 avatar Jul 11 '24 14:07 nev888

> Thanks @danzh2010. If the kernel's UDP listen socket's receive buffer isn't large enough, could it cause other issues? We came across this counter value while investigating a memory leak.

It will become a bandwidth limitation, but not a memory leak.

danzh2010 avatar Jul 11 '24 15:07 danzh2010

Any recommendation on how big it should be? Currently it's ~9.5 MB.

nev888 avatar Jul 11 '24 15:07 nev888

> Any recommendation on how big it should be? Currently it's ~9.5 MB.

This depends on your bandwidth, and the Linux kernel doubles the number you supplied via setsockopt().

Also please keep in mind that this stat is cumulative.

danzh2010 avatar Jul 11 '24 15:07 danzh2010
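The doubling is easy to observe from any Linux process: the value read back with getsockopt() is twice what was passed to setsockopt(), as long as the request stays under net.core.rmem_max. A small sketch, assuming a Linux host whose rmem_max is at least 64 KiB (true for the common default of 212992):

```python
import socket

# Create a plain UDP socket and request a 64 KiB receive buffer.
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
requested = 64 * 1024
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)

# The kernel reports back double the requested size (bookkeeping overhead).
effective = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
s.close()
print(requested, effective)  # on Linux: 65536 131072
```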

One more question: do you have any idea why the counter is not in sync with the traffic rate? The first measurement was 68763983128; a few minutes later it became 210235537380. I had traffic running for a few hours at ~40 calls/sec, and I also generated truncated UDP traffic for a few minutes, but nothing even close to the numbers I see in the counter.
All the generated traffic could have reached ~2 million packets at most, and only half of it was UDP.

nev888 avatar Jul 12 '24 07:07 nev888

I don't know the ingress rate of your service, but assuming that's a 5-minute interval, you have ~400M packet drops per second. You can check netstat -p udp to confirm the numbers are consistent with what the kernel sees.

danzh2010 avatar Jul 12 '24 15:07 danzh2010
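On Linux, netstat's UDP statistics come from /proc/net/snmp, where each protocol has a header row followed by a value row. A small sketch of reading the kernel-side counters, run here against an inline sample since the live file is host-specific (RcvbufErrors is the counter for drops caused by a full receive buffer):

```python
def parse_udp_snmp(text):
    """Pair the 'Udp:' header row with its value row, /proc/net/snmp style."""
    rows = [line for line in text.splitlines() if line.startswith("Udp:")]
    header, values = rows[0].split()[1:], rows[1].split()[1:]
    return dict(zip(header, map(int, values)))

# Sample in /proc/net/snmp format; values are illustrative, not from the issue.
sample = (
    "Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors InCsumErrors\n"
    "Udp: 690 0 3019359 690 0 0 3019359\n"
)
stats = parse_udp_snmp(sample)
print(stats["InErrors"], stats["RcvbufErrors"])  # 3019359 0
```

To read the real counters, pass `open("/proc/net/snmp").read()` to the same function.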

The previous stats were from a Pod which is no longer running. I have a different Pod with these stats. Traffic was generated with a script; the rate is a rough estimate, ~38873 packets per second. The downstream_rx_datagram_dropped counter jumped to this big number from 0 while the script was running for 1.5 hours:
listener.IPv4_PORT.udp.downstream_rx_datagram_dropped: 61610884148229

bash-4.4$ nstat
IpInReceives 3021072
IpInDelivers 3021072
IpOutRequests 3020920
TcpActiveOpens 17
TcpPassiveOpens 93
TcpEstabResets 7
TcpInSegs 1023
TcpOutSegs 871
TcpOutRsts 32
UdpInDatagrams 690
UdpInErrors 3019359
UdpOutDatagrams 690
UdpInCsumErrors 3019359
TcpExtTCPHPHits 174
TcpExtTCPPureAcks 203
TcpExtTCPHPAcks 317
TcpExtTCPAbortOnData 10
TcpExtTCPAbortOnClose 7
TcpExtTCPRcvCoalesce 9
TcpExtTCPOrigDataSent 437
TcpExtTCPDelivered 454
IpExtInOctets 779239070
IpExtOutOctets 779149461
IpExtInNoECTPkts 3021072

nev888 avatar Jul 15 '24 07:07 nev888

nstat only shows values incremented since its last run; please use 'nstat -a'. Also, I didn't see a dropped-packet count in that output. Can you use netstat -p udp?

danzh2010 avatar Jul 15 '24 14:07 danzh2010

Which UDP extension are you using?

danzh2010 avatar Jul 15 '24 14:07 danzh2010

> nstat only shows values incremented since its last run; please use 'nstat -a'. Also, I didn't see a dropped-packet count in that output. Can you use netstat -p udp?

Tue Jul 16 07:39:46 CEST 2024
listener.IPv4:Port.udp.downstream_rx_datagram_dropped: 89237114699
nstat-2024-07-16_07-39.txt

I don't have netstat; I can use ss though. udp-sockets-2024-07-16_10-59.txt

nev888 avatar Jul 16 '24 11:07 nev888

> Which UDP extension are you using?

We don't use any UDP extension.

nev888 avatar Jul 16 '24 12:07 nev888

> Which UDP extension are you using?

> We don't use any UDP extension.

Are you using UDP Proxy?

danzh2010 avatar Jul 16 '24 14:07 danzh2010

No.

nev888 avatar Jul 16 '24 14:07 nev888

Can you share your UDP listener config?

danzh2010 avatar Jul 16 '24 15:07 danzh2010

Here is the config for the listener: udp_listener.txt

nev888 avatar Jul 16 '24 15:07 nev888

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

github-actions[bot] avatar Aug 15 '24 16:08 github-actions[bot]

If you are using raw UDP, why do you need this?

    "connection_balance_config": {
      "exact_balance": {}
    },

danzh2010 avatar Aug 15 '24 16:08 danzh2010

This config is intended for TCP listeners. On the controller side, both UDP and TCP are configured with the same config; that's why we have it here.

nev888 avatar Aug 16 '24 11:08 nev888

A UDP listener other than QUIC is connectionless; you probably don't need that.

danzh2010 avatar Aug 16 '24 14:08 danzh2010
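For a raw UDP listener, the address block alone determines how datagrams are received; a minimal fragment without any connection balancing, with placeholder name, address, and port (not the reporter's actual config, which was attached separately):

```json
{
  "name": "udp_sip_listener",
  "address": {
    "socket_address": {
      "protocol": "UDP",
      "address": "0.0.0.0",
      "port_value": 5060
    }
  }
}
```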

Yep, in the UDP case we have no use for it. Do you think this might have anything to do with the counter problem?

nev888 avatar Aug 16 '24 14:08 nev888

Not sure; I'm not familiar with the raw UDP listener's interaction with connection_balance_config. If it is the cause, you may see warning logs about packets being dropped in only some of the threads (not all) in the Envoy log.

danzh2010 avatar Aug 16 '24 14:08 danzh2010

I see a similar problem with a setup to test UDP in Envoy. The dropped datagrams are in the tens of billions per second while the incoming traffic is just a couple hundred thousand. Could this be a counter issue? Any idea if it's from Envoy collecting the metrics, or deeper down?

shakedm avatar Aug 25 '24 10:08 shakedm

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

github-actions[bot] avatar Sep 24 '24 12:09 github-actions[bot]

.

nev888 avatar Sep 25 '24 12:09 nev888

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

github-actions[bot] avatar Oct 25 '24 20:10 github-actions[bot]

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

github-actions[bot] avatar Nov 02 '24 00:11 github-actions[bot]