ztunnel
ztunnel copied to clipboard
Broken pipes on large data writes
fortio load -uniform -c 1 -qps 100 -payload-size=40000 -t 2s -abort-on=-1 http://localhost:808
2023-04-04T15:59:14.949606Z ERROR inbound{id=7e8fd964ca3210a27d985b3cd60cd1ed peer_ip=127.0.0.1 peer_id=spiffe:///ns/default/sa/default}: ztunnel::proxy::inbound: hbone server copy: io error: Broken pipe (os error 32) dur=2.109007ms
2023-04-04T15:59:14.949666Z INFO outbound{id=ef1a7bfdfe264051a818926c3088c42d}: ztunnel::proxy::outbound: proxy to 127.0.0.1:8080 using HBONE via 127.0.0.1:15008 type Direct
2023-04-04T15:59:14.949959Z INFO outbound{id=7e8fd964ca3210a27d985b3cd60cd1ed}: ztunnel::proxy::outbound: complete dur=13.086084ms
2023-04-04T15:59:14.960976Z INFO inbound{id=ef1a7bfdfe264051a818926c3088c42d peer_ip=127.0.0.1 peer_id=spiffe:///ns/default/sa/default}: ztunnel::proxy::inbound: got CONNECT request to 127.0.0.1:8080
2023-04-04T15:59:14.964237Z INFO outbound{id=07367cd252975b4528e07779300e6653}: ztunnel::proxy::outbound: proxy to 127.0.0.1:8080 using HBONE via 127.0.0.1:15008 type Direct
2023-04-04T15:59:14.964321Z ERROR inbound{id=ef1a7bfdfe264051a818926c3088c42d peer_ip=127.0.0.1 peer_id=spiffe:///ns/default/sa/default}: ztunnel::proxy::inbound: hbone server copy: io error: Broken pipe (os error 32) dur=3.082466ms
2023-04-04T15:59:14.964705Z INFO outbound{id=ef1a7bfdfe264051a818926c3088c42d}: ztunnel::proxy::outbound: complete dur=15.202051ms
2023-04-04T15:59:14.974146Z INFO inbound{id=07367cd252975b4528e07779300e6653 peer_ip=127.0.0.1 peer_id=spiffe:///ns/default/sa/default}: ztunnel::proxy::inbound: got CONNECT request to 127.0.0.1:8080
2023-04-04T15:59:14.976628Z INFO outbound{id=2cc21d318db51b851dcee2cab2b9dc57}: ztunnel::proxy::outbound: proxy to 127.0.0.1:8080 using HBONE via 127.0.0.1:15008 type Direct
2023-04-04T15:59:14.976645Z ERROR inbound{id=07367cd252975b4528e07779300e6653 peer_ip=127.0.0.1 peer_id=spiffe:///ns/default/sa/default}: ztunnel::proxy::inbound: hbone server copy: io error: Broken pipe (os error 32) dur=2.352267ms
2023-04-04T15:59:14.977073Z INFO outbound{id=07367cd252975b4528e07779300e6653}: ztunnel::proxy::outbound: complete dur=13.092794ms
2023-04-04T15:59:14.988100Z INFO inbound{id=2cc21d318db51b851dcee2cab2b9dc57 peer_ip=127.0.0.1 peer_id=spiffe:///ns/default/sa/default}: ztunnel::proxy::inbound: got CONNECT request to 127.0.0.1:8080
2023-04-04T15:59:14.990575Z INFO outbound{id=2cc21d318db51b851dcee2cab2b9dc57}: ztunnel::proxy::outbound: complete dur=14.144643ms
2023-04-04T15:59:14.990614Z ERROR inbound{id=2cc21d318db51b851dcee2cab2b9dc57 peer_ip=127.0.0.1 peer_id=spiffe:///ns/default/sa/default}: ztunnel::proxy::inbound: hbone server copy: io error: Broken pipe (os error 32) dur=2.338017ms
2023-04-04T15:59:14.991361Z INFO outbound{id=da05eb9e3bc61a97a48020a7ff76a4ac}: ztunnel::proxy::outbound: proxy to 127.0.0.1:8080 using HBONE via 127.0.0.1:15008 type Direct
2023-04-04T15:59:15.002705Z INFO inbound{id=da05eb9e3bc61a97a48020a7ff76a4ac peer_ip=127.0.0.1 peer_id=spiffe:///ns/default/sa/default}: ztunnel::proxy::inbound: got CONNECT request to 127.0.0.1:8080
2023-04-04T15:59:15.005544Z ERROR inbound{id=da05eb9e3bc61a97a48020a7ff76a4ac peer_ip=127.0.0.1 peer_id=spiffe:///ns/default/sa/default}: ztunnel::proxy::inbound: hbone server copy: io error: Broken pipe (os error 32) dur=2.669356ms
2023-04-04T15:59:15.005681Z INFO outbound{id=ee6ebc8456ae1a636ba2b8f06f0b0883}: ztunnel::proxy::outbound: proxy to 127.0.0.1:8080 using HBONE via 127.0.0.1:15008 type Direct
2023-04-04T15:59:15.005919Z INFO outbound{id=da05eb9e3bc61a97a48020a7ff76a4ac}: ztunnel::proxy::outbound: complete dur=14.936482ms
2023-04-04T15:59:15.016801Z INFO inbound{id=ee6ebc8456ae1a636ba2b8f06f0b0883 peer_ip=127.0.0.1 peer_id=spiffe:///ns/default/sa/default}: ztunnel::proxy::inbound: got CONNECT request to 127.0.0.1:8080
2023-04-04T15:59:15.018461Z ERROR inbound{id=ee6ebc8456ae1a636ba2b8f06f0b0883 peer_ip=127.0.0.1 peer_id=spiffe:///ns/default/sa/default}: ztunnel::proxy::inbound: hbone server copy: io error: Broken pipe (os error 32) dur=1.508339ms
2023-04-04T15:59:15.018995Z WARN outbound{id=ee6ebc8456ae1a636ba2b8f06f0b0883}: ztunnel::proxy::outbound: failed dur=13.539023ms err=io error: failed to write the buffered data
2023-04-04T15:59:15.019402Z INFO outbound{id=58cb7ec5d06ac5887d6b34ed3b71a36d}: ztunnel::proxy::outbound: proxy to 127.0.0.1:8080 using HBONE via 127.0.0.1:15008 type Direct
2023-04-04T15:59:15.027529Z INFO inbound{id=58cb7ec5d06ac5887d6b34ed3b71a36d peer_ip=127.0.0.1 peer_id=spiffe:///ns/default/sa/default}: ztunnel::proxy::inbound: got CONNECT request to 127.0.0.1:8080
2023-04-04T15:59:15.029223Z ERROR inbound{id=58cb7ec5d06ac5887d6b34ed3b71a36d peer_ip=127.0.0.1 peer_id=spiffe:///ns/default/sa/default}: ztunnel::proxy::inbound: hbone server copy: io error: Broken pipe (os error 32) dur=1.495578ms
2023-04-04T15:59:15.030291Z WARN outbound{id=58cb7ec5d06ac5887d6b34ed3b71a36d}: ztunnel::proxy::outbound: failed dur=11.073287ms err=io error: broken pipe
We keep seeing the data closed
is this because a mismatch between sender and receiver
@howardjohn, Any chance it's something like hitting a limit on the number of open files or file watches? I think each pipe is going to consume 2 file descriptors (one referring to each end of the pipe).
I am seeing this even with a single connection that sends a large amount of data. so it may be some limit but I don't think it would be num files?
https://unix.stackexchange.com/questions/11946/how-big-is-the-pipe-buffer could be related to pipe overflow
I'm pretty sure this has to do with the TCP connection. I can get a similar message if I run iperf3. When I strace ztunnel and run iperf3 through it, I see this
sendto(25, ...) = -1 EPIPE (Broken Pipe)
...
sendto(22, ...) = -1 EPIPE (Broken Pipe)
From the tcp(7), we have
EPIPE The other end closed the socket unexpectedly or a read is executed on a shut down socket.
There were no close or shutdown calls on socket 25/22, so it seems like the cause is the former. Also, RST packets are generated. Socket 25 is the downstream connection to the iperf3 server and 22 is an upstream connection on the server side.
However, I can't seem to replicate the issue with fortio (in new or old builds)
Similar error here with flaky perf tests https://prow.istio.io/view/gs/istio-prow/pr-logs/pull/istio_ztunnel/670/test_ztunnel/1695125088690507776
I am seeing the same error. I have installed 1.20 on an EKS cluster. I do not see this on version 1.18.3. I was also having problems with 1.19.x versions.
2023-11-15T15:57:06.771954Z INFO outbound{id=a686cd938db594dff1aabf0c83400037}: ztunnel::proxy::outbound: Proxying to 172.20.0.1:443 using TCP via 172.20.0.1:443 type Passthrough
2023-11-15T15:57:06.772282Z WARN outbound{id=9ad7269982104f13b534382feee2cd3a}: ztunnel::proxy::outbound: failed dur=10.000982079s err=io error: deadline has elapsed
2023-11-15T15:57:07.138060Z ERROR inbound{id=cddc5afb1fbde92ddbb639703a53ee14 peer_ip=10.4.129.226 peer_id=spiffe://cluster.local/ns/argo-cd/sa/argo-cd-redis-ha-haproxy}: ztunnel::proxy::inbound: hbone server copy: io error: broken pipe dur=2.000681626s
2023-11-15T15:57:07.138299Z ERROR inbound{id=b4a606115b4490a89dcfdf95d4adec1c peer_ip=10.4.129.226 peer_id=spiffe://cluster.local/ns/argo-cd/sa/argo-cd-redis-ha-haproxy}: ztunnel::proxy::inbound: hbone server copy: io error: broken pipe dur=2.000920014s
2023-11-15T15:57:07.197383Z INFO outbound{id=f3df4226a19998ade204419778547e81}: ztunnel::proxy::outbound: proxying to 10.4.8.36:6379 using node local fast path
2023-11-15T15:57:07.197805Z INFO outbound{id=f3df4226a19998ade204419778547e81}: ztunnel::proxy::outbound: complete dur=537.259µs
2023-11-15T15:57:07.249104Z WARN outbound{id=c28df9a6431936f63fb439789baaaae6}: ztunnel::proxy::outbound: failed dur=10.001664136s err=io error: deadline has elapsed
2023-11-15T15:57:07.432422Z INFO outbound{id=0030fce5bdb4f8e65cc2cb8d41e9af8d}: ztunnel::proxy::outbound: proxying to 10.4.8.36:26379 using node local fast path
2023-11-15T15:57:07.432779Z INFO outbound{id=0030fce5bdb4f8e65cc2cb8d41e9af8d}: ztunnel::proxy::outbound: complete dur=454.854µs
2023-11-15T15:57:07.433600Z ERROR outbound{id=0030fce5bdb4f8e65cc2cb8d41e9af8d}: ztunnel::proxy::inbound: internal server copy: io error: Connection reset by peer (os error 104) dur=1.143658ms
2023-11-15T15:57:07.592575Z INFO outbound{id=34e7abc36ba53b71fba5a6f62943cd1a}: ztunnel::proxy::outbound: proxying to 10.4.8.36:26379 using node local fast path
2023-11-15T15:57:07.593005Z INFO outbound{id=34e7abc36ba53b71fba5a6f62943cd1a}: ztunnel::proxy::outbound: complete dur=529.237µs
2023-11-15T15:57:07.652315Z INFO inbound{id=651db3badabba3ee03e21bc2b0bc2197 peer_ip=10.4.129.226 peer_id=spiffe://cluster.local/ns/argo-cd/sa/argo-cd-redis-ha-haproxy}: ztunnel::proxy::inbound: got CONNECT request to 10.4.8.36:26379
2023-11-15T15:57:07.656382Z ERROR inbound{id=651db3badabba3ee03e21bc2b0bc2197 peer_ip=10.4.129.226 peer_id=spiffe://cluster.local/ns/argo-cd/sa/argo-cd-redis-ha-haproxy}: ztunnel::proxy::inbound: hbone server copy: io error: write zero dur=3.974498ms
2023-11-15T15:57:08.141010Z INFO inbound{id=548f55ff6ca0fa9c3585ed43ee4f7d80 peer_ip=10.4.129.226 peer_id=spiffe://cluster.local/ns/argo-cd/sa/argo-cd-redis-ha-haproxy}: ztunnel::proxy::inbound: got CONNECT request to 10.4.8.36:6379
2023-11-15T15:57:08.141548Z INFO inbound{id=1b6a29a8285a26993d8e3aabe3b63aa7 peer_ip=10.4.129.226 peer_id=spiffe://cluster.local/ns/argo-cd/sa/argo-cd-redis-ha-haproxy}: ztunnel::proxy::inbound: got CONNECT request to 10.4.8.36:26379
2023-11-15T15:57:08.210774Z INFO outbound{id=58c61ef9162fe3f76734700dcc0a2ad6}: ztunnel::proxy::outbound: Proxying to 172.20.0.1:443 using TCP via 172.20.0.1:443 type Passthrough
2023-11-15T15:57:08.229147Z ERROR inbound{id=68ef6c18eca0affd1348985b18b854fb peer_ip=10.4.129.226 peer_id=spiffe://cluster.local/ns/argo-cd/sa/argo-cd-redis-ha-haproxy}: ztunnel::proxy::inbound: hbone server copy: io error: broken pipe dur=2.001717433s
2023-11-15T15:57:08.251237Z INFO outbound{id=c6ac9c9bd515f48f3fd72f84055e15b0}: ztunnel::proxy::outbound: Proxying to 172.20.0.1:443 using TCP via 172.20.0.1:443 type Passthrough
2023-11-15T15:57:08.298242Z WARN outbound{id=e02b999a32bbc5ab40089df990aa79e3}: ztunnel::proxy::outbound: failed dur=10.00988518s err=io error: deadline has elapsed
2023-11-15T15:57:08.361197Z INFO outbound{id=01f5e08a7fdf847157aa9c262e69576a}: ztunnel::proxy::outbound: proxy to 10.4.10.124:6379 using HBONE via 10.4.10.124:15008 type Direct
2023-11-15T15:57:08.435542Z INFO outbound{id=746588d5ef1368ba4414735bc09a8b5f}: ztunnel::proxy::outbound: proxying to 10.4.8.36:26379 using node local fast path
2023-11-15T15:57:08.435830Z INFO outbound{id=746588d5ef1368ba4414735bc09a8b5f}: ztunnel::proxy::outbound: complete dur=385.572µs
2023-11-15T15:57:08.436771Z ERROR outbound{id=746588d5ef1368ba4414735bc09a8b5f}: ztunnel::proxy::inbound: internal server copy: io error: Connection reset by peer (os error 104) dur=1.134786ms
2023-11-15T15:57:08.649526Z ERROR outbound{id=2f571f3ea350050b16672a50965ef6c5}: ztunnel::proxy::inbound: internal server copy: io error: Connection reset by peer (os error 104) dur=2.001693962s
2023-11-15T15:57:08.659663Z INFO inbound{id=603e27f6215ee959d33cd2c9128a0587 peer_ip=10.4.129.226 peer_id=spiffe://cluster.local/ns/argo-cd/sa/argo-cd-redis-ha-haproxy}: ztunnel::proxy::inbound: got CONNECT request to 10.4.8.36:26379
2023-11-15T15:57:08.663786Z ERROR inbound{id=603e27f6215ee959d33cd2c9128a0587 peer_ip=10.4.129.226 peer_id=spiffe://cluster.local/ns/argo-cd/sa/argo-cd-redis-ha-haproxy}: ztunnel::proxy::inbound: hbone server copy: io error: broken pipe dur=4.029934ms
2023-11-15T15:57:08.946979Z INFO outbound{id=31b78d7d87b730d10d6b2ffd94081768}: ztunnel::proxy::outbound: proxy to 10.4.143.21:26379 using HBONE via 10.4.143.21:15008 type Direct
2023-11-15T15:57:08.947090Z INFO outbound{id=67c4990f50077fd541dd23910ecd53b1}: ztunnel::proxy::outbound: proxy to 10.4.143.21:6379 using HBONE via 10.4.143.21:15008 type Direct
2023-11-15T15:57:08.947174Z INFO outbound{id=fed2a2aefb4c9209532b62af02399255}: ztunnel::proxy::outbound: proxy to 10.4.143.21:26379 using HBONE via 10.4.143.21:15008 type Direct
2023-11-15T15:57:08.947244Z INFO outbound{id=b9acc3a2915c265139f2b8cef820e30e}: ztunnel::proxy::outbound: proxy to 10.4.143.21:26379 using HBONE via 10.4.143.21:15008 type Direct
2023-11-15T15:57:09.087674Z INFO outbound{id=c01c8f15bcbf564a09497386d387f726}: ztunnel::proxy::outbound: proxy to 10.4.142.49:26379 using HBONE via 10.4.142.49:15008 type Direct
2023-11-15T15:57:09.088325Z INFO outbound{id=e47b709140c52d866fc4c501228f5824}: ztunnel::proxy::outbound: proxy to 10.4.142.49:26379 using HBONE via 10.4.142.49:15008 type Direct
2023-11-15T15:57:09.089319Z INFO outbound{id=81c93d5831f85c956dae263df11736da}: ztunnel::proxy::outbound: proxy to 10.4.142.49:6379 using HBONE via 10.4.142.49:15008 type Direct
2023-11-15T15:57:09.089558Z INFO outbound{id=c96bfd363105401ec913597a59d9b4f8}: ztunnel::proxy::outbound: proxy to 10.4.142.49:26379 using HBONE via 10.4.142.49:15008 type Direct
2023-11-15T15:57:09.198323Z ERROR outbound{id=f3df4226a19998ade204419778547e81}: ztunnel::proxy::inbound: internal server copy: io error: Connection reset by peer (os error 104) dur=2.000720122s
2023-11-15T15:57:09.229774Z INFO inbound{id=c2e823d9a8e2e421c8292fd56eaceda5 peer_ip=10.4.129.226 peer_id=spiffe://cluster.local/ns/argo-cd/sa/argo-cd-redis-ha-haproxy}: ztunnel::proxy::inbound: got CONNECT request to 10.4.8.36:26379
2023-11-15T15:57:09.438892Z INFO outbound{id=2aa1872cc7f3a6ff03fca5212431b426}: ztunnel::proxy::outbound: proxying to 10.4.8.36:26379 using node local fast path
2023-11-15T15:57:09.439217Z INFO outbound{id=2aa1872cc7f3a6ff03fca5212431b426}: ztunnel::proxy::outbound: complete dur=419.656µs
2023-11-15T15:57:09.440046Z ERROR outbound{id=2aa1872cc7f3a6ff03fca5212431b426}: ztunnel::proxy::inbound: internal server copy: io error: Connection reset by peer (os error 104) dur=1.049655ms
2023-11-15T15:57:09.597345Z ERROR outbound{id=34e7abc36ba53b71fba5a6f62943cd1a}: ztunnel::proxy::inbound: internal server copy: io error: Connection reset by peer (os error 104) dur=2.004515267s
2023-11-15T15:57:09.615364Z INFO xds{id=13}: ztunnel::xds::client: received response type_url="type.googleapis.com/istio.workload.Address" size=1
fixed by https://github.com/istio/ztunnel/pull/1067