Simplified error example of: TCP Previous segment not captured

Open JCzz opened this issue 1 year ago • 13 comments

ENV

multipass version
multipass   1.11.1+mac
multipassd  1.11.1+mac

Mac Apple M1
MacOS 12.5.1

curl hangs halfway through a large file (small files are no problem) when the server (a Multipass VM) is accessed from another computer on a different vLAN.

Steps

  1. Create the multipass vm and mount the current folder:
multipass launch --name myvm --network=en0
multipass mount . myvm:~/speed

Question: is it a "bonded interface" when using --network=en0?

  2. Generate a large file:
multipass shell myvm
base64 /dev/urandom | head -c 100000 > file.txt
echo "\The end" >> file.txt
  3. Run a web server in the multipass vm:
multipass shell myvm
cd speed
python3 -m http.server
  4. Curl the multipass vm from another computer on a different vLAN:
curl <ip>/file.txt
In my case: `curl 192.170.1.231/file`
7AVzmECVWNbcShP+TE+/6AM/KZWc12AzLKvtctv0pqeSW0SNDW3OSM82SkzF+/UExUUBH4dxlIcM
dObLOlLgz9WfoiEtvZ4Hbx/yq85C+WwnFr2Trhu75qmFrg8Ht8t/x+MyDfI0MyuGw91tKqFgHL4F

At this point it hangs. What can I do?

If you run python3 -m http.server on the host computer it works; it only fails from the multipass vm.
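For comparison, the same server can be run on the macOS host from the folder that was mounted into the VM; the port and the placeholder address below are just examples:

# on the macOS host, in the folder that was mounted into the vm
python3 -m http.server 8000
# from the computer on the other vLAN (replace <host-ip> with the host's address)
curl -o /dev/null http://<host-ip>:8000/file.txt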

Wireshark

If I use Wireshark I can see something standing out (line 2 below) right about where it hangs:

10287	0.000580	0.000580	64	192.168.1.22	192.170.1.231	TCP	54	53941 → 80 [RST] Seq=1 Win=0 Len=0

4818	0.004338	0.004338	62	192.170.1.231	192.168.1.22	TCP	1514	[TCP Previous segment not captured] 80 → 50623 [ACK] Seq=84853 Ack=86 Win=65152 Len=1448 TSval=1019606200 TSecr=2732723127 [TCP segment of a reassembled PDU]

4820	0.000249	0.000249	64	192.168.1.22	192.170.1.231	TCP	78	[TCP Dup ACK 4817#1] 50623 → 80 [ACK] Seq=86 Ack=67477 Win=131072 Len=0 TSval=2732723132 TSecr=1019605978 SLE=84853 SRE=86301

4821	0.000057	0.000057	64	192.168.1.22	192.170.1.231	TCP	78	[TCP Dup ACK 4817#2] 50623 → 80 [ACK] Seq=86 Ack=67477 Win=131072 Len=0 TSval=2732723132 TSecr=1019605978 SLE=84853 SRE=87749

4822	0.002414	0.002414	62	192.170.1.231	192.168.1.22	TCP	1514	[TCP Fast Retransmission] 80 → 50623 [ACK] Seq=67477 Ack=86 Win=65152 Len=1448 TSval=1019606202 TSecr=2732723132 [TCP segment of a reassembled PDU]

4823	0.000001	0.000001	62	192.170.1.231	192.168.1.22	TCP	1514	[TCP Out-Of-Order] 80 → 50623 [ACK] Seq=68925 Ack=86 Win=65152 Len=1448 TSval=1019606203 TSecr=2732723132 [TCP segment of a reassembled PDU]

4827	0.002141	0.002141	62	192.170.1.231	192.168.1.22	TCP	1514	[TCP Out-Of-Order] 80 → 50623 [ACK] Seq=70373 Ack=86 Win=65152 Len=1448 TSval=1019606205 TSecr=2732723135 [TCP segment of a reassembled PDU]

4829	0.000165	0.000165	62	192.170.1.231	192.168.1.22	TCP	1514	[TCP Retransmission] 80 → 50623 [PSH, ACK] Seq=71821 Ack=86 Win=65152 Len=1448 TSval=1019606205 TSecr=2732723135

4830	0.000002	0.000002	62	192.170.1.231	192.168.1.22	TCP	1514	[TCP Retransmission] 80 → 50623 [ACK] Seq=73269 Ack=86 Win=65152 Len=1448 TSval=1019606205 TSecr=2732723135

4831	0.000000	0.000000	62	192.170.1.231	192.168.1.22	TCP	1514	[TCP Retransmission] 80 → 50623 [ACK] Seq=74717 Ack=86 Win=65152 Len=1448 TSval=1019606205 TSecr=2732723135
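To pull out just these events from a longer capture, a tshark display filter along these lines should list only the retransmissions, lost segments and duplicate ACKs (the capture file name here is just a placeholder):

# list retransmissions, lost segments and duplicate ACKs on the HTTP flow
tshark -r capture.pcapng -Y "tcp.port == 80 && (tcp.analysis.retransmission || tcp.analysis.lost_segment || tcp.analysis.duplicate_ack)"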

iperf - to multipass vm

iperf -c 192.170.1.98
------------------------------------------------------------
Client connecting to 192.170.1.98, TCP port 5001
TCP window size:  128 KByte (default)
------------------------------------------------------------
[  1] local 192.168.1.135 port 53580 connected with 192.170.1.98 port 5001 (icwnd/mss/irtt=14/1448/4000)
[ ID] Interval       Transfer     Bandwidth
[  1] 0.00-10.11 sec   348 MBytes   288 Mbits/sec

iperf - to multipass host machine

iperf -c 192.170.1.36
------------------------------------------------------------
Client connecting to 192.170.1.36, TCP port 5001
TCP window size:  128 KByte (default)
------------------------------------------------------------
[  1] local 192.168.1.135 port 53640 connected with 192.170.1.36 port 5001 (icwnd/mss/irtt=14/1448/3000)
[ ID] Interval       Transfer     Bandwidth
[  1] 0.00-10.06 sec   350 MBytes   291 Mbits/sec
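Note that iperf in this direction sends data from the client to the VM, while the hanging HTTP download flows the other way (VM to client). If iperf3 is available on both ends (an assumption), the reverse direction can be tested explicitly:

# in the multipass vm
iperf3 -s
# on the other computer: -R makes the server (the VM) send the data
iperf3 -c 192.170.1.98 -R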

I have added this issue to:

  • Hyperkit: https://github.com/moby/hyperkit/issues/338
  • Qemu: https://gitlab.com/qemu-project/qemu/-/issues/1584
  • stackoverflow.com: https://stackoverflow.com/questions/73753103/multipass-tcp-previous-segment-not-captured

Thanks

JCzz avatar Apr 13 '23 16:04 JCzz

Hi @JCzz, thanks for putting in some time to debug this! We will take a deeper look into this once we get the chance. Or, if you end up finding the root issue, feel free to add more details or submit a fix.

sharder996 avatar Apr 17 '23 15:04 sharder996

Any news on this one?

JCzz avatar Jul 27 '23 11:07 JCzz

Hi @JCzz!

I'm guessing the important piece to this is a computer on a different vLAN, right? It works fine on the same vLAN.

Unfortunately, I'm not sure if any of us have this particular network setup at our disposal, i.e., in our homes, but I'll ask the rest of the team to be sure.

townsend2010 avatar Jul 27 '23 12:07 townsend2010

Hi @townsend2010 Thanks for replying, and yes, it works when on the same vLAN, just not when crossing vLANs. The thing is that it works when running from the host machine to Multipass, therefore I am thinking it must be somewhere in the Multipass setup.

Lately I have tried using the latest version with qemu to see if that would make any difference. Unfortunately it is the same.

Thanks again, and let me know if you need me to do any tests for you. I would be happy to help, as I really like the ease of use of Multipass.

Regards Christian

JCzz avatar Jul 27 '23 12:07 JCzz

We will try to reproduce, but since this is using the macOS external interface, I'm thinking it's a bug(?) in Apple's vmnet layer (or somewhere else in the Apple virtualized networking stack).

Unfortunately, that is completely a black box and it would probably be very difficult to get the required evidence to get Apple to do anything about this.

Are you still using macOS 12.5.1? It's possible it may have been "fixed" in a newer release.
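For reference, the exact host version can be checked on the Mac with:
sw_vers -productVersion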

townsend2010 avatar Jul 27 '23 12:07 townsend2010

Also,

Question: Is it a "Bonded interface", when using --network=en0

No, it is not a bonded interface. It's presented as a separate interface in the Multipass instance.
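If you want to confirm, both the default NAT interface and the one created by --network=en0 should show up from inside the instance, e.g. (instance name from the steps above; names and addresses will vary):

# brief listing of all interfaces in the instance
multipass exec myvm -- ip -br addr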

townsend2010 avatar Jul 27 '23 12:07 townsend2010

Thanks for the heads-up @townsend2010

I am on MacOS version 13.4.1

JCzz avatar Jul 27 '23 17:07 JCzz

Using Parallels it works. Do you know if they are using vmnet?

JCzz avatar Jul 30 '23 07:07 JCzz

Hi @JCzz!

Using Parallels it works. Do you know if they are using vmnet?

Parallels is a closed source product and they don't really give specific implementation details, but considering it has been around longer than the vmnet API, I suspect they use some sort of userspace networking. Again, just a gut feeling type of guess :slightly_smiling_face:

townsend2010 avatar Jul 31 '23 11:07 townsend2010

Anyone know if multipass is using socket_vmnet when using the qemu driver?

link: https://github.com/lima-vm/socket_vmnet

If so, maybe they are able to get me closer to a solution - thanks in advance.

JCzz avatar Aug 04 '23 14:08 JCzz

Hi @JCzz!

No, Multipass does not use socket_vmnet. We use the vmnet support that is already in qemu. At any rate, socket_vmnet and our qemu both use vmnet, which I think is part of the problem you are observing: some bug in Apple's vmnet stack.

The only advantage that socket_vmnet brings is that one does not have to run the qemu process as root, but one would still have to run the socket_vmnet process as root, so I don't think it's a big advantage.

townsend2010 avatar Aug 04 '23 15:08 townsend2010

thanks @townsend2010

JCzz avatar Aug 04 '23 16:08 JCzz

Some additional info points from the issue I opened that got merged (#3479):

  • The bug only occurs if the socket is explicitly bound (e.g. curl https://canonical.com --interface enp0s2). Unbound sockets never experience this error in my testing.
  • On Ubuntu 20.04 instances, the bug only occurs if the user is root (even if the socket is explicitly bound!). On 22.04 instances, the bug occurs for all users.

These seem to indicate an issue with the VM itself, rather than a vmnet stack issue.
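A minimal way to reproduce the difference from inside an instance, using the same command as above (the interface name depends on how the extra NIC is named in your instance):

# unbound socket: never hangs in my testing
curl -o /dev/null https://canonical.com
# socket explicitly bound to the extra interface: this is the case that stalls
curl -o /dev/null --interface enp0s2 https://canonical.com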

wallpunch avatar Apr 15 '24 15:04 wallpunch