
terminate called after throwing an instance of 'std::runtime_error' what(): error in network setup: daa4eabb64cf : Receiving error - 1 : Connection reset by peer

Open xinushio opened this issue 1 year ago • 8 comments

I started two Docker containers, container1 and container2, binding ports 8090 and 8091 respectively. When I run the following command in container2, it fails with "Connection reset by peer" and the program terminates:

./semi2k-party.x -N 2 -IF Player-Data/Input -p 1 -h 192.168.10.11 -pn 8090 dual_sum

However, when I run it on the host machine, this exception does not occur. It seems that during the connection retry process of MP-SPDZ, there is a lack of handling for the "Connection reset by peer" exception.

xinushio avatar Jun 27 '24 06:06 xinushio

What is the output on party 0? The output indicates that it might have failed first.

mkskeller avatar Jun 27 '24 07:06 mkskeller

I have not started party 0 yet. Judging by the behavior on the host machine, party 1 should normally keep retrying the connection to party 0 for about a minute.

xinushio avatar Jun 27 '24 07:06 xinushio

However, when running in the container, party 1 fails immediately.

xinushio avatar Jun 27 '24 07:06 xinushio

Retrying connections is indeed implemented. However, the error message indicates that the initial connection is accepted but then dropped, so I'm wondering what happens if party 0 is started first as the first connection goes from party 1 to party 0.

mkskeller avatar Jun 27 '24 08:06 mkskeller

If party 0 is started first, the computation proceeds normally.

xinushio avatar Jun 27 '24 08:06 xinushio

I see, but I'm not sure what to make of this. My understanding is that party 0 not being present should lead to the connection being rejected rather than being accepted just to be dropped. Do you think this is normal behavior?

mkskeller avatar Jun 27 '24 09:06 mkskeller

Based on my testing, under normal circumstances, when party 1 tries to connect to party 0 and party 0 is not started, party 1 receives a "connection refused" exception. MP-SPDZ correctly catches this exception and initiates the next retry. However, when the Docker container is started, a docker-proxy process listens on the port. When MP-SPDZ tries to access this port, it receives a "connection reset by peer" exception. It is possible that MP-SPDZ does not correctly handle this exception, causing the program to fail directly.
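
The two cases described above can be told apart from the client side. The following is a minimal sketch in Python (not MP-SPDZ's C++ code); `classify` and `probe` are hypothetical helper names, and the mapping of error codes to labels is an assumption made for illustration.

```python
import errno
import socket


def classify(exc):
    """Map an OSError raised by connect/send/recv to a coarse label."""
    if exc.errno == errno.ECONNREFUSED:
        return "refused"   # nothing is listening on the port
    if exc.errno in (errno.ECONNRESET, errno.EPIPE):
        return "dropped"   # connection was accepted, then torn down
    return "other"


def probe(host, port, timeout=2.0):
    """Attempt one connect-and-read and report what happened."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"\0")
            data = sock.recv(1)
            # An empty read means the peer closed the connection without replying.
            return "open" if data else "dropped"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return classify(exc)
```

Calling `probe("192.168.10.11", 8090)` while party 0 is down would be expected to yield "refused" when nothing listens on the port, and "dropped" when an intermediary such as docker-proxy accepts the connection and then closes it.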

xinushio avatar Jun 27 '24 09:06 xinushio

I hope MP-SPDZ can catch the exception in both scenarios and keep retrying until the one-minute timeout expires. Could this be addressed in MP-SPDZ? Thank you.
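
The requested behavior can be sketched as a retry loop that treats both errors as "peer not ready yet". This is an illustration in Python under stated assumptions, not the MP-SPDZ implementation: `connect_with_retry`, `tcp_attempt`, the one-second delay, and the choice of transient errors are all hypothetical.

```python
import socket
import time

# Assumption: both errors mean "party 0 is not up yet".
# Refused: nothing listens. Reset: a proxy accepted, then dropped the connection.
TRANSIENT = (ConnectionRefusedError, ConnectionResetError)


def connect_with_retry(attempt, deadline_s=60.0, delay_s=1.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Call attempt() until it succeeds or deadline_s seconds have passed."""
    start = clock()
    while True:
        try:
            return attempt()
        except TRANSIENT as exc:
            if clock() - start >= deadline_s:
                raise TimeoutError("peer not reachable within deadline") from exc
            sleep(delay_s)


def tcp_attempt(host, port, timeout=5.0):
    """Build an attempt() that connects and waits for one byte from the peer."""
    def attempt():
        sock = socket.create_connection((host, port), timeout=timeout)
        try:
            if not sock.recv(1):
                # Peer closed without sending anything: treat like a reset.
                raise ConnectionResetError("peer closed before replying")
            return sock
        except BaseException:
            sock.close()
            raise
    return attempt
```

With the address and port from the report used purely as placeholders, `connect_with_retry(tcp_attempt("192.168.10.11", 8090))` would keep trying for up to a minute regardless of whether the failure surfaces as "connection refused" or "connection reset by peer". Waiting for one byte assumes the peer speaks first, which is a simplification of any real handshake.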

xinushio avatar Jun 27 '24 09:06 xinushio

Can you confirm that #1643 fixes this?

mkskeller avatar Apr 23 '25 06:04 mkskeller

I first started party0, then started party1, and was able to obtain the computation result successfully.

However, when I start party1 first, it still shows a connection failure.

xinushio avatar Apr 25 '25 03:04 xinushio

I've observed that this issue only arises when using docker-proxy mode, which suggests that it may not be caused by MP-SPDZ itself.

xinushio avatar Apr 25 '25 03:04 xinushio