terminate called after throwing an instance of 'std::runtime_error'
  what(): error in network setup: daa4eabb64cf : Receiving error - 1 : Connection reset by peer
I started two Docker containers, container1 and container2, binding ports 8090 and 8091 respectively. When I run the following command in container2:
./semi2k-party.x -N 2 -IF Player-Data/Input -p 1 -h 192.168.10.11 -pn 8090 dual_sum
the program terminates with a "Connection reset by peer" exception.
However, when I run it on the host machine, this exception does not occur. It seems that the connection retry logic in MP-SPDZ does not handle the "Connection reset by peer" exception.
What is the output on party 0? The output indicates that it might have failed first.
I have not yet started party 0. Judging by the behavior on the host machine, party 1 should normally retry connecting to party 0 multiple times within a minute.
However, when running in the container, party 1 fails immediately.
Retrying connections is indeed implemented. However, the error message indicates that the initial connection is accepted but then dropped, so I'm wondering what happens if party 0 is started first, given that the first connection goes from party 1 to party 0.
If party 0 is started first, the computation proceeds normally.
I see, but I'm not sure what to make of this. My understanding is that party 0 not being present should lead to the connection being rejected rather than being accepted just to be dropped. Do you think this is normal behavior?
Based on my testing, under normal circumstances, when party 1 tries to connect to party 0 and party 0 is not started, party 1 receives a "connection refused" exception. MP-SPDZ correctly catches this exception and initiates the next retry. However, when the Docker container is started, a docker-proxy process listens on the port. When MP-SPDZ tries to access this port, it receives a "connection reset by peer" exception. It is possible that MP-SPDZ does not correctly handle this exception, causing the program to fail directly.
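To make the requested behavior concrete, here is a minimal Python sketch of a retry loop that treats both "connection refused" and "connection reset by peer" as a sign that the peer is not ready yet. This is not MP-SPDZ's code (MP-SPDZ is written in C++); the names handshake and connect_with_retry, the one-byte handshake, and the 60-second deadline are assumptions made for illustration only.

```python
import socket
import time

# Errors treated as "peer not ready yet" (an assumption of this sketch).
RETRYABLE = (ConnectionRefusedError, ConnectionResetError)


def handshake(host, port):
    """Hypothetical handshake: connect, then wait for one byte from the peer."""
    sock = socket.create_connection((host, port), timeout=5)
    try:
        if sock.recv(1) == b"":
            # An empty read means the other side closed the connection.
            raise ConnectionResetError("peer closed before handshake")
    except BaseException:
        sock.close()
        raise
    return sock


def connect_with_retry(attempt, deadline_s=60.0, delay_s=1.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Call attempt() until it succeeds or deadline_s has elapsed."""
    start = clock()
    while True:
        try:
            return attempt()
        except RETRYABLE:
            # Give up only once the deadline has passed; otherwise wait and retry.
            if clock() - start >= deadline_s:
                raise
            sleep(delay_s)
```

Under these assumptions, party 1 would call something like connect_with_retry(lambda: handshake("192.168.10.11", 8090)). A reset received through a forwarding proxy would then be retried the same way as a plain refusal.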
I hope MP-SPDZ can catch the exception and keep retrying for up to 1 minute in both scenarios. Could this be addressed in MP-SPDZ? Thank you.
Can you confirm that #1643 fixes this?
I first started party 0, then started party 1, and was able to obtain the computation result successfully.
However, when I start party 1 first, it still shows a connection failure.
I've observed that this issue only arises when using docker-proxy mode, which suggests that it may not be caused by MP-SPDZ itself.
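One way to check which of the two failure modes a given address produces is a small probe like the following Python sketch. The function name classify_port and its return labels are hypothetical, and it assumes that a connection which stays open and silent for the timeout counts as accepted; it does not reproduce anything MP-SPDZ or docker-proxy does internally.

```python
import socket


def classify_port(host, port, timeout=2.0):
    """Report how a TCP port reacts to a connection attempt."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        # Nothing is listening: the case that is normally retried.
        return "refused"
    except OSError:
        # Timeouts, routing errors and similar failures.
        return "unreachable"
    with sock:
        try:
            data = sock.recv(1)
        except socket.timeout:
            # Connection stayed open without sending anything.
            return "accepted"
        except ConnectionResetError:
            # Accepted, then reset: what a forwarder with no backend may do.
            return "dropped"
    # An empty read means the other side closed the connection.
    return "dropped" if data == b"" else "accepted"
```

Running classify_port("192.168.10.11", 8090) from container2 before party 0 is started would be expected to return "dropped" when a proxy is listening on the port and "refused" when nothing is.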