Can't repair options: Operation not permitted
Hello, I experienced a one-time error during app restoration:
(03.688045) 1192: inet: Restore: family AF_INET type SOCK_STREAM proto IPPROTO_TCP port 58318 state TCP_CLOSE src_addr 127.0.0.1
(03.688069) 1192: tcp: Restoring TCP connection
(03.688095) 1192: tcp: Restoring TCP connection id 94e ino 1bfa5f
(03.688121) 1192: Debug: Setting 1 queue seq to 1429167666
(03.688122) 1192: Debug: Setting 2 queue seq to 173669824
(03.688125) 1192: Debug: Restoring TCP options
(03.688158) 1192: Debug: Will turn SAK on
(03.688159) 1192: Debug: Will set snd_wscale to 7
(03.688185) 1192: Debug: Will set rcv_wscale to 7
(03.688210) 1192: Debug: Will turn timestamps on
(03.688237) 1192: Debug: Will set mss clamp to 65495
(03.688293) 1192: Error (soccr/soccr.c:567): Can't repair options: Operation not permitted
(03.688331) 1192: Error (criu/files.c:1221): Unable to open fd=111 id=0x94e
I investigated the kernel code and found that EPERM is returned in only two cases: either sk->sk_state != TCP_ESTABLISHED, or tp->bytes_sent is nonzero:
case TCP_REPAIR_OPTIONS:
	if (!tp->repair)
		err = -EINVAL;
	else if (sk->sk_state == TCP_ESTABLISHED && !tp->bytes_sent)
		err = tcp_repair_options_est(sk, optval, optlen);
	else
		err = -EPERM;
	break;
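To make the branch conditions explicit, here is a small user-space model of that check. It is only a sketch: `struct model_sock`, `model_repair_options()` and the `MODEL_*` constants are hypothetical names introduced for illustration, and the function returns 0 at the point where the kernel would go on to call tcp_repair_options_est().

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel values used in the excerpt. */
enum model_state { MODEL_TCP_ESTABLISHED = 1, MODEL_TCP_CLOSE = 7 };

struct model_sock {
	bool repair;         /* models tp->repair */
	int state;           /* models sk->sk_state */
	uint64_t bytes_sent; /* models tp->bytes_sent */
};

/*
 * Mirrors only the branch structure of the TCP_REPAIR_OPTIONS case.
 * Returns 0 where the kernel would call tcp_repair_options_est().
 */
static int model_repair_options(const struct model_sock *sk)
{
	if (!sk->repair)
		return -EINVAL;
	if (sk->state == MODEL_TCP_ESTABLISHED && sk->bytes_sent == 0)
		return 0;
	/* Reached when the state is not ESTABLISHED or bytes were already sent. */
	return -EPERM;
}
```

In this model, a socket in repair mode that is in TCP_CLOSE gets -EPERM regardless of bytes_sent, which matches the error in the log above.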
Furthermore, I examined CRIU's code for restoring TCP connections, particularly the libsoccr_set_sk_data_noq() function. I noticed that it ignores the EINPROGRESS error returned by connect(). Since the failed socket is in O_NONBLOCK mode, I suspect that connect() returned EINPROGRESS and the connection had not yet transitioned to TCP_ESTABLISHED when the options were repaired right afterwards, which made the kernel return EPERM.
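One way to test this hypothesis would be to read the socket state right before the options are repaired. The sketch below uses getsockopt() with TCP_INFO for that. The helper names `sk_tcp_state()` and `sk_ready_for_repair_options()` are hypothetical and are not part of CRIU or libsoccr.

```c
#define _GNU_SOURCE
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper, not part of CRIU or libsoccr.
 * Returns the kernel's TCP state for fd (e.g. TCP_ESTABLISHED, TCP_CLOSE),
 * or -1 if getsockopt() fails.
 */
static int sk_tcp_state(int fd)
{
	struct tcp_info ti;
	socklen_t len = sizeof(ti);

	memset(&ti, 0, sizeof(ti));
	if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &len) < 0)
		return -1;
	return ti.tcpi_state;
}

/*
 * Hypothetical guard: nonzero only when the socket is in TCP_ESTABLISHED,
 * i.e. when the state half of the TCP_REPAIR_OPTIONS check would pass.
 */
static int sk_ready_for_repair_options(int fd)
{
	return sk_tcp_state(fd) == TCP_ESTABLISHED;
}
```

Logging the result of `sk_tcp_state()` just before the TCP_REPAIR_OPTIONS call would show whether the socket had already left TCP_ESTABLISHED at that point.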
I'm not sure if the analysis above is correct. If there are any logical errors, please correct me. I can also provide some criu images to assist with the analysis.
Details of the failed socket (decoded from files.img):
{
"type": "INETSK",
"id": 2382,
"isk": {
"id": 2382,
"ino": 1833567,
"family": "INET",
"type": "STREAM",
"proto": "TCP",
"state": "CLOSE",
"src_port": 58318,
"dst_port": 50207,
"flags": "0x802",
"backlog": 0,
"src_addr": [
"127.0.0.1"
],
"dst_addr": [
"127.0.0.1"
],
"fown": {
"uid": 0,
"euid": 0,
"signum": 0,
"pid_type": 0,
"pid": 0
},
"opts": {
"so_sndbuf": 2626560,
"so_rcvbuf": 131072,
"so_snd_tmo_sec": 0,
"so_snd_tmo_usec": 0,
"so_rcv_tmo_sec": 0,
"so_rcv_tmo_usec": 0,
"reuseaddr": false,
"so_priority": 0,
"so_rcvlowat": 1,
"so_passcred": false,
"so_passsec": false,
"so_dontroute": false,
"so_no_check": false,
"so_reuseport": false,
"so_broadcast": false,
"so_keepalive": false,
"so_oobinline": 0
},
"ip_opts": {
"ttl": 64
},
"shutdown": "READ",
"tcp_opts": {
"nodelay": true,
"keepcnt": 9,
"keepidle": 7200,
"keepintvl": 75
}
}
},
@yummypeng Could you describe your use case in more detail? What application are you checkpointing/restoring, and what is the reason for preserving established TCP connections? For example, can you use --tcp-close and let the application reconnect after the restore?
@rst0git Hi, I'm trying to use CRIU and cuda-checkpoint tools for checkpointing and restoring AI applications. I've tested various AI inference services and found a set of generic CRIU parameters that seem to work well. I've noticed that using the --tcp-established parameter increases the likelihood of successfully dumping different applications. I'm unsure if the failed application has any reconnect logic. However, since CRIU's code doesn't explicitly forbid using --tcp-established for dumping a socket in TCP_CLOSE state, I want to keep this parameter and try to address the issue.
A friendly reminder that this issue had no activity for 30 days.
Hi @rst0git, we found a race condition that can reproduce this problem.
Assume there are two processes, A and B, in a container. They communicate over TCP sockets on 127.0.0.1, using ports A and B, and both sockets are in the TCP_CLOSE state.
During the dump stage, both TCP sockets are dumped into the image. During the restore stage, the EPERM error may be triggered by the following sequence:
1. CRIU starts restoring socket A: it creates the socket and calls connect() to move it to the TCP_ESTABLISHED state.
2. CRIU builds a FIN packet and sends it to socket A through a raw socket.
3. Socket A receives the FIN. Following TCP rules, it replies with an ACK to socket B.
4. Socket B has not been created yet, so the kernel replies with a RST to socket A.
5. Socket A receives the RST and moves to the TCP_CLOSE state.
6. CRIU calls shutdown() on socket A. It returns ENOTCONN, which is ignored.
7. CRIU starts restoring socket B: as with A, it creates the socket and calls connect() to move it to TCP_ESTABLISHED.
8. CRIU continues restoring socket A and sends an ACK packet to it.
9. The kernel receives that ACK and replies with a RST, because socket A was closed in step 5.
10. Socket B receives the RST and moves to the TCP_CLOSE state.
11. CRIU tries to restore socket B's TCP options; because its state is TCP_CLOSE, EPERM is returned.
Here is a diagram:
The key points are:
- Sockets A and B belong to different processes, so they are restored concurrently.
- lock_connection() is not called during the restore stage, so the RST/ACK packets generated by the kernel are not dropped and interfere with the other socket.
So maybe we need to call lock_connection() during the restore stage, too.
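The interleaving described above can be replayed with a toy state machine. This is only a sketch under simplifying assumptions: it tracks RST-induced closes and nothing else, all names (`struct sim`, `inject()`, `replay()`) are hypothetical, and the `drop_kernel_replies` flag merely models the idea of filtering kernel-generated replies. It does not reproduce what lock_connection() actually does.

```c
#include <errno.h>
#include <stdbool.h>

enum sk_state { SK_ABSENT, SK_ESTABLISHED, SK_CLOSE };

struct sim {
	enum sk_state a, b;
	bool drop_kernel_replies; /* assumption: models a lock-like packet filter */
};

/* A RST reaching an established socket closes it. */
static void deliver_rst(enum sk_state *dst)
{
	if (*dst == SK_ESTABLISHED)
		*dst = SK_CLOSE;
}

/* Deliver a segment injected by the restorer to dst, with src as apparent sender. */
static void inject(struct sim *s, enum sk_state *dst, enum sk_state *src)
{
	if (s->drop_kernel_replies)
		return; /* the segment arrives, but every kernel reply is filtered */

	if (*dst != SK_ESTABLISHED) {
		/* No live socket: the kernel answers the apparent sender with RST. */
		deliver_rst(src);
		return;
	}
	/* dst ACKs the segment; if the peer is not live, that ACK draws a RST back. */
	if (*src != SK_ESTABLISHED)
		deliver_rst(dst);
}

/* Replays the interleaving; returns what option restore on B would see. */
static int replay(bool drop_kernel_replies)
{
	struct sim s = { SK_ABSENT, SK_ABSENT, drop_kernel_replies };

	s.a = SK_ESTABLISHED;   /* A: socket() + connect() */
	inject(&s, &s.a, &s.b); /* FIN to A while B does not exist yet */
	s.b = SK_ESTABLISHED;   /* B: socket() + connect() */
	inject(&s, &s.a, &s.b); /* ACK to A, apparently from B */
	return s.b == SK_ESTABLISHED ? 0 : -EPERM;
}
```

In this model, `replay(false)` ends with socket B closed by a stray RST, while `replay(true)` leaves B established when its options are restored.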